public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/4] matcher-1.m: Change return type to int
@ 2020-05-04 19:01 H.J. Lu
  2020-05-04 19:01 ` [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] H.J. Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 188+ messages in thread
From: H.J. Lu @ 2020-05-04 19:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Uros Bizjak, Jeff Law, Richard Biener, Jakub Jelinek, Qing Zhao,
	keescook, victor.rodriguez.bahena

my_exception_matcher must return int.  Otherwise, this test fails.

	PR testsuite/84324
	* objc/execute/exceptions/matcher-1.m (my_exception_matcher):
	Change return type to int.
---
 gcc/testsuite/objc/execute/exceptions/matcher-1.m | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/objc/execute/exceptions/matcher-1.m b/gcc/testsuite/objc/execute/exceptions/matcher-1.m
index cbe4365da90..25d6759cc9c 100644
--- a/gcc/testsuite/objc/execute/exceptions/matcher-1.m
+++ b/gcc/testsuite/objc/execute/exceptions/matcher-1.m
@@ -20,7 +20,7 @@ int main(void)
 
 static unsigned int handlerExpected = 0;
 
-void
+int
 my_exception_matcher(Class match_class, id exception)
 {
   /* Always matches.  */
-- 
2.26.2



* [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]
  2020-05-04 19:01 [PATCH 1/4] matcher-1.m: Change return type to int H.J. Lu
@ 2020-05-04 19:01 ` H.J. Lu
  2020-05-04 23:19   ` Rodriguez Bahena, Victor
  2020-05-05  8:14   ` Uros Bizjak
  2020-05-04 19:01 ` [PATCH 3/4] x86: Add ix86_any_return_p H.J. Lu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 188+ messages in thread
From: H.J. Lu @ 2020-05-04 19:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Uros Bizjak, Jeff Law, Richard Biener, Jakub Jelinek, Qing Zhao,
	keescook, victor.rodriguez.bahena

Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] command-line
option and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
function attribute:

1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

Don't zero caller-saved registers upon function return.

2. -mzero-caller-saved-regs=used-gpr and zero_caller_saved_regs("used-gpr")

Zero used caller-saved integer registers upon function return.

3. -mzero-caller-saved-regs=all-gpr and zero_caller_saved_regs("all-gpr")

Zero all caller-saved integer registers upon function return.

4. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

Zero used caller-saved integer and vector registers upon function return.

5. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

Zero all caller-saved integer and vector registers upon function return.

Tested on i686 and x86-64 by bootstrapping GCC trunk with
-mzero-caller-saved-regs=used-gpr, -mzero-caller-saved-regs=all-gpr,
-mzero-caller-saved-regs=used, and -mzero-caller-saved-regs=all enabled
by default.
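
For illustration only, a minimal sketch of how the proposed attribute
might be applied to individual functions.  The ZERO_REGS macro and the
check_pin and add functions are hypothetical names that are not part of
this patch.  The macro expands to nothing on compilers without the
proposed attribute, so the file builds either way:

```c
#include <string.h>

/* Hypothetical helper macro (not part of the patch): use the proposed
   zero_caller_saved_regs attribute only when the compiler reports
   support for it, and expand to nothing otherwise.  */
#if defined(__has_attribute)
# if __has_attribute(zero_caller_saved_regs)
#  define ZERO_REGS(mode) __attribute__((zero_caller_saved_regs(mode)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(mode)
#endif

/* Handles sensitive data, so request that the caller-saved integer
   registers used by this function be zeroed upon return.  */
ZERO_REGS("used-gpr")
int
check_pin (const char *pin, const char *expected)
{
  return strcmp (pin, expected) == 0;
}

/* Opt this function out, even if the whole file is compiled with
   -mzero-caller-saved-regs=all.  */
ZERO_REGS("skip")
int
add (int a, int b)
{
  return a + b;
}
```

With a compiler that includes this patch, the same policy could instead
be requested for a whole translation unit with, for example,
-mzero-caller-saved-regs=used-gpr.  Where the attribute is present, the
patch reads it in preference to the command-line option.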

gcc/

	* config/i386/i386-expand.c (ix86_find_live_outgoing_regs): New function.
	(ix86_split_simple_return_pop_internal): Removed.
	(ix86_split_simple_return_internal): New function.
	* config/i386/i386-options.c (ix86_set_zero_caller_saved_regs_type):
	New function.
	(ix86_set_current_function): Call ix86_set_zero_caller_saved_regs_type.
	(ix86_handle_fndecl_attribute): Support zero_caller_saved_regs
	attribute.
	(ix86_attribute_table): Add zero_caller_saved_regs.
	* config/i386/i386-opts.h (zero_caller_saved_regs): New enum.
	* config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):
	Renamed to ...
	(ix86_split_simple_return_internal): This.
	* config/i386/i386.c (ix86_expand_prologue): Replace
	gen_prologue_use with gen_pro_epilogue_use.
	(ix86_expand_epilogue): Replace gen_simple_return_pop_internal
	with ix86_split_simple_return_internal.  Replace
	gen_simple_return_internal with ix86_split_simple_return_internal.
	* config/i386/i386.h (machine_function): Add
	zero_caller_saved_regs_type, live_outgoing_int_regs and
	live_outgoing_vector_regs.
	(TARGET_POP_SCRATCH_REGISTER): New.
	* config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.
	(UNSPECV_PROLOGUE_USE): Renamed to ...
	(UNSPECV_PRO_EPILOGUE_USE): This.
	(prologue_use): Renamed to ...
	(pro_epilogue_use): This.
	(simple_return_internal): Changed to define_insn_and_split.
	(simple_return_internal_1): New pattern.
	(simple_return_pop_internal): Replace
	ix86_split_simple_return_pop_internal with
	ix86_split_simple_return_internal.  Always call
	ix86_split_simple_return_internal if epilogue_completed is
	true.
	(simple_return_pop_internal_1): New pattern.
	(Epilogue deallocator to pop peepholes): Enabled only if
	TARGET_POP_SCRATCH_REGISTER is true.
	* config/i386/i386.opt (mzero-caller-saved-regs=): New option.
	* doc/extend.texi: Document zero_caller_saved_regs attribute.
	* doc/invoke.texi: Document -mzero-caller-saved-regs=.

gcc/testsuite/

	* gcc.target/i386/zero-scratch-regs-1.c: New test.
	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-13.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-14.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-15.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-16.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-17.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-18.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-19.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-20.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-21.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-22.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-23.c: Likewise.
---
 gcc/config/i386/i386-expand.c                 | 281 ++++++++++++++++--
 gcc/config/i386/i386-options.c                |  67 +++++
 gcc/config/i386/i386-opts.h                   |   9 +
 gcc/config/i386/i386-protos.h                 |   2 +-
 gcc/config/i386/i386.c                        |   8 +-
 gcc/config/i386/i386.h                        |  16 +
 gcc/config/i386/i386.md                       |  54 +++-
 gcc/config/i386/i386.opt                      |  23 ++
 gcc/doc/extend.texi                           |  12 +
 gcc/doc/invoke.texi                           |  14 +-
 .../gcc.target/i386/zero-scratch-regs-1.c     |  12 +
 .../gcc.target/i386/zero-scratch-regs-10.c    |  21 ++
 .../gcc.target/i386/zero-scratch-regs-11.c    |  39 +++
 .../gcc.target/i386/zero-scratch-regs-12.c    |  39 +++
 .../gcc.target/i386/zero-scratch-regs-13.c    |  21 ++
 .../gcc.target/i386/zero-scratch-regs-14.c    |  19 ++
 .../gcc.target/i386/zero-scratch-regs-15.c    |  14 +
 .../gcc.target/i386/zero-scratch-regs-16.c    |  14 +
 .../gcc.target/i386/zero-scratch-regs-17.c    |  13 +
 .../gcc.target/i386/zero-scratch-regs-18.c    |  13 +
 .../gcc.target/i386/zero-scratch-regs-19.c    |  12 +
 .../gcc.target/i386/zero-scratch-regs-2.c     |  19 ++
 .../gcc.target/i386/zero-scratch-regs-20.c    |  23 ++
 .../gcc.target/i386/zero-scratch-regs-21.c    |  14 +
 .../gcc.target/i386/zero-scratch-regs-22.c    |  19 ++
 .../gcc.target/i386/zero-scratch-regs-23.c    |  19 ++
 .../gcc.target/i386/zero-scratch-regs-3.c     |  12 +
 .../gcc.target/i386/zero-scratch-regs-4.c     |  14 +
 .../gcc.target/i386/zero-scratch-regs-5.c     |  20 ++
 .../gcc.target/i386/zero-scratch-regs-6.c     |  14 +
 .../gcc.target/i386/zero-scratch-regs-7.c     |  13 +
 .../gcc.target/i386/zero-scratch-regs-8.c     |  19 ++
 .../gcc.target/i386/zero-scratch-regs-9.c     |  15 +
 33 files changed, 867 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 26531585c5f..371bbedd9a7 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -8089,37 +8089,272 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   return call_insn;
 }
 
-/* Split simple return with popping POPC bytes from stack to indirect
-   branch with stack adjustment .  */
+/* Find general registers which are live at the exit of basic block BB
+   and set their corresponding bits in LIVE_OUTGOING_REGS.  */
+
+static void
+ix86_find_live_outgoing_regs (basic_block bb, bool gpr, bool zero_all,
+			      unsigned int &live_outgoing_regs)
+{
+  bitmap live_out = df_get_live_out (bb);
+
+  unsigned int regno;
+
+  /* Check for live outgoing registers.  */
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      unsigned int i = INVALID_REGNUM;
+
+      if (gpr)
+	{
+	  /* Zero general registers.  */
+	  if (LEGACY_INT_REGNO_P (regno))
+	    i = regno;
+	  else if (TARGET_64BIT && REX_INT_REGNO_P (regno))
+	    i = regno - FIRST_REX_INT_REG + 8;
+	}
+      else if (TARGET_SSE)
+	{
+	  /* Zero vector registers.  */
+	  if (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG))
+	    i = regno - FIRST_SSE_REG;
+	  else if (TARGET_64BIT)
+	    {
+	      if (REX_SSE_REGNO_P (regno))
+		i = regno - FIRST_REX_SSE_REG + 8;
+	      else if (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno))
+		i = regno - FIRST_EXT_REX_SSE_REG + 16;
+	    }
+	}
+
+      if (i == INVALID_REGNUM)
+	continue;
+
+      /* No need to check it again if it is live.  */
+      if ((live_outgoing_regs & (1 << i)))
+	continue;
+
+      /* A register is considered LIVE if
+	 1. It is a fixed register.
+	 2. It isn't a caller-saved register.
+	 3. It is a live outgoing register.
+	 4. It is never used in the function and we don't zero all
+	    caller-saved registers.
+       */
+      if (fixed_regs[regno]
+	  || !call_used_regs[regno]
+	  || REGNO_REG_SET_P (live_out, regno)
+	  || (!zero_all && !df_regs_ever_live_p (regno)))
+	live_outgoing_regs |= 1 << i;
+    }
+}
+
+/* Split simple return with popping POPC bytes from stack, if POPC
+   isn't NULL_RTX, and zero caller-saved general registers if needed.
+   When popping POPC bytes from stack for -mfunction-return=, convert
+   return to indirect branch with stack adjustment.  */
 
 void
-ix86_split_simple_return_pop_internal (rtx popc)
+ix86_split_simple_return_internal (rtx popc)
 {
-  struct machine_function *m = cfun->machine;
-  rtx ecx = gen_rtx_REG (SImode, CX_REG);
-  rtx_insn *insn;
+  /* No need to zero caller-saved registers in main ().  Don't zero
+     caller-saved registers if __builtin_eh_return is called since it
+     isn't a normal function return.  */
+  if ((cfun->machine->zero_caller_saved_regs_type
+       != zero_caller_saved_regs_skip)
+      && !crtl->calls_eh_return
+      && cfun->machine->func_type == TYPE_NORMAL
+      && !MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    {
+      bool gpr_only = true;
+      bool zero_all = false;
+      switch (cfun->machine->zero_caller_saved_regs_type)
+	{
+	case zero_caller_saved_regs_all_gpr:
+	  zero_all = true;
+	  break;
+	case zero_caller_saved_regs_used:
+	  gpr_only = false;
+	  break;
+	case zero_caller_saved_regs_all:
+	  gpr_only = false;
+	  zero_all = true;
+	  break;
+	default:
+	  break;
+	}
+
+      unsigned int &live_outgoing_int_regs
+	= cfun->machine->live_outgoing_int_regs;
+      unsigned int &live_outgoing_vector_regs
+	= cfun->machine->live_outgoing_vector_regs;
+
+      edge e;
+      edge_iterator ei;
+
+      if (live_outgoing_int_regs == 0)
+	{
+	  /* ECX register is used for return with pop.  */
+	  if (popc != NULL_RTX
+	      && (cfun->machine->function_return_type
+		  != indirect_branch_keep))
+	    live_outgoing_int_regs = 1 << CX_REG;
+
+	  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	    {
+	      ix86_find_live_outgoing_regs (e->src, true, zero_all,
+					    live_outgoing_int_regs);
+	    }
+	}
 
-  /* There is no "pascal" calling convention in any 64bit ABI.  */
-  gcc_assert (!TARGET_64BIT);
+      if (!gpr_only && live_outgoing_vector_regs == 0)
+	FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	  {
+	    ix86_find_live_outgoing_regs (e->src, false, zero_all,
+					  live_outgoing_vector_regs);
+	  }
 
-  insn = emit_insn (gen_pop (ecx));
-  m->fs.cfa_offset -= UNITS_PER_WORD;
-  m->fs.sp_offset -= UNITS_PER_WORD;
+      if (!gpr_only && TARGET_AVX && live_outgoing_vector_regs == 0)
+	{
+	  emit_insn (gen_avx_vzeroall ());
+	  gpr_only = true;
+	}
 
-  rtx x = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);
-  x = gen_rtx_SET (stack_pointer_rtx, x);
-  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
-  add_reg_note (insn, REG_CFA_REGISTER, gen_rtx_SET (ecx, pc_rtx));
-  RTX_FRAME_RELATED_P (insn) = 1;
+      rtx zero_gpr = NULL_RTX;
+      rtx zero_vector = NULL_RTX;
 
-  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);
-  x = gen_rtx_SET (stack_pointer_rtx, x);
-  insn = emit_insn (x);
-  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
-  RTX_FRAME_RELATED_P (insn) = 1;
+      unsigned int regno;
 
-  /* Now return address is in ECX.  */
-  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
+      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+	{
+	  unsigned int i = INVALID_REGNUM;
+	  unsigned int live_outgoing_regs;
+	  bool gpr = false;
+
+	  if (LEGACY_INT_REGNO_P (regno))
+	    {
+	      gpr = true;
+	      i = regno;
+	      live_outgoing_regs = live_outgoing_int_regs;
+	    }
+	  else if (TARGET_64BIT && REX_INT_REGNO_P (regno))
+	    {
+	      gpr = true;
+	      live_outgoing_regs = live_outgoing_int_regs;
+	      i = regno - FIRST_REX_INT_REG + 8;
+	    }
+	  else if (!gpr_only && TARGET_SSE)
+	    {
+	      if (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG))
+		{
+		  live_outgoing_regs = live_outgoing_vector_regs;
+		  i = regno - FIRST_SSE_REG;
+		}
+	      if (TARGET_64BIT)
+		{
+		  if (REX_SSE_REGNO_P (regno))
+		    {
+		      live_outgoing_regs = live_outgoing_vector_regs;
+		      i = regno - FIRST_REX_SSE_REG + 8;
+		    }
+		  else if (TARGET_AVX512F
+			   && EXT_REX_SSE_REGNO_P (regno))
+		    {
+		      live_outgoing_regs = live_outgoing_vector_regs;
+		      i = regno - FIRST_EXT_REX_SSE_REG + 16;
+		    }
+		}
+	    }
+
+	  if (i == INVALID_REGNUM)
+	    continue;
+
+	  if ((live_outgoing_regs & (1 << i)))
+	    continue;
+
+	  rtx reg, tmp;
+
+	  if (gpr)
+	    {
+	      /* Zero out dead caller-saved register.  We only need to
+		 zero the lower 32 bits.  */
+	      reg = gen_rtx_REG (SImode, regno);
+	      if (zero_gpr == NULL_RTX)
+		{
+		  zero_gpr = reg;
+		  tmp = gen_rtx_SET (reg, const0_rtx);
+		  if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+		    {
+		      rtx clob = gen_rtx_CLOBBER (VOIDmode,
+						  gen_rtx_REG (CCmode,
+							       FLAGS_REG));
+		      tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
+								   tmp,
+								   clob));
+		    }
+		  emit_insn (tmp);
+		}
+	      else
+		emit_move_insn (reg, zero_gpr);
+	    }
+	  else
+	    {
+	      reg = gen_rtx_REG (V4SFmode, regno);
+	      if (zero_vector == NULL_RTX)
+		{
+		  zero_vector = reg;
+		  tmp = gen_rtx_SET (reg, const0_rtx);
+		  emit_insn (tmp);
+		}
+	      else
+		emit_move_insn (reg, zero_vector);
+	    }
+
+	  /* Mark it in use  */
+	  emit_insn (gen_pro_epilogue_use (reg));
+	}
+    }
+
+  if (popc)
+    {
+      if (cfun->machine->function_return_type != indirect_branch_keep)
+	{
+	  struct machine_function *m = cfun->machine;
+	  rtx ecx = gen_rtx_REG (SImode, CX_REG);
+	  rtx_insn *insn;
+
+	  /* There is no "pascal" calling convention in any 64bit ABI.  */
+	  gcc_assert (!TARGET_64BIT);
+
+	  insn = emit_insn (gen_pop (ecx));
+	  m->fs.cfa_offset -= UNITS_PER_WORD;
+	  m->fs.sp_offset -= UNITS_PER_WORD;
+
+	  rtx x = plus_constant (Pmode, stack_pointer_rtx,
+				 UNITS_PER_WORD);
+	  x = gen_rtx_SET (stack_pointer_rtx, x);
+	  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
+	  add_reg_note (insn, REG_CFA_REGISTER,
+			gen_rtx_SET (ecx, pc_rtx));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+
+	  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);
+	  x = gen_rtx_SET (stack_pointer_rtx, x);
+	  insn = emit_insn (x);
+	  add_reg_note (insn, REG_CFA_ADJUST_CFA, copy_rtx (x));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+
+	  /* Mark ECX in use  */
+	  emit_insn (gen_pro_epilogue_use (ecx));
+
+	  /* Now return address is in ECX.  */
+	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
+	}
+      else
+	emit_jump_insn (gen_simple_return_pop_internal_1 (popc));
+    }
+  else
+    emit_jump_insn (gen_simple_return_internal_1 ());
 }
 
 /* Errors in the source file can cause expand_expr to return const0_rtx
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 5c21fce06a4..c9bf79c7a43 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -3040,6 +3040,46 @@ ix86_set_func_type (tree fndecl)
     }
 }
 
+/* Set the zero_caller_saved_regs_type field from the function FNDECL.  */
+
+static void
+ix86_set_zero_caller_saved_regs_type (tree fndecl)
+{
+  if (cfun->machine->zero_caller_saved_regs_type
+      == zero_caller_saved_regs_unset)
+    {
+      tree attr = lookup_attribute ("zero_caller_saved_regs",
+				    DECL_ATTRIBUTES (fndecl));
+      if (attr != NULL)
+	{
+	  tree args = TREE_VALUE (attr);
+	  if (args == NULL)
+	    gcc_unreachable ();
+	  tree cst = TREE_VALUE (args);
+	  if (strcmp (TREE_STRING_POINTER (cst), "skip") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_skip;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "used-gpr") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_used_gpr;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "all-gpr") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_all_gpr;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "used") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_used;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "all") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_all;
+	  else
+	    gcc_unreachable ();
+	}
+      else
+	cfun->machine->zero_caller_saved_regs_type
+	  = ix86_zero_caller_saved_regs;
+    }
+}
+
 /* Set the indirect_branch_type field from the function FNDECL.  */
 
 static void
@@ -3154,6 +3194,7 @@ ix86_set_current_function (tree fndecl)
 	{
 	  ix86_set_func_type (fndecl);
 	  ix86_set_indirect_branch_type (fndecl);
+	  ix86_set_zero_caller_saved_regs_type (fndecl);
 	}
       return;
     }
@@ -3175,6 +3216,7 @@ ix86_set_current_function (tree fndecl)
 
   ix86_set_func_type (fndecl);
   ix86_set_indirect_branch_type (fndecl);
+  ix86_set_zero_caller_saved_regs_type (fndecl);
 
   tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
   if (new_tree == NULL_TREE)
@@ -3635,6 +3677,29 @@ ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,
 	}
     }
 
+  if (is_attribute_p ("zero_caller_saved_regs", name))
+    {
+      tree cst = TREE_VALUE (args);
+      if (TREE_CODE (cst) != STRING_CST)
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute requires a string constant argument",
+		   name);
+	  *no_add_attrs = true;
+	}
+      else if (strcmp (TREE_STRING_POINTER (cst), "skip") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "used-gpr") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "all-gpr") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "used") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "all") != 0)
+	{
+	  warning (OPT_Wattributes,
+		   "argument to %qE attribute is not (skip|used-gpr|all-gpr|used|all)",
+		   name);
+	  *no_add_attrs = true;
+	}
+    }
+
   return NULL_TREE;
 }
 
@@ -3787,6 +3852,8 @@ const struct attribute_spec ix86_attribute_table[] =
     ix86_handle_fentry_name, NULL },
   { "cf_check", 0, 0, true, false, false, false,
     ix86_handle_fndecl_attribute, NULL },
+  { "zero_caller_saved_regs", 1, 1, true, false, false, false,
+    ix86_handle_fndecl_attribute, NULL },
 
   /* End element.  */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index b40317b2427..c45677add98 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -125,4 +125,13 @@ enum instrument_return {
   instrument_return_nop5
 };
 
+enum zero_caller_saved_regs {
+  zero_caller_saved_regs_unset = 0,
+  zero_caller_saved_regs_skip,
+  zero_caller_saved_regs_used_gpr,
+  zero_caller_saved_regs_all_gpr,
+  zero_caller_saved_regs_used,
+  zero_caller_saved_regs_all
+};
+
 #endif
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 39fcaa0ad5f..01732a225f4 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -331,7 +331,7 @@ extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
 extern const char * ix86_output_indirect_jmp (rtx call_op);
 extern const char * ix86_output_function_return (bool long_p);
 extern const char * ix86_output_indirect_function_return (rtx ret_op);
-extern void ix86_split_simple_return_pop_internal (rtx);
+extern void ix86_split_simple_return_internal (rtx);
 extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 						machine_mode mode);
 extern int ix86_min_insn_size (rtx_insn *);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4ecc3ce832..d433c3d33f2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -8508,7 +8508,7 @@ ix86_expand_prologue (void)
       insn = emit_insn (gen_set_got (pic));
       RTX_FRAME_RELATED_P (insn) = 1;
       add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
-      emit_insn (gen_prologue_use (pic));
+      emit_insn (gen_pro_epilogue_use (pic));
       /* Deleting already emmitted SET_GOT if exist and allocated to
 	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
       ix86_elim_entry_set_got (pic);
@@ -8537,7 +8537,7 @@ ix86_expand_prologue (void)
      Further, prevent alloca modifications to the stack pointer from being
      combined with prologue modifications.  */
   if (TARGET_SEH)
-    emit_insn (gen_prologue_use (stack_pointer_rtx));
+    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
 }
 
 /* Emit code to restore REG using a POP insn.  */
@@ -9260,7 +9260,7 @@ ix86_expand_epilogue (int style)
 	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
 	}
       else
-	emit_jump_insn (gen_simple_return_pop_internal (popc));
+	ix86_split_simple_return_internal (popc);
     }
   else if (!m->call_ms2sysv || !restore_stub_is_tail)
     {
@@ -9287,7 +9287,7 @@ ix86_expand_epilogue (int style)
 	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
 	}
       else
-	emit_jump_insn (gen_simple_return_internal ());
+	ix86_split_simple_return_internal (NULL_RTX);
     }
 
   /* Restore the state back to the state from the prologue,
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 08245f64322..68f37f42f59 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2823,6 +2823,10 @@ struct GTY(()) machine_function {
      the "interrupt" or "no_caller_saved_registers" attribute.  */
   BOOL_BITFIELD no_caller_saved_registers : 1;
 
+  /* How to clear caller-saved general registers upon function
+     return.  */
+  ENUM_BITFIELD(zero_caller_saved_regs) zero_caller_saved_regs_type : 5;
+
   /* If true, there is register available for argument passing.  This
      is used only in ix86_function_ok_for_sibcall by 32-bit to determine
      if there is scratch register available for indirect sibcall.  In
@@ -2853,6 +2857,12 @@ struct GTY(()) machine_function {
   /* True if the function needs a stack frame.  */
   BOOL_BITFIELD stack_frame_required : 1;
 
+  /* Integer registers live at exit.  */
+  unsigned int live_outgoing_int_regs;
+
+  /* Vector registers live at exit.  */
+  unsigned int live_outgoing_vector_regs;
+
   /* The largest alignment, in bytes, of stack slot actually used.  */
   unsigned int max_used_stack_alignment;
 
@@ -2955,6 +2965,12 @@ extern void debug_dispatch_window (int);
   (ix86_indirect_branch_register \
    || cfun->machine->indirect_branch_type != indirect_branch_keep)
 
+#define TARGET_POP_SCRATCH_REGISTER \
+  (TARGET_64BIT \
+   || (cfun->machine->zero_caller_saved_regs_type \
+       == zero_caller_saved_regs_skip) \
+   || cfun->machine->function_return_type == indirect_branch_keep)
+
 #define IX86_HLE_ACQUIRE (1 << 16)
 #define IX86_HLE_RELEASE (1 << 17)
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 76c00867231..c894fa79fd6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -184,6 +184,8 @@ (define_c_enum "unspec" [
   UNSPEC_PDEP
   UNSPEC_PEXT
 
+  UNSPEC_SIMPLE_RETURN
+
   ;; IRET support
   UNSPEC_INTERRUPT_RETURN
 ])
@@ -194,7 +196,7 @@ (define_c_enum "unspecv" [
   UNSPECV_STACK_PROBE
   UNSPECV_PROBE_STACK_RANGE
   UNSPECV_ALIGN
-  UNSPECV_PROLOGUE_USE
+  UNSPECV_PRO_EPILOGUE_USE
   UNSPECV_SPLIT_STACK_RETURN
   UNSPECV_CLD
   UNSPECV_NOPS
@@ -13363,8 +13365,8 @@ (define_insn "*memory_blockage"
 
 ;; As USE insns aren't meaningful after reload, this is used instead
 ;; to prevent deleting instructions setting registers for PIC code
-(define_insn "prologue_use"
-  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
+(define_insn "pro_epilogue_use"
+  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
   ""
   ""
   [(set_attr "length" "0")])
@@ -13405,10 +13407,23 @@ (define_expand "simple_return"
     }
 })
 
-(define_insn "simple_return_internal"
+(define_insn_and_split "simple_return_internal"
   [(simple_return)]
   "reload_completed"
   "* return ix86_output_function_return (false);"
+  "&& epilogue_completed"
+  [(const_int 0)]
+  "ix86_split_simple_return_internal (NULL_RTX); DONE;"
+  [(set_attr "length" "1")
+   (set_attr "atom_unit" "jeu")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+(define_insn "simple_return_internal_1"
+  [(simple_return)
+   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]
+  "reload_completed"
+  "* return ix86_output_function_return (false);"
   [(set_attr "length" "1")
    (set_attr "atom_unit" "jeu")
    (set_attr "length_immediate" "0")
@@ -13441,9 +13456,21 @@ (define_insn_and_split "simple_return_pop_internal"
    (use (match_operand:SI 0 "const_int_operand"))]
   "reload_completed"
   "%!ret\t%0"
-  "&& cfun->machine->function_return_type != indirect_branch_keep"
+  "&& (epilogue_completed
+       || cfun->machine->function_return_type != indirect_branch_keep)"
   [(const_int 0)]
-  "ix86_split_simple_return_pop_internal (operands[0]); DONE;"
+  "ix86_split_simple_return_internal (operands[0]); DONE;"
+  [(set_attr "length" "3")
+   (set_attr "atom_unit" "jeu")
+   (set_attr "length_immediate" "2")
+   (set_attr "modrm" "0")])
+
+(define_insn "simple_return_pop_internal_1"
+  [(simple_return)
+   (use (match_operand:SI 0 "const_int_operand"))
+   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]
+  "reload_completed"
+  "%!ret\t%0"
   [(set_attr "length" "3")
    (set_attr "atom_unit" "jeu")
    (set_attr "length_immediate" "2")
@@ -19864,6 +19891,11 @@ (define_peephole2
    (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])
 
 ;; Convert epilogue deallocator to pop.
+;; Don't do it when
+;; -mfunction-return= -mzero-caller-saved-regs=
+;; is used in 32-bit since return with stack pop needs to increment
+;; stack register and scratch registers must be zeroed.  Pop scratch
+;; register will load value from stack.
 (define_peephole2
   [(match_scratch:W 1 "r")
    (parallel [(set (reg:P SP_REG)
@@ -19872,6 +19904,7 @@ (define_peephole2
 	      (clobber (reg:CC FLAGS_REG))
 	      (clobber (mem:BLK (scratch)))])]
   "(TARGET_SINGLE_POP || optimize_insn_for_size_p ())
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
   [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
 	      (clobber (mem:BLK (scratch)))])])
@@ -19887,6 +19920,7 @@ (define_peephole2
 	      (clobber (reg:CC FLAGS_REG))
 	      (clobber (mem:BLK (scratch)))])]
   "(TARGET_DOUBLE_POP || optimize_insn_for_size_p ())
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
 	      (clobber (mem:BLK (scratch)))])
@@ -19900,6 +19934,7 @@ (define_peephole2
 	      (clobber (reg:CC FLAGS_REG))
 	      (clobber (mem:BLK (scratch)))])]
   "optimize_insn_for_size_p ()
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
 	      (clobber (mem:BLK (scratch)))])
@@ -19912,7 +19947,8 @@ (define_peephole2
 		   (plus:P (reg:P SP_REG)
 			   (match_operand:P 0 "const_int_operand")))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
+  "TARGET_POP_SCRATCH_REGISTER
+   && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
   [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])
 
 ;; Two pops case is tricky, since pop causes dependency
@@ -19924,7 +19960,8 @@ (define_peephole2
 		   (plus:P (reg:P SP_REG)
 			   (match_operand:P 0 "const_int_operand")))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
+  "TARGET_POP_SCRATCH_REGISTER
+   && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
    (set (match_dup 2) (mem:W (post_inc:P (reg:P SP_REG))))])
 
@@ -19935,6 +19972,7 @@ (define_peephole2
 			   (match_operand:P 0 "const_int_operand")))
 	      (clobber (reg:CC FLAGS_REG))])]
   "optimize_insn_for_size_p ()
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
    (set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 185a1d0686b..10ddacbc23b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1107,3 +1107,26 @@ AVX512BF16 built-in functions and code generation.
 menqcmd
 Target Report Mask(ISA2_ENQCMD) Var(ix86_isa_flags2) Save
 Support ENQCMD built-in functions and code generation.
+
+mzero-caller-saved-regs=
+Target Report RejectNegative Joined Enum(zero_caller_saved_regs) Var(ix86_zero_caller_saved_regs) Init(zero_caller_saved_regs_skip)
+Clear caller-saved registers upon function return.
+
+Enum
+Name(zero_caller_saved_regs) Type(enum zero_caller_saved_regs)
+Known choices of clearing caller-saved registers upon function return (for use with the -mzero-caller-saved-regs= option):
+
+EnumValue
+Enum(zero_caller_saved_regs) String(skip) Value(zero_caller_saved_regs_skip)
+
+EnumValue
+Enum(zero_caller_saved_regs) String(used-gpr) Value(zero_caller_saved_regs_used_gpr)
+
+EnumValue
+Enum(zero_caller_saved_regs) String(all-gpr) Value(zero_caller_saved_regs_all_gpr)
+
+EnumValue
+Enum(zero_caller_saved_regs) String(used) Value(zero_caller_saved_regs_used)
+
+EnumValue
+Enum(zero_caller_saved_regs) String(all) Value(zero_caller_saved_regs_all)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 936c22e2fe7..8037dcb305f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6740,6 +6740,18 @@ On x86 targets, the @code{fentry_section} attribute sets the name
 of the section to record function entry instrumentation calls in when
 enabled with @option{-pg -mrecord-mcount}
 
+@item zero_caller_saved_regs("@var{choice}")
+@cindex @code{zero_caller_saved_regs} function attribute, x86
+On x86 targets, the @code{zero_caller_saved_regs} attribute causes the
+compiler to zero caller-saved registers at function return
+according to @var{choice}.  @samp{skip} doesn't zero caller-saved
+registers.  @samp{used-gpr} zeros caller-saved integer registers which
+are used in the function.  @samp{all-gpr} zeros all caller-saved integer
+registers.  @samp{used} zeros caller-saved integer and vector
+registers which are used in the function.  @samp{all} zeros all caller-saved
+integer and vector registers.  The default for the attribute is
+controlled by @option{-mzero-caller-saved-regs}.
+
 @end table
 
 On the x86, the inliner does not inline a
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 767d1f07801..68d7bc8316a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1365,7 +1365,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-symbol=@var{symbol} @gol
 -mgeneral-regs-only  -mcall-ms2sysv-xlogues @gol
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
--mindirect-branch-register}
+-mindirect-branch-register -mzero-caller-saved-regs=@var{choice}}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -30128,6 +30128,18 @@ not be reachable in the large code model.
 @opindex mindirect-branch-register
 Force indirect call and jump via register.
 
+@item -mzero-caller-saved-regs=@var{choice}
+@opindex mzero-caller-saved-regs
+Zero caller-saved registers at function return according to
+@var{choice}.  @samp{skip}, which is the default, doesn't zero
+caller-saved registers.  @samp{used-gpr} zeros caller-saved integer
+registers which are used in the function.  @samp{all-gpr} zeros all
+caller-saved integer registers.  @samp{used} zeros
+caller-saved integer and vector registers which are used in the function.
+@samp{all} zeros all caller-saved integer and vector registers.  You
+can control this behavior for a specific function by using the function
+attribute @code{zero_caller_saved_regs}.  @xref{Function Attributes}.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
new file mode 100644
index 00000000000..4c9e6d68dab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
new file mode 100644
index 00000000000..ea614ecba53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_caller_saved_regs("all-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
new file mode 100644
index 00000000000..f19ed7c9a68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
new file mode 100644
index 00000000000..f0283d9e750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
new file mode 100644
index 00000000000..044da02e244
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
new file mode 100644
index 00000000000..31487d51f53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
new file mode 100644
index 00000000000..dc561b0c71d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_caller_saved_regs("used")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
new file mode 100644
index 00000000000..24824b0355e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
+
+extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
new file mode 100644
index 00000000000..9ba4f547401
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
new file mode 100644
index 00000000000..529adc26ad1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
new file mode 100644
index 00000000000..ac6201e27c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
new file mode 100644
index 00000000000..6b9e25abf13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
new file mode 100644
index 00000000000..e8e9c781ed1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
new file mode 100644
index 00000000000..3052eb05503
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip -march=corei7" } */
+
+__attribute__ ((zero_caller_saved_regs("used")))
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
new file mode 100644
index 00000000000..71369f56159
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
new file mode 100644
index 00000000000..9a31af9516a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7 -mavx512f" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
new file mode 100644
index 00000000000..a6f8eb7233a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
new file mode 100644
index 00000000000..bada4c73719
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_caller_saved_regs("used-gpr")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
new file mode 100644
index 00000000000..b93719a11df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+__attribute__ ((zero_caller_saved_regs("all-gpr")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
new file mode 100644
index 00000000000..bef1d36eca5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
+
+extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
new file mode 100644
index 00000000000..73a766c1be9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
new file mode 100644
index 00000000000..cd982ce27db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
new file mode 100644
index 00000000000..23dbed50ab9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_caller_saved_regs("used-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [PATCH 3/4] x86: Add ix86_any_return_p
  2020-05-04 19:01 [PATCH 1/4] matcher-1.m: Change return type to int H.J. Lu
  2020-05-04 19:01 ` [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] H.J. Lu
@ 2020-05-04 19:01 ` H.J. Lu
  2020-05-04 19:01 ` [PATCH 4/4] Update gcc.target/i386/ret-thunk-2[234].c H.J. Lu
  2020-05-05 16:29 ` [PATCH 1/4] matcher-1.m: Change return type to int Jeff Law
  3 siblings, 0 replies; 188+ messages in thread
From: H.J. Lu @ 2020-05-04 19:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Uros Bizjak, Jeff Law, Richard Biener, Jakub Jelinek, Qing Zhao,
	keescook, victor.rodriguez.bahena

Add ix86_any_return_p to check simple_return in a PARALLEL to support:

(jump_insn 39 38 40 5 (parallel [
            (simple_return)
            (unspec [
                    (const_int 0 [0])
                ] UNSPEC_SIMPLE_RETURN)
        ]) "/tmp/x.c":105 -1
     (nil)
 -> simple_return)
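
For readers who want to see the shape of the check in isolation, here is
a minimal sketch of the predicate over a mock RTL representation.  The
mock_* types, codes and helpers are hypothetical stand-ins invented for
this illustration; they are not GCC's rtx, GET_CODE or XVECEXP.  The
sketch only mirrors the logic: accept whatever ANY_RETURN_P accepts, plus
a PARALLEL whose first element is a simple_return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few RTL codes; not GCC's definitions.  */
enum mock_code { MOCK_RETURN, MOCK_SIMPLE_RETURN, MOCK_PARALLEL, MOCK_UNSPEC };

struct mock_rtx
{
  enum mock_code code;
  size_t n_elts;                       /* Used only by MOCK_PARALLEL.  */
  const struct mock_rtx *const *elts;  /* Elements of a MOCK_PARALLEL.  */
};

/* Mock of XVECEXP (x, 0, i): fetch element I of a PARALLEL.  */
static const struct mock_rtx *
mock_xvecexp (const struct mock_rtx *x, size_t i)
{
  assert (x->code == MOCK_PARALLEL && i < x->n_elts);
  return x->elts[i];
}

/* Mock of ANY_RETURN_P: a bare return or simple_return.  */
static bool
mock_any_return_p (const struct mock_rtx *x)
{
  return x->code == MOCK_RETURN || x->code == MOCK_SIMPLE_RETURN;
}

/* Mock of ix86_any_return_p: additionally accept a PARALLEL whose
   first element is a simple_return.  */
static bool
mock_ix86_any_return_p (const struct mock_rtx *x)
{
  return (mock_any_return_p (x)
          || (x->code == MOCK_PARALLEL
              && mock_xvecexp (x, 0)->code == MOCK_SIMPLE_RETURN));
}

/* Values shaped like the jump_insn pattern quoted above:
   (parallel [(simple_return) (unspec ...)]).  */
static const struct mock_rtx mock_simple_return
  = { MOCK_SIMPLE_RETURN, 0, NULL };
static const struct mock_rtx mock_unspec = { MOCK_UNSPEC, 0, NULL };
static const struct mock_rtx *const mock_par_elts[]
  = { &mock_simple_return, &mock_unspec };
static const struct mock_rtx mock_parallel
  = { MOCK_PARALLEL, 2, mock_par_elts };
```

With these mock values, the plain ANY_RETURN_P-style check rejects
mock_parallel while the extended check accepts it, which is the case the
new function is meant to cover.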

	* config/i386/i386-expand.c (ix86_notrack_prefixed_insn_p):
	Replace ANY_RETURN_P with ix86_any_return_p.
	* config/i386/i386-features.c (rest_of_insert_endbranch):
	Likewise.
	* config/i386/i386-protos.h (ix86_any_return_p): New.
	* config/i386/i386.c (ix86_any_return_p): New function.
	(ix86_pad_returns): Replace ANY_RETURN_P with ix86_any_return_p.
	(ix86_count_insn_bb): Likewise.
	(ix86_pad_short_function): Likewise.
---
 gcc/config/i386/i386-expand.c   |  2 +-
 gcc/config/i386/i386-features.c |  2 +-
 gcc/config/i386/i386-protos.h   |  2 ++
 gcc/config/i386/i386.c          | 17 ++++++++++++++---
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 371bbedd9a7..6dcd4554424 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -20143,7 +20143,7 @@ ix86_notrack_prefixed_insn_p (rtx_insn *insn)
   if (JUMP_P (insn) && !flag_cet_switch)
     {
       rtx target = JUMP_LABEL (insn);
-      if (target == NULL_RTX || ANY_RETURN_P (target))
+      if (target == NULL_RTX || ix86_any_return_p (target))
 	return false;
 
       /* Check the jump is a switch table.  */
diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index 78fb373db6e..a1dd5dee42f 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2030,7 +2030,7 @@ rest_of_insert_endbranch (void)
 	  if (JUMP_P (insn) && flag_cet_switch)
 	    {
 	      rtx target = JUMP_LABEL (insn);
-	      if (target == NULL_RTX || ANY_RETURN_P (target))
+	      if (target == NULL_RTX || ix86_any_return_p (target))
 		continue;
 
 	      /* Check the jump is a switch table.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 01732a225f4..5e6a07a6b60 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -210,6 +210,8 @@ extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
 extern void ix86_split_mmx_punpck (rtx[], bool);
 
+extern bool ix86_any_return_p (rtx);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif	/* TREE_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d433c3d33f2..80d0cfe96d2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20632,6 +20632,17 @@ ix86_avoid_jump_mispredicts (void)
 }
 #endif
 
+/* Return true if RET is a return, simple_return or simple_return in
+   a PARALLEL.  */
+
+bool
+ix86_any_return_p (rtx ret)
+{
+  return (ANY_RETURN_P (ret)
+	  || (GET_CODE (ret) == PARALLEL
+	      && GET_CODE (XVECEXP (ret, 0, 0)) == SIMPLE_RETURN));
+}
+
 /* AMD Athlon works faster
    when RET is not destination of conditional jump or directly preceded
    by other jump instruction.  We avoid the penalty by inserting NOP just
@@ -20649,7 +20660,7 @@ ix86_pad_returns (void)
       rtx_insn *prev;
       bool replace = false;
 
-      if (!JUMP_P (ret) || !ANY_RETURN_P (PATTERN (ret))
+      if (!JUMP_P (ret) || !ix86_any_return_p (PATTERN (ret))
 	  || optimize_bb_for_size_p (bb))
 	continue;
       for (prev = PREV_INSN (ret); prev; prev = PREV_INSN (prev))
@@ -20703,7 +20714,7 @@ ix86_count_insn_bb (basic_block bb)
     {
       /* Only happen in exit blocks.  */
       if (JUMP_P (insn)
-	  && ANY_RETURN_P (PATTERN (insn)))
+	  && ix86_any_return_p (PATTERN (insn)))
 	break;
 
       if (NONDEBUG_INSN_P (insn)
@@ -20776,7 +20787,7 @@ ix86_pad_short_function (void)
   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
     {
       rtx_insn *ret = BB_END (e->src);
-      if (JUMP_P (ret) && ANY_RETURN_P (PATTERN (ret)))
+      if (JUMP_P (ret) && ix86_any_return_p (PATTERN (ret)))
 	{
 	  int insn_count = ix86_count_insn (e->src);
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [PATCH 4/4] Update gcc.target/i386/ret-thunk-2[234].c
  2020-05-04 19:01 [PATCH 1/4] matcher-1.m: Change return type to int H.J. Lu
  2020-05-04 19:01 ` [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] H.J. Lu
  2020-05-04 19:01 ` [PATCH 3/4] x86: Add ix86_any_return_p H.J. Lu
@ 2020-05-04 19:01 ` H.J. Lu
  2020-05-05 16:29 ` [PATCH 1/4] matcher-1.m: Change return type to int Jeff Law
  3 siblings, 0 replies; 188+ messages in thread
From: H.J. Lu @ 2020-05-04 19:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Uros Bizjak, Jeff Law, Richard Biener, Jakub Jelinek, Qing Zhao,
	keescook, victor.rodriguez.bahena

---
 gcc/testsuite/gcc.target/i386/ret-thunk-22.c | 2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-23.c | 2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-24.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-22.c b/gcc/testsuite/gcc.target/i386/ret-thunk-22.c
index 9a9f42ea6a1..5796148c652 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-22.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-22.c
@@ -7,7 +7,7 @@ struct s gs = { 100 + 200i };
 struct s __attribute__((noinline)) foo (void) { return gs; }
 
 /* { dg-final { scan-assembler-times "popl\[\\t \]*%ecx" 1 { target { ! *-*-darwin* } } } } */
-/* { dg-final { scan-assembler "lea\[l\]?\[\\t \]*4\\(%esp\\), %esp" { target { ! *-*-darwin* } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edx" 1 { target { ! *-*-darwin* } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*_?__x86_return_thunk_ecx" { target { ! *-*-darwin* } } } } */
 /* { dg-final { scan-assembler {call[ \t]*___x86.get_pc_thunk.cx} { target { *-*-darwin* } } } } */
 /* { dg-final { scan-assembler {jmp[ \t]*___x86_return_thunk} { target { *-*-darwin* } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-23.c b/gcc/testsuite/gcc.target/i386/ret-thunk-23.c
index 69469a43606..1739d8f8d53 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-23.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-23.c
@@ -7,7 +7,7 @@ struct s gs = { 100 + 200i };
 struct s __attribute__((noinline)) foo (void) { return gs; }
 
 /* { dg-final { scan-assembler-times "popl\[\\t \]*%ecx" 1 { target { ! *-*-darwin* } } } } */
-/* { dg-final { scan-assembler "lea\[l\]?\[\\t \]*4\\(%esp\\), %esp" { target { ! *-*-darwin* } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edx" 1 { target { ! *-*-darwin* } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*_?__x86_return_thunk_ecx" { target { ! *-*-darwin* } } } } */
 /* { dg-final { scan-assembler {call[ \t]*___x86.get_pc_thunk.cx} { target { *-*-darwin* } } } } */
 /* { dg-final { scan-assembler {jmp[ \t]*___x86_return_thunk} { target { *-*-darwin* } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-24.c b/gcc/testsuite/gcc.target/i386/ret-thunk-24.c
index 0e7877970d7..4df5d9b8131 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-24.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-24.c
@@ -7,7 +7,7 @@ struct s gs = { 100 + 200i };
 struct s __attribute__((noinline)) foo (void) { return gs; }
 
 /* { dg-final { scan-assembler-times "popl\[\\t \]*%ecx" 1 { target { ! *-*-darwin* } } } } */
-/* { dg-final { scan-assembler "lea\[l\]?\[\\t \]*4\\(%esp\\), %esp" { target { ! *-*-darwin* } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edx" 1 { target { ! *-*-darwin* } } } } */
 /* { dg-final { scan-assembler-not "jmp\[ \t\]*_?__x86_return_thunk_ecx" { target { ! *-*-darwin* } } } } */
 /* { dg-final { scan-assembler {call[ \t]*___x86.get_pc_thunk.cx} { target { *-*-darwin* } } } } */
 /* { dg-final { scan-assembler-not {jmp[ \t]*___x86_return_thunk} { target { *-*-darwin* } } } } */
-- 
2.26.2



* Re: [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]
  2020-05-04 19:01 ` [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] H.J. Lu
@ 2020-05-04 23:19   ` Rodriguez Bahena, Victor
  2020-05-05  8:14   ` Uros Bizjak
  1 sibling, 0 replies; 188+ messages in thread
From: Rodriguez Bahena, Victor @ 2020-05-04 23:19 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches
  Cc: Uros Bizjak, Jeff Law, Richard Biener, Jakub Jelinek, Qing Zhao,
	keescook



-----Original Message-----
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Monday, May 4, 2020 at 2:02 PM
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Uros Bizjak <ubizjak@gmail.com>, Jeff Law <law@redhat.com>, Richard Biener <rguenther@suse.de>, Jakub Jelinek <jakub@redhat.com>, Qing Zhao <QING.ZHAO@oracle.com>, "keescook@chromium.org" <keescook@chromium.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Subject: [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]

    Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] command-line
    option and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
    function attribute:

    1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

    Don't zero caller-saved registers upon function return.

    2. -mzero-caller-saved-regs=used-gpr and zero_caller_saved_regs("used-gpr")

    Zero used caller-saved integer registers upon function return.

    3. -mzero-caller-saved-regs=all-gpr and zero_caller_saved_regs("all-gpr")

    Zero all caller-saved integer registers upon function return.

    4. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

    Zero used caller-saved integer and vector registers upon function return.

    5. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

    Zero all caller-saved integer and vector registers upon function return.

    Tested on i686 and x86-64 by bootstrapping GCC trunk with
    -mzero-caller-saved-regs=used-gpr, -mzero-caller-saved-regs=all-gpr,
    -mzero-caller-saved-regs=used, and -mzero-caller-saved-regs=all enabled
    by default.
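
For readers who have not seen the attribute form, here is a minimal sketch of
how it might be applied to a single function. It assumes a compiler built with
this patch; on any other compiler the attribute is unknown, so the hypothetical
ZERO_REGS macro expands to nothing. The function mix_key is likewise only an
illustration and is not part of the patch.

```c
/* Assumption: zero_caller_saved_regs exists only in a GCC built with this
   patch.  Elsewhere ZERO_REGS (a hypothetical helper macro) is empty.  */
#if defined(__has_attribute)
# if __has_attribute(zero_caller_saved_regs)
#  define ZERO_REGS(mode) __attribute__((zero_caller_saved_regs(mode)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(mode)
#endif

/* Hypothetical function that touches sensitive data.  With the patch
   applied, the integer scratch registers it used would be zeroed
   before it returns.  */
ZERO_REGS("used-gpr")
int
mix_key (int key, int salt)
{
  return (key ^ salt) + 1;
}
```

The attribute does not change the function's result, only the register state
left behind on return, so callers need no changes.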

Hi GCC team,

The Clear Linux project has been using this patch since GCC 8.
It is intended to make attacks such as ROP, COP, and JOP much harder.
It would be nice to have this patch in GCC 10 or master.

Thanks

Victor Rodriguez

    gcc/

    	* config/i386/i386-expand.c (ix86_find_live_outgoing_regs): New function.
    	(ix86_split_simple_return_pop_internal): Removed.
    	(ix86_split_simple_return_internal): New function.
    	* config/i386/i386-options.c (ix86_set_zero_caller_saved_regs_type):
    	New function.
    	(ix86_set_current_function): Call ix86_set_zero_caller_saved_regs_type.
    	(ix86_handle_fndecl_attribute): Support zero_caller_saved_regs
    	attribute.
    	(ix86_attribute_table): Add zero_caller_saved_regs.
    	* config/i386/i386-opts.h (zero_caller_saved_regs): New enum.
    	* config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):
    	Renamed to ...
    	(ix86_split_simple_return_internal): This.
    	* config/i386/i386.c (ix86_expand_prologue): Replace
    	gen_prologue_use with gen_pro_epilogue_use.
    	(ix86_expand_epilogue): Replace gen_simple_return_pop_internal
    	with ix86_split_simple_return_internal.  Replace
    	gen_simple_return_internal with ix86_split_simple_return_internal.
    	* config/i386/i386.h (machine_function): Add
    	zero_caller_saved_regs_type, live_outgoing_int_regs and
    	live_outgoing_vector_regs.
    	(TARGET_POP_SCRATCH_REGISTER): New.
    	* config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.
    	(UNSPECV_PROLOGUE_USE): Renamed to ...
    	(UNSPECV_PRO_EPILOGUE_USE): This.
    	(prologue_use): Renamed to ...
    	(pro_epilogue_use): This.
    	(simple_return_internal): Changed to define_insn_and_split.
    	(simple_return_internal_1): New pattern.
    	(simple_return_pop_internal): Replace
    	ix86_split_simple_return_pop_internal with
    	ix86_split_simple_return_internal.  Always call
    	ix86_split_simple_return_internal if epilogue_completed is
    	true.
    	(simple_return_pop_internal_1): New pattern.
    	(Epilogue deallocator to pop peepholes): Enabled only if
    	TARGET_POP_SCRATCH_REGISTER is true.
    	* config/i386/i386.opt (mzero-caller-saved-regs=): New option.
    	* doc/extend.texi: Document zero_caller_saved_regs attribute.
    	* doc/invoke.texi: Document -mzero-caller-saved-regs=.
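
The entries above for ix86_handle_fndecl_attribute and
ix86_set_zero_caller_saved_regs_type both come down to mapping one of five
strings onto an enum value. A table-driven sketch of that mapping follows. This
is an illustrative variant, not the patch's code (which uses a chain of strcmp
calls), and the names zero_mode and parse_zero_mode are hypothetical.

```c
#include <string.h>

/* Hypothetical mirror of the enum the patch adds to i386-opts.h.  */
enum zero_mode
{
  ZERO_UNSET = 0,
  ZERO_SKIP,
  ZERO_USED_GPR,
  ZERO_ALL_GPR,
  ZERO_USED,
  ZERO_ALL
};

/* Map an option or attribute argument to its mode.  Returns ZERO_UNSET
   for an unrecognized string, where the patch itself would warn.  */
enum zero_mode
parse_zero_mode (const char *s)
{
  static const struct
  {
    const char *name;
    enum zero_mode mode;
  } table[] = {
    { "skip", ZERO_SKIP },
    { "used-gpr", ZERO_USED_GPR },
    { "all-gpr", ZERO_ALL_GPR },
    { "used", ZERO_USED },
    { "all", ZERO_ALL },
  };

  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (s, table[i].name) == 0)
      return table[i].mode;
  return ZERO_UNSET;
}
```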

    gcc/testsuite/

    	* gcc.target/i386/zero-scratch-regs-1.c: New test.
    	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-13.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-14.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-15.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-16.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-17.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-18.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-19.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-20.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-21.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-22.c: Likewise.
    	* gcc.target/i386/zero-scratch-regs-23.c: Likewise.
    ---
     gcc/config/i386/i386-expand.c                 | 281 ++++++++++++++++--
     gcc/config/i386/i386-options.c                |  67 +++++
     gcc/config/i386/i386-opts.h                   |   9 +
     gcc/config/i386/i386-protos.h                 |   2 +-
     gcc/config/i386/i386.c                        |   8 +-
     gcc/config/i386/i386.h                        |  16 +
     gcc/config/i386/i386.md                       |  54 +++-
     gcc/config/i386/i386.opt                      |  23 ++
     gcc/doc/extend.texi                           |  12 +
     gcc/doc/invoke.texi                           |  14 +-
     .../gcc.target/i386/zero-scratch-regs-1.c     |  12 +
     .../gcc.target/i386/zero-scratch-regs-10.c    |  21 ++
     .../gcc.target/i386/zero-scratch-regs-11.c    |  39 +++
     .../gcc.target/i386/zero-scratch-regs-12.c    |  39 +++
     .../gcc.target/i386/zero-scratch-regs-13.c    |  21 ++
     .../gcc.target/i386/zero-scratch-regs-14.c    |  19 ++
     .../gcc.target/i386/zero-scratch-regs-15.c    |  14 +
     .../gcc.target/i386/zero-scratch-regs-16.c    |  14 +
     .../gcc.target/i386/zero-scratch-regs-17.c    |  13 +
     .../gcc.target/i386/zero-scratch-regs-18.c    |  13 +
     .../gcc.target/i386/zero-scratch-regs-19.c    |  12 +
     .../gcc.target/i386/zero-scratch-regs-2.c     |  19 ++
     .../gcc.target/i386/zero-scratch-regs-20.c    |  23 ++
     .../gcc.target/i386/zero-scratch-regs-21.c    |  14 +
     .../gcc.target/i386/zero-scratch-regs-22.c    |  19 ++
     .../gcc.target/i386/zero-scratch-regs-23.c    |  19 ++
     .../gcc.target/i386/zero-scratch-regs-3.c     |  12 +
     .../gcc.target/i386/zero-scratch-regs-4.c     |  14 +
     .../gcc.target/i386/zero-scratch-regs-5.c     |  20 ++
     .../gcc.target/i386/zero-scratch-regs-6.c     |  14 +
     .../gcc.target/i386/zero-scratch-regs-7.c     |  13 +
     .../gcc.target/i386/zero-scratch-regs-8.c     |  19 ++
     .../gcc.target/i386/zero-scratch-regs-9.c     |  15 +
     33 files changed, 867 insertions(+), 37 deletions(-)
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
     create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
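
The rule that ix86_find_live_outgoing_regs applies in the diff that follows can
be modelled outside the compiler. The sketch below is a standalone
approximation: struct reg_info and live_outgoing_mask are hypothetical
stand-ins for fixed_regs[], call_used_regs[], the DF live-out bitmap, and
df_regs_ever_live_p (), none of which are available outside GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-register facts, standing in for GCC's internal
   tables.  */
struct reg_info
{
  bool fixed;      /* fixed register, never allocated */
  bool call_used;  /* caller-saved */
  bool live_out;   /* live at function exit, e.g. holds the return value */
  bool ever_live;  /* used somewhere in the function */
};

/* Return a mask with bit I set when register I must be left alone.
   Clear bits mark the registers that would be zeroed before return.  */
unsigned int
live_outgoing_mask (const struct reg_info *regs, unsigned int nregs,
		    bool zero_all)
{
  unsigned int mask = 0;

  assert (nregs <= 32);
  for (unsigned int i = 0; i < nregs; i++)
    if (regs[i].fixed
	|| !regs[i].call_used
	|| regs[i].live_out
	|| (!zero_all && !regs[i].ever_live))
      mask |= 1u << i;
  return mask;
}
```

In this model the "used" modes leave untouched scratch registers alone, while
the "all" modes (zero_all set) also clear caller-saved registers the function
never used.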

    diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
    index 26531585c5f..371bbedd9a7 100644
    --- a/gcc/config/i386/i386-expand.c
    +++ b/gcc/config/i386/i386-expand.c
    @@ -8089,37 +8089,272 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
       return call_insn;
     }

    -/* Split simple return with popping POPC bytes from stack to indirect
    -   branch with stack adjustment .  */
    +/* Find general registers which are live at the exit of basic block BB
    +   and set their corresponding bits in LIVE_OUTGOING_REGS.  */
    +
    +static void
    +ix86_find_live_outgoing_regs (basic_block bb, bool gpr, bool zero_all,
    +			      unsigned int &live_outgoing_regs)
    +{
    +  bitmap live_out = df_get_live_out (bb);
    +
    +  unsigned int regno;
    +
    +  /* Check for live outgoing registers.  */
    +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    +    {
    +      unsigned int i = INVALID_REGNUM;
    +
    +      if (gpr)
    +	{
    +	  /* Zero general registers.  */
    +	  if (LEGACY_INT_REGNO_P (regno))
    +	    i = regno;
    +	  else if (TARGET_64BIT && REX_INT_REGNO_P (regno))
    +	    i = regno - FIRST_REX_INT_REG + 8;
    +	}
    +      else if (TARGET_SSE)
    +	{
    +	  /* Zero vector registers.  */
    +	  if (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG))
    +	    i = regno - FIRST_SSE_REG;
    +	  else if (TARGET_64BIT)
    +	    {
    +	      if (REX_SSE_REGNO_P (regno))
    +		i = regno - FIRST_REX_SSE_REG + 8;
    +	      else if (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno))
    +		i = regno - FIRST_EXT_REX_SSE_REG + 16;
    +	    }
    +	}
    +
    +      if (i == INVALID_REGNUM)
    +	continue;
    +
    +      /* No need to check it again if it is live.  */
    +      if ((live_outgoing_regs & (1 << i)))
    +	continue;
    +
    +      /* A register is considered LIVE if
    +	 1. It is a fixed register.
    +	 2. It isn't a caller-saved register.
    +	 3. It is a live outgoing register.
    +	 4. It is never used in the function and we don't zero all
    +	    caller-saved registers.
    +       */
    +      if (fixed_regs[regno]
    +	  || !call_used_regs[regno]
    +	  || REGNO_REG_SET_P (live_out, regno)
    +	  || (!zero_all && !df_regs_ever_live_p (regno)))
    +	live_outgoing_regs |= 1 << i;
    +    }
    +}
    +
    +/* Split simple return with popping POPC bytes from stack, if POPC
    +   isn't NULL_RTX, and zero caller-saved general registers if needed.
    +   When popping POPC bytes from stack for -mfunction-return=, convert
    +   return to indirect branch with stack adjustment.  */

     void
    -ix86_split_simple_return_pop_internal (rtx popc)
    +ix86_split_simple_return_internal (rtx popc)
     {
    -  struct machine_function *m = cfun->machine;
    -  rtx ecx = gen_rtx_REG (SImode, CX_REG);
    -  rtx_insn *insn;
    +  /* No need to zero caller-saved registers in main ().  Don't zero
    +     caller-saved registers if __builtin_eh_return is called since it
    +     isn't a normal function return.  */
    +  if ((cfun->machine->zero_caller_saved_regs_type
    +       != zero_caller_saved_regs_skip)
    +      && !crtl->calls_eh_return
    +      && cfun->machine->func_type == TYPE_NORMAL
    +      && !MAIN_NAME_P (DECL_NAME (current_function_decl)))
    +    {
    +      bool gpr_only = true;
    +      bool zero_all = false;
    +      switch (cfun->machine->zero_caller_saved_regs_type)
    +	{
    +	case zero_caller_saved_regs_all_gpr:
    +	  zero_all = true;
    +	  break;
    +	case zero_caller_saved_regs_used:
    +	  gpr_only = false;
    +	  break;
    +	case zero_caller_saved_regs_all:
    +	  gpr_only = false;
    +	  zero_all = true;
    +	  break;
    +	default:
    +	  break;
    +	}
    +
    +      unsigned int &live_outgoing_int_regs
    +	= cfun->machine->live_outgoing_int_regs;
    +      unsigned int &live_outgoing_vector_regs
    +	= cfun->machine->live_outgoing_vector_regs;
    +
    +      edge e;
    +      edge_iterator ei;
    +
    +      if (live_outgoing_int_regs == 0)
    +	{
    +	  /* ECX register is used for return with pop.  */
    +	  if (popc != NULL_RTX
    +	      && (cfun->machine->function_return_type
    +		  != indirect_branch_keep))
    +	    live_outgoing_int_regs = 1 << CX_REG;
    +
    +	  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    +	    {
    +	      ix86_find_live_outgoing_regs (e->src, true, zero_all,
    +					    live_outgoing_int_regs);
    +	    }
    +	}

    -  /* There is no "pascal" calling convention in any 64bit ABI.  */
    -  gcc_assert (!TARGET_64BIT);
    +      if (!gpr_only && live_outgoing_vector_regs == 0)
    +	FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    +	  {
    +	    ix86_find_live_outgoing_regs (e->src, false, zero_all,
    +					  live_outgoing_vector_regs);
    +	  }

    -  insn = emit_insn (gen_pop (ecx));
    -  m->fs.cfa_offset -= UNITS_PER_WORD;
    -  m->fs.sp_offset -= UNITS_PER_WORD;
    +      if (!gpr_only && TARGET_AVX && live_outgoing_vector_regs == 0)
    +	{
    +	  emit_insn (gen_avx_vzeroall ());
    +	  gpr_only = true;
    +	}

    -  rtx x = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);
    -  x = gen_rtx_SET (stack_pointer_rtx, x);
    -  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
    -  add_reg_note (insn, REG_CFA_REGISTER, gen_rtx_SET (ecx, pc_rtx));
    -  RTX_FRAME_RELATED_P (insn) = 1;
    +      rtx zero_gpr = NULL_RTX;
    +      rtx zero_vector = NULL_RTX;

    -  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);
    -  x = gen_rtx_SET (stack_pointer_rtx, x);
    -  insn = emit_insn (x);
    -  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
    -  RTX_FRAME_RELATED_P (insn) = 1;
    +      unsigned int regno;

    -  /* Now return address is in ECX.  */
    -  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
    +      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    +	{
    +	  unsigned int i = INVALID_REGNUM;
    +	  unsigned int live_outgoing_regs;
    +	  bool gpr = false;
    +
    +	  if (LEGACY_INT_REGNO_P (regno))
    +	    {
    +	      gpr = true;
    +	      i = regno;
    +	      live_outgoing_regs = live_outgoing_int_regs;
    +	    }
    +	  else if (TARGET_64BIT && REX_INT_REGNO_P (regno))
    +	    {
    +	      gpr = true;
    +	      live_outgoing_regs = live_outgoing_int_regs;
    +	      i = regno - FIRST_REX_INT_REG + 8;
    +	    }
    +	  else if (!gpr_only && TARGET_SSE)
    +	    {
    +	      if (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG))
    +		{
    +		  live_outgoing_regs = live_outgoing_vector_regs;
    +		  i = regno - FIRST_SSE_REG;
    +		}
    +	      if (TARGET_64BIT)
    +		{
    +		  if (REX_SSE_REGNO_P (regno))
    +		    {
    +		      live_outgoing_regs = live_outgoing_vector_regs;
    +		      i = regno - FIRST_REX_SSE_REG + 8;
    +		    }
    +		  else if (TARGET_AVX512F
    +			   && EXT_REX_SSE_REGNO_P (regno))
    +		    {
    +		      live_outgoing_regs = live_outgoing_vector_regs;
    +		      i = regno - FIRST_EXT_REX_SSE_REG + 16;
    +		    }
    +		}
    +	    }
    +
    +	  if (i == INVALID_REGNUM)
    +	    continue;
    +
    +	  if ((live_outgoing_regs & (1 << i)))
    +	    continue;
    +
    +	  rtx reg, tmp;
    +
    +	  if (gpr)
    +	    {
    +	      /* Zero out dead caller-saved register.  We only need to
    +		 zero the lower 32 bits.  */
    +	      reg = gen_rtx_REG (SImode, regno);
    +	      if (zero_gpr == NULL_RTX)
    +		{
    +		  zero_gpr = reg;
    +		  tmp = gen_rtx_SET (reg, const0_rtx);
    +		  if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
    +		    {
    +		      rtx clob = gen_rtx_CLOBBER (VOIDmode,
    +						  gen_rtx_REG (CCmode,
    +							       FLAGS_REG));
    +		      tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
    +								   tmp,
    +								   clob));
    +		    }
    +		  emit_insn (tmp);
    +		}
    +	      else
    +		emit_move_insn (reg, zero_gpr);
    +	    }
    +	  else
    +	    {
    +	      reg = gen_rtx_REG (V4SFmode, regno);
    +	      if (zero_vector == NULL_RTX)
    +		{
    +		  zero_vector = reg;
    +		  tmp = gen_rtx_SET (reg, const0_rtx);
    +		  emit_insn (tmp);
    +		}
    +	      else
    +		emit_move_insn (reg, zero_vector);
    +	    }
    +
    +	  /* Mark it in use  */
    +	  emit_insn (gen_pro_epilogue_use (reg));
    +	}
    +    }
    +
    +  if (popc)
    +    {
    +      if (cfun->machine->function_return_type != indirect_branch_keep)
    +	{
    +	  struct machine_function *m = cfun->machine;
    +	  rtx ecx = gen_rtx_REG (SImode, CX_REG);
    +	  rtx_insn *insn;
    +
    +	  /* There is no "pascal" calling convention in any 64bit ABI.  */
    +	  gcc_assert (!TARGET_64BIT);
    +
    +	  insn = emit_insn (gen_pop (ecx));
    +	  m->fs.cfa_offset -= UNITS_PER_WORD;
    +	  m->fs.sp_offset -= UNITS_PER_WORD;
    +
    +	  rtx x = plus_constant (Pmode, stack_pointer_rtx,
    +				 UNITS_PER_WORD);
    +	  x = gen_rtx_SET (stack_pointer_rtx, x);
    +	  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
    +	  add_reg_note (insn, REG_CFA_REGISTER,
    +			gen_rtx_SET (ecx, pc_rtx));
    +	  RTX_FRAME_RELATED_P (insn) = 1;
    +
    +	  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);
    +	  x = gen_rtx_SET (stack_pointer_rtx, x);
    +	  insn = emit_insn (x);
    +	  add_reg_note (insn, REG_CFA_ADJUST_CFA, copy_rtx (x));
    +	  RTX_FRAME_RELATED_P (insn) = 1;
    +
    +	  /* Mark ECX in use  */
    +	  emit_insn (gen_pro_epilogue_use (ecx));
    +
    +	  /* Now return address is in ECX.  */
    +	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
    +	}
    +      else
    +	emit_jump_insn (gen_simple_return_pop_internal_1 (popc));
    +    }
    +  else
    +    emit_jump_insn (gen_simple_return_internal_1 ());
     }

     /* Errors in the source file can cause expand_expr to return const0_rtx
    diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
    index 5c21fce06a4..c9bf79c7a43 100644
    --- a/gcc/config/i386/i386-options.c
    +++ b/gcc/config/i386/i386-options.c
    @@ -3040,6 +3040,46 @@ ix86_set_func_type (tree fndecl)
         }
     }

    +/* Set the zero_caller_saved_regs_type field from the function FNDECL.  */
    +
    +static void
    +ix86_set_zero_caller_saved_regs_type (tree fndecl)
    +{
    +  if (cfun->machine->zero_caller_saved_regs_type
    +      == zero_caller_saved_regs_unset)
    +    {
    +      tree attr = lookup_attribute ("zero_caller_saved_regs",
    +				    DECL_ATTRIBUTES (fndecl));
    +      if (attr != NULL)
    +	{
    +	  tree args = TREE_VALUE (attr);
    +	  if (args == NULL)
    +	    gcc_unreachable ();
    +	  tree cst = TREE_VALUE (args);
    +	  if (strcmp (TREE_STRING_POINTER (cst), "skip") == 0)
    +	    cfun->machine->zero_caller_saved_regs_type
    +	      = zero_caller_saved_regs_skip;
    +	  else if (strcmp (TREE_STRING_POINTER (cst), "used-gpr") == 0)
    +	    cfun->machine->zero_caller_saved_regs_type
    +	      = zero_caller_saved_regs_used_gpr;
    +	  else if (strcmp (TREE_STRING_POINTER (cst), "all-gpr") == 0)
    +	    cfun->machine->zero_caller_saved_regs_type
    +	      = zero_caller_saved_regs_all_gpr;
    +	  else if (strcmp (TREE_STRING_POINTER (cst), "used") == 0)
    +	    cfun->machine->zero_caller_saved_regs_type
    +	      = zero_caller_saved_regs_used;
    +	  else if (strcmp (TREE_STRING_POINTER (cst), "all") == 0)
    +	    cfun->machine->zero_caller_saved_regs_type
    +	      = zero_caller_saved_regs_all;
    +	  else
    +	    gcc_unreachable ();
    +	}
    +      else
    +	cfun->machine->zero_caller_saved_regs_type
    +	  = ix86_zero_caller_saved_regs;
    +    }
    +}
    +
     /* Set the indirect_branch_type field from the function FNDECL.  */

     static void
    @@ -3154,6 +3194,7 @@ ix86_set_current_function (tree fndecl)
     	{
     	  ix86_set_func_type (fndecl);
     	  ix86_set_indirect_branch_type (fndecl);
    +	  ix86_set_zero_caller_saved_regs_type (fndecl);
     	}
           return;
         }
    @@ -3175,6 +3216,7 @@ ix86_set_current_function (tree fndecl)

       ix86_set_func_type (fndecl);
       ix86_set_indirect_branch_type (fndecl);
    +  ix86_set_zero_caller_saved_regs_type (fndecl);

       tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
       if (new_tree == NULL_TREE)
    @@ -3635,6 +3677,29 @@ ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,
     	}
         }

    +  if (is_attribute_p ("zero_caller_saved_regs", name))
    +    {
    +      tree cst = TREE_VALUE (args);
    +      if (TREE_CODE (cst) != STRING_CST)
    +	{
    +	  warning (OPT_Wattributes,
    +		   "%qE attribute requires a string constant argument",
    +		   name);
    +	  *no_add_attrs = true;
    +	}
    +      else if (strcmp (TREE_STRING_POINTER (cst), "skip") != 0
    +	       && strcmp (TREE_STRING_POINTER (cst), "used-gpr") != 0
    +	       && strcmp (TREE_STRING_POINTER (cst), "all-gpr") != 0
    +	       && strcmp (TREE_STRING_POINTER (cst), "used") != 0
    +	       && strcmp (TREE_STRING_POINTER (cst), "all") != 0)
    +	{
    +	  warning (OPT_Wattributes,
    +		   "argument to %qE attribute is not (skip|used-gpr|all-gpr|used|all)",
    +		   name);
    +	  *no_add_attrs = true;
    +	}
    +    }
    +
       return NULL_TREE;
     }

    @@ -3787,6 +3852,8 @@ const struct attribute_spec ix86_attribute_table[] =
         ix86_handle_fentry_name, NULL },
       { "cf_check", 0, 0, true, false, false, false,
         ix86_handle_fndecl_attribute, NULL },
    +  { "zero_caller_saved_regs", 1, 1, true, false, false, false,
    +    ix86_handle_fndecl_attribute, NULL },

       /* End element.  */
       { NULL, 0, 0, false, false, false, false, NULL, NULL }
    diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
    index b40317b2427..c45677add98 100644
    --- a/gcc/config/i386/i386-opts.h
    +++ b/gcc/config/i386/i386-opts.h
    @@ -125,4 +125,13 @@ enum instrument_return {
       instrument_return_nop5
     };

    +enum zero_caller_saved_regs {
    +  zero_caller_saved_regs_unset = 0,
    +  zero_caller_saved_regs_skip,
    +  zero_caller_saved_regs_used_gpr,
    +  zero_caller_saved_regs_all_gpr,
    +  zero_caller_saved_regs_used,
    +  zero_caller_saved_regs_all
    +};
    +
     #endif
    diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
    index 39fcaa0ad5f..01732a225f4 100644
    --- a/gcc/config/i386/i386-protos.h
    +++ b/gcc/config/i386/i386-protos.h
    @@ -331,7 +331,7 @@ extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
     extern const char * ix86_output_indirect_jmp (rtx call_op);
     extern const char * ix86_output_function_return (bool long_p);
     extern const char * ix86_output_indirect_function_return (rtx ret_op);
    -extern void ix86_split_simple_return_pop_internal (rtx);
    +extern void ix86_split_simple_return_internal (rtx);
     extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
     						machine_mode mode);
     extern int ix86_min_insn_size (rtx_insn *);
    diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
    index b4ecc3ce832..d433c3d33f2 100644
    --- a/gcc/config/i386/i386.c
    +++ b/gcc/config/i386/i386.c
    @@ -8508,7 +8508,7 @@ ix86_expand_prologue (void)
           insn = emit_insn (gen_set_got (pic));
           RTX_FRAME_RELATED_P (insn) = 1;
           add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
    -      emit_insn (gen_prologue_use (pic));
    +      emit_insn (gen_pro_epilogue_use (pic));
           /* Deleting already emmitted SET_GOT if exist and allocated to
     	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
           ix86_elim_entry_set_got (pic);
    @@ -8537,7 +8537,7 @@ ix86_expand_prologue (void)
          Further, prevent alloca modifications to the stack pointer from being
          combined with prologue modifications.  */
       if (TARGET_SEH)
    -    emit_insn (gen_prologue_use (stack_pointer_rtx));
    +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
     }

     /* Emit code to restore REG using a POP insn.  */
    @@ -9260,7 +9260,7 @@ ix86_expand_epilogue (int style)
     	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
     	}
           else
    -	emit_jump_insn (gen_simple_return_pop_internal (popc));
    +	ix86_split_simple_return_internal (popc);
         }
       else if (!m->call_ms2sysv || !restore_stub_is_tail)
         {
    @@ -9287,7 +9287,7 @@ ix86_expand_epilogue (int style)
     	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
     	}
           else
    -	emit_jump_insn (gen_simple_return_internal ());
    +	ix86_split_simple_return_internal (NULL_RTX);
         }

       /* Restore the state back to the state from the prologue,
    diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
    index 08245f64322..68f37f42f59 100644
    --- a/gcc/config/i386/i386.h
    +++ b/gcc/config/i386/i386.h
    @@ -2823,6 +2823,10 @@ struct GTY(()) machine_function {
          the "interrupt" or "no_caller_saved_registers" attribute.  */
       BOOL_BITFIELD no_caller_saved_registers : 1;

    +  /* How to clear caller-saved general registers upon function
    +     return.  */
    +  ENUM_BITFIELD(zero_caller_saved_regs) zero_caller_saved_regs_type : 5;
    +
       /* If true, there is register available for argument passing.  This
          is used only in ix86_function_ok_for_sibcall by 32-bit to determine
          if there is scratch register available for indirect sibcall.  In
    @@ -2853,6 +2857,12 @@ struct GTY(()) machine_function {
       /* True if the function needs a stack frame.  */
       BOOL_BITFIELD stack_frame_required : 1;

    +  /* Integer registers live at exit.  */
    +  unsigned int live_outgoing_int_regs;
    +
    +  /* Vector registers live at exit.  */
    +  unsigned int live_outgoing_vector_regs;
    +
       /* The largest alignment, in bytes, of stack slot actually used.  */
       unsigned int max_used_stack_alignment;

    @@ -2955,6 +2965,12 @@ extern void debug_dispatch_window (int);
       (ix86_indirect_branch_register \
        || cfun->machine->indirect_branch_type != indirect_branch_keep)

    +#define TARGET_POP_SCRATCH_REGISTER \
    +  (TARGET_64BIT \
    +   || (cfun->machine->zero_caller_saved_regs_type \
    +       == zero_caller_saved_regs_skip) \
    +   || cfun->machine->function_return_type == indirect_branch_keep)
    +
     #define IX86_HLE_ACQUIRE (1 << 16)
     #define IX86_HLE_RELEASE (1 << 17)

    diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
    index 76c00867231..c894fa79fd6 100644
    --- a/gcc/config/i386/i386.md
    +++ b/gcc/config/i386/i386.md
    @@ -184,6 +184,8 @@ (define_c_enum "unspec" [
       UNSPEC_PDEP
       UNSPEC_PEXT

    +  UNSPEC_SIMPLE_RETURN
    +
       ;; IRET support
       UNSPEC_INTERRUPT_RETURN
     ])
    @@ -194,7 +196,7 @@ (define_c_enum "unspecv" [
       UNSPECV_STACK_PROBE
       UNSPECV_PROBE_STACK_RANGE
       UNSPECV_ALIGN
    -  UNSPECV_PROLOGUE_USE
    +  UNSPECV_PRO_EPILOGUE_USE
       UNSPECV_SPLIT_STACK_RETURN
       UNSPECV_CLD
       UNSPECV_NOPS
    @@ -13363,8 +13365,8 @@ (define_insn "*memory_blockage"

     ;; As USE insns aren't meaningful after reload, this is used instead
     ;; to prevent deleting instructions setting registers for PIC code
    -(define_insn "prologue_use"
    -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
    +(define_insn "pro_epilogue_use"
    +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
       ""
       ""
       [(set_attr "length" "0")])
    @@ -13405,10 +13407,23 @@ (define_expand "simple_return"
         }
     })

    -(define_insn "simple_return_internal"
    +(define_insn_and_split "simple_return_internal"
       [(simple_return)]
       "reload_completed"
       "* return ix86_output_function_return (false);"
    +  "&& epilogue_completed"
    +  [(const_int 0)]
    +  "ix86_split_simple_return_internal (NULL_RTX); DONE;"
    +  [(set_attr "length" "1")
    +   (set_attr "atom_unit" "jeu")
    +   (set_attr "length_immediate" "0")
    +   (set_attr "modrm" "0")])
    +
    +(define_insn "simple_return_internal_1"
    +  [(simple_return)
    +   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]
    +  "reload_completed"
    +  "* return ix86_output_function_return (false);"
       [(set_attr "length" "1")
        (set_attr "atom_unit" "jeu")
        (set_attr "length_immediate" "0")
    @@ -13441,9 +13456,21 @@ (define_insn_and_split "simple_return_pop_internal"
        (use (match_operand:SI 0 "const_int_operand"))]
       "reload_completed"
       "%!ret\t%0"
    -  "&& cfun->machine->function_return_type != indirect_branch_keep"
    +  "&& (epilogue_completed
    +       || cfun->machine->function_return_type != indirect_branch_keep)"
       [(const_int 0)]
    -  "ix86_split_simple_return_pop_internal (operands[0]); DONE;"
    +  "ix86_split_simple_return_internal (operands[0]); DONE;"
    +  [(set_attr "length" "3")
    +   (set_attr "atom_unit" "jeu")
    +   (set_attr "length_immediate" "2")
    +   (set_attr "modrm" "0")])
    +
    +(define_insn "simple_return_pop_internal_1"
    +  [(simple_return)
    +   (use (match_operand:SI 0 "const_int_operand"))
    +   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]
    +  "reload_completed"
    +  "%!ret\t%0"
       [(set_attr "length" "3")
        (set_attr "atom_unit" "jeu")
        (set_attr "length_immediate" "2")
    @@ -19864,6 +19891,11 @@ (define_peephole2
        (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])

     ;; Convert epilogue deallocator to pop.
    +;; Don't do it when both
    +;; -mfunction-return= and -mzero-caller-saved-regs=
    +;; are used in 32-bit mode, since return with stack pop needs to increment
    +;; the stack register and scratch registers must be zeroed.  Popping into
    +;; a scratch register would load a value from the stack into it.
     (define_peephole2
       [(match_scratch:W 1 "r")
        (parallel [(set (reg:P SP_REG)
    @@ -19872,6 +19904,7 @@ (define_peephole2
     	      (clobber (reg:CC FLAGS_REG))
     	      (clobber (mem:BLK (scratch)))])]
       "(TARGET_SINGLE_POP || optimize_insn_for_size_p ())
    +   && TARGET_POP_SCRATCH_REGISTER
        && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
       [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
     	      (clobber (mem:BLK (scratch)))])])
    @@ -19887,6 +19920,7 @@ (define_peephole2
     	      (clobber (reg:CC FLAGS_REG))
     	      (clobber (mem:BLK (scratch)))])]
       "(TARGET_DOUBLE_POP || optimize_insn_for_size_p ())
    +   && TARGET_POP_SCRATCH_REGISTER
        && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
       [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
     	      (clobber (mem:BLK (scratch)))])
    @@ -19900,6 +19934,7 @@ (define_peephole2
     	      (clobber (reg:CC FLAGS_REG))
     	      (clobber (mem:BLK (scratch)))])]
       "optimize_insn_for_size_p ()
    +   && TARGET_POP_SCRATCH_REGISTER
        && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
       [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
     	      (clobber (mem:BLK (scratch)))])
    @@ -19912,7 +19947,8 @@ (define_peephole2
     		   (plus:P (reg:P SP_REG)
     			   (match_operand:P 0 "const_int_operand")))
     	      (clobber (reg:CC FLAGS_REG))])]
    -  "INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
    +  "TARGET_POP_SCRATCH_REGISTER
    +   && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
       [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])

     ;; Two pops case is tricky, since pop causes dependency
    @@ -19924,7 +19960,8 @@ (define_peephole2
     		   (plus:P (reg:P SP_REG)
     			   (match_operand:P 0 "const_int_operand")))
     	      (clobber (reg:CC FLAGS_REG))])]
    -  "INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
    +  "TARGET_POP_SCRATCH_REGISTER
    +   && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
       [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
        (set (match_dup 2) (mem:W (post_inc:P (reg:P SP_REG))))])

    @@ -19935,6 +19972,7 @@ (define_peephole2
     			   (match_operand:P 0 "const_int_operand")))
     	      (clobber (reg:CC FLAGS_REG))])]
       "optimize_insn_for_size_p ()
    +   && TARGET_POP_SCRATCH_REGISTER
        && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
       [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
        (set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])
    diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
    index 185a1d0686b..10ddacbc23b 100644
    --- a/gcc/config/i386/i386.opt
    +++ b/gcc/config/i386/i386.opt
    @@ -1107,3 +1107,26 @@ AVX512BF16 built-in functions and code generation.
     menqcmd
     Target Report Mask(ISA2_ENQCMD) Var(ix86_isa_flags2) Save
     Support ENQCMD built-in functions and code generation.
    +
    +mzero-caller-saved-regs=
    +Target Report RejectNegative Joined Enum(zero_caller_saved_regs) Var(ix86_zero_caller_saved_regs) Init(zero_caller_saved_regs_skip)
    +Clear caller-saved registers upon function return.
    +
    +Enum
    +Name(zero_caller_saved_regs) Type(enum zero_caller_saved_regs)
    +Known choices of clearing caller-saved registers upon function return (for use with the -mzero-caller-saved-regs= option):
    +
    +EnumValue
    +Enum(zero_caller_saved_regs) String(skip) Value(zero_caller_saved_regs_skip)
    +
    +EnumValue
    +Enum(zero_caller_saved_regs) String(used-gpr) Value(zero_caller_saved_regs_used_gpr)
    +
    +EnumValue
    +Enum(zero_caller_saved_regs) String(all-gpr) Value(zero_caller_saved_regs_all_gpr)
    +
    +EnumValue
    +Enum(zero_caller_saved_regs) String(used) Value(zero_caller_saved_regs_used)
    +
    +EnumValue
    +Enum(zero_caller_saved_regs) String(all) Value(zero_caller_saved_regs_all)
    diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
    index 936c22e2fe7..8037dcb305f 100644
    --- a/gcc/doc/extend.texi
    +++ b/gcc/doc/extend.texi
    @@ -6740,6 +6740,18 @@ On x86 targets, the @code{fentry_section} attribute sets the name
     of the section to record function entry instrumentation calls in when
     enabled with @option{-pg -mrecord-mcount}

    +@item zero_caller_saved_regs("@var{choice}")
    +@cindex @code{zero_caller_saved_regs} function attribute, x86
    +On x86 targets, the @code{zero_caller_saved_regs} attribute causes the
    +compiler to zero caller-saved registers at function return
    +according to @var{choice}.  @samp{skip} doesn't zero caller-saved
    +registers.  @samp{used-gpr} zeros caller-saved integer registers which
    +are used in the function.  @samp{all-gpr} zeros all caller-saved integer
    +registers.  @samp{used} zeros caller-saved integer and vector
    +registers which are used in the function.  @samp{all} zeros all caller-saved
    +integer and vector registers.  The default for the attribute is
    +controlled by @option{-mzero-caller-saved-regs}.
    +
     @end table

     On the x86, the inliner does not inline a
    diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
    index 767d1f07801..68d7bc8316a 100644
    --- a/gcc/doc/invoke.texi
    +++ b/gcc/doc/invoke.texi
    @@ -1365,7 +1365,7 @@ See RS/6000 and PowerPC Options.
     -mstack-protector-guard-symbol=@var{symbol} @gol
     -mgeneral-regs-only  -mcall-ms2sysv-xlogues @gol
     -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
    --mindirect-branch-register}
    +-mindirect-branch-register -mzero-caller-saved-regs=@var{choice}}

     @emph{x86 Windows Options}
     @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
    @@ -30128,6 +30128,18 @@ not be reachable in the large code model.
     @opindex mindirect-branch-register
     Force indirect call and jump via register.

    +@item -mzero-caller-saved-regs=@var{choice}
    +@opindex mzero-caller-saved-regs
    +Zero caller-saved registers at function return according to
    +@var{choice}.  @samp{skip}, which is the default, doesn't zero
    +caller-saved registers.  @samp{used-gpr} zeros caller-saved integer
    +registers which are used in the function.  @samp{all-gpr} zeros all
    +caller-saved integer registers.  @samp{used} zeros
    +caller-saved integer and vector registers which are used in the function.
    +@samp{all} zeros all caller-saved integer and vector registers.  You
    +can control this behavior for a specific function by using the function
    +attribute @code{zero_caller_saved_regs}.  @xref{Function Attributes}.
    +
     @end table

     These @samp{-m} switches are supported in addition to the above
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
    new file mode 100644
    index 00000000000..4c9e6d68dab
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
    @@ -0,0 +1,12 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
    new file mode 100644
    index 00000000000..ea614ecba53
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
    @@ -0,0 +1,21 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
    +
    +extern int foo (int) __attribute__ ((zero_caller_saved_regs("all-gpr")));
    +
    +int
    +foo (int x)
    +{
    +  return x;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
    new file mode 100644
    index 00000000000..f19ed7c9a68
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
    @@ -0,0 +1,39 @@
    +/* { dg-do run { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=used-gpr" } */
    +
    +struct S { int i; };
    +__attribute__((const, noinline, noclone))
    +struct S foo (int x)
    +{
    +  struct S s;
    +  s.i = x;
    +  return s;
    +}
    +
    +int a[2048], b[2048], c[2048], d[2048];
    +struct S e[2048];
    +
    +__attribute__((noinline, noclone)) void
    +bar (void)
    +{
    +  int i;
    +  for (i = 0; i < 1024; i++)
    +    {
    +      e[i] = foo (i);
    +      a[i+2] = a[i] + a[i+1];
    +      b[10] = b[10] + i;
    +      c[i] = c[2047 - i];
    +      d[i] = d[i + 1];
    +    }
    +}
    +
    +int
    +main ()
    +{
    +  int i;
    +  bar ();
    +  for (i = 0; i < 1024; i++)
    +    if (e[i].i != i)
    +      __builtin_abort ();
    +  return 0;
    +}
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
    new file mode 100644
    index 00000000000..f0283d9e750
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
    @@ -0,0 +1,39 @@
    +/* { dg-do run { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
    +
    +struct S { int i; };
    +__attribute__((const, noinline, noclone))
    +struct S foo (int x)
    +{
    +  struct S s;
    +  s.i = x;
    +  return s;
    +}
    +
    +int a[2048], b[2048], c[2048], d[2048];
    +struct S e[2048];
    +
    +__attribute__((noinline, noclone)) void
    +bar (void)
    +{
    +  int i;
    +  for (i = 0; i < 1024; i++)
    +    {
    +      e[i] = foo (i);
    +      a[i+2] = a[i] + a[i+1];
    +      b[10] = b[10] + i;
    +      c[i] = c[2047 - i];
    +      d[i] = d[i + 1];
    +    }
    +}
    +
    +int
    +main ()
    +{
    +  int i;
    +  bar ();
    +  for (i = 0; i < 1024; i++)
    +    if (e[i].i != i)
    +      __builtin_abort ();
    +  return 0;
    +}
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
    new file mode 100644
    index 00000000000..044da02e244
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
    @@ -0,0 +1,21 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
    +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
    +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
    new file mode 100644
    index 00000000000..31487d51f53
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
    @@ -0,0 +1,19 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7 -mavx" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
    new file mode 100644
    index 00000000000..dc561b0c71d
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
    @@ -0,0 +1,14 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
    +
    +extern void foo (void) __attribute__ ((zero_caller_saved_regs("used")));
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
    new file mode 100644
    index 00000000000..24824b0355e
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
    @@ -0,0 +1,14 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
    +
    +extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
    new file mode 100644
    index 00000000000..9ba4f547401
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
    @@ -0,0 +1,13 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
    +
    +int
    +foo (int x)
    +{
    +  return x;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
    new file mode 100644
    index 00000000000..529adc26ad1
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
    @@ -0,0 +1,13 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=used -march=corei7" } */
    +
    +float
    +foo (float z, float y, float x)
    +{
    +  return x + y;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
    new file mode 100644
    index 00000000000..ac6201e27c9
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
    @@ -0,0 +1,12 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=used -march=corei7" } */
    +
    +float
    +foo (float z, float y, float x)
    +{
    +  return x;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
    new file mode 100644
    index 00000000000..6b9e25abf13
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
    @@ -0,0 +1,19 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
    new file mode 100644
    index 00000000000..e8e9c781ed1
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
    @@ -0,0 +1,23 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7" } */
    +
    +float
    +foo (float z, float y, float x)
    +{
    +  return x + y;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
    +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
    +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
    new file mode 100644
    index 00000000000..3052eb05503
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
    @@ -0,0 +1,14 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip -march=corei7" } */
    +
    +__attribute__ ((zero_caller_saved_regs("used")))
    +float
    +foo (float z, float y, float x)
    +{
    +  return x + y;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
    new file mode 100644
    index 00000000000..71369f56159
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
    @@ -0,0 +1,19 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7 -mavx" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
    new file mode 100644
    index 00000000000..9a31af9516a
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
    @@ -0,0 +1,19 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all -march=corei7 -mavx512f" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
    new file mode 100644
    index 00000000000..a6f8eb7233a
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
    @@ -0,0 +1,12 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
    new file mode 100644
    index 00000000000..bada4c73719
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
    @@ -0,0 +1,14 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
    +
    +extern void foo (void) __attribute__ ((zero_caller_saved_regs("used-gpr")));
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
    new file mode 100644
    index 00000000000..b93719a11df
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
    @@ -0,0 +1,20 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
    +
    +__attribute__ ((zero_caller_saved_regs("all-gpr")))
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
    new file mode 100644
    index 00000000000..bef1d36eca5
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
    @@ -0,0 +1,14 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
    +
    +extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));
    +
    +void
    +foo (void)
    +{
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
    +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
    new file mode 100644
    index 00000000000..73a766c1be9
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
    @@ -0,0 +1,13 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=used-gpr" } */
    +
    +int
    +foo (int x)
    +{
    +  return x;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
    new file mode 100644
    index 00000000000..cd982ce27db
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
    @@ -0,0 +1,19 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=all-gpr" } */
    +
    +int
    +foo (int x)
    +{
    +  return x;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
    +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
    diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
    new file mode 100644
    index 00000000000..23dbed50ab9
    --- /dev/null
    +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
    @@ -0,0 +1,15 @@
    +/* { dg-do compile { target *-*-linux* } } */
    +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
    +
    +extern int foo (int) __attribute__ ((zero_caller_saved_regs("used-gpr")));
    +
    +int
    +foo (int x)
    +{
    +  return x;
    +}
    +
    +/* { dg-final { scan-assembler-not "vzeroall" } } */
    +/* { dg-final { scan-assembler-not "%xmm" } } */
    +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
    +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
    -- 
    2.26.2



^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]
  2020-05-04 19:01 ` [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] H.J. Lu
  2020-05-04 23:19   ` Rodriguez Bahena, Victor
@ 2020-05-05  8:14   ` Uros Bizjak
  2020-05-05  8:20     ` Richard Biener
  1 sibling, 1 reply; 188+ messages in thread
From: Uros Bizjak @ 2020-05-05  8:14 UTC (permalink / raw)
  To: H.J. Lu
  Cc: gcc-patches, Jeff Law, Richard Biener, Jakub Jelinek, Qing Zhao,
	keescook, victor.rodriguez.bahena

On Mon, May 4, 2020 at 9:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] command-line
> option and zero_caller_saved_regs("skip|used|all") function attribute:
>
> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")
>
> Don't zero caller-saved registers upon function return.
>
> 2. -mzero-caller-saved-regs=used-gpr and zero_caller_saved_regs("used-gpr")
>
> Zero used caller-saved integer registers upon function return.
>
> 3. -mzero-caller-saved-regs=all-gpr and zero_caller_saved_regs("all-gpr")
>
> Zero all caller-saved integer registers upon function return.
>
> 4. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")
>
> Zero used caller-saved integer and vector registers upon function return.
>
> 5. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")
>
> Zero all caller-saved integer and vector registers upon function return.
>
> Tested on i686 and x86-64 with bootstrapping GCC trunk, making
> -mzero-caller-saved-regs=used-gpr, -mzero-caller-saved-regs=all-gpr
> -mzero-caller-saved-regs=used, and -mzero-caller-saved-regs=all enabled
> by default.

This patch should be completely rewritten to use regsets infrastructure.

Please note accessible_reg_set global variable that nowadays holds all
ABI-dependent active registers.
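
To illustrate the set-based formulation, here is a minimal, self-contained sketch.  It is not GCC code: `regset` is a hypothetical stand-in for HARD_REG_SET on an imaginary 16-register machine, and `regs_to_zero` and `example_check` are illustrative names.  The conditions mirror the ones the patch tests register by register (call-used, not fixed, not live at exit, optionally ever used).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-register machine, one bit per hard register.
   This only models the idea of a register set; it is not HARD_REG_SET.  */
typedef uint16_t regset;

/* Registers to zero on return: call-used, not fixed, not live at exit
   (for example the return-value register) and, when USED_ONLY is set,
   restricted to registers the function actually touched.  */
regset
regs_to_zero (regset call_used, regset fixed, regset live_at_exit,
	      regset ever_live, int used_only)
{
  regset r = call_used & (regset) ~fixed & (regset) ~live_at_exit;
  if (used_only)
    r &= ever_live;
  return r;
}

/* Example: r0-r7 are call-used, r7 is fixed, r0 holds the return
   value, and the function touched r0-r2.  */
void
example_check (void)
{
  assert (regs_to_zero (0x00ff, 0x0080, 0x0001, 0x0007, 0) == 0x007e);
  assert (regs_to_zero (0x00ff, 0x0080, 0x0001, 0x0007, 1) == 0x0006);
}
```

With sets, the "all" and "used" variants differ by a single intersection, which is the kind of simplification a regsets-based rewrite would presumably allow.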

To ease the review, the patch should be split to at least options
part, attribute part, infrastructure part and insn pattern change
part, not to mention separate testsuite part.

Please also get global reviewer on board. This *IS* a new
functionality, even if for some reason lives in i386 directory, so it
requires global functionality review, community consensus, etc...
*BEFORE* target maintainer reviews the patch from the implementation
point of view. Asking for a target review at this time is just putting
the cart before the horse and will get you nowhere.

Uros.

> gcc/
>
>         * i386-expand.c (ix86_find_live_outgoing_regs): New function.
>         (ix86_split_simple_return_pop_internal): Removed.
>         (ix86_split_simple_return_internal): New function.
>         * config/i386/i386-options.c (ix86_set_zero_caller_saved_regs_type):
>         New function.
>         (ix86_set_current_function): Call ix86_set_zero_caller_saved_regs_type.
>         (ix86_handle_fndecl_attribute): Support zero_caller_saved_regs
>         attribute.
>         (ix86_attribute_table): Add zero_caller_saved_regs.
>         * config/i386/i386-opts.h (zero_caller_saved_regs): New enum.
>         * config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):
>         Renamed to ...
>         (ix86_split_simple_return_internal): This.
>         * config/i386/i386.c (ix86_expand_prologue): Replace
>         gen_prologue_use with gen_pro_epilogue_use.
>         (ix86_expand_epilogue): Replace gen_simple_return_pop_internal
>         with ix86_split_simple_return_internal.  Replace
>         gen_simple_return_internal with ix86_split_simple_return_internal.
>         * config/i386/i386.h (machine_function): Add
>         zero_caller_saved_regs_type, live_outgoing_int_regs and
>         live_outgoing_vector_regs.
>         (TARGET_POP_SCRATCH_REGISTER): New.
>         * config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.
>         (UNSPECV_PROLOGUE_USE): Renamed to ...
>         (UNSPECV_PRO_EPILOGUE_USE): This.
>         (prologue_use): Renamed to ...
>         (pro_epilogue_use): This.
>         (simple_return_internal): Changed to define_insn_and_split.
>         (simple_return_internal_1): New pattern.
>         (simple_return_pop_internal): Replace
>         ix86_split_simple_return_pop_internal with
>         ix86_split_simple_return_internal.  Always call
>         ix86_split_simple_return_internal if epilogue_completed is
>         true.
>         (simple_return_pop_internal_1): New pattern.
>         (Epilogue deallocator to pop peepholes): Enabled only if
>         TARGET_POP_SCRATCH_REGISTER is true.
>         * config/i386/i386.opt (mzero-caller-saved-regs=): New option.
>         * doc/extend.texi: Document zero_caller_saved_regs attribute.
>         * doc/invoke.texi: Document -mzero-caller-saved-regs=.
>
> gcc/testsuite/
>
>         * gcc.target/i386/zero-scratch-regs-1.c: New test.
>         * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-23.c: Likewise.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]
  2020-05-05  8:14   ` Uros Bizjak
@ 2020-05-05  8:20     ` Richard Biener
  2020-07-14 14:45       ` [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Biener @ 2020-05-05  8:20 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: H.J. Lu, gcc-patches, Jeff Law, Jakub Jelinek, Qing Zhao,
	keescook, victor.rodriguez.bahena

On Tue, 5 May 2020, Uros Bizjak wrote:

> On Mon, May 4, 2020 at 9:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] command-line
> > option and zero_caller_saved_regs("skip|used|all") function attribute:
> >
> > 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")
> >
> > Don't zero caller-saved registers upon function return.
> >
> > 2. -mzero-caller-saved-regs=used-gpr and zero_caller_saved_regs("used-gpr")
> >
> > Zero used caller-saved integer registers upon function return.
> >
> > 3. -mzero-caller-saved-regs=all-gpr and zero_caller_saved_regs("all-gpr")
> >
> > Zero all caller-saved integer registers upon function return.
> >
> > 4. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")
> >
> > Zero used caller-saved integer and vector registers upon function return.
> >
> > 5. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")
> >
> > Zero all caller-saved integer and vector registers upon function return.
> >
> > Tested on i686 and x86-64 with bootstrapping GCC trunk, making
> > -mzero-caller-saved-regs=used-gpr, -mzero-caller-saved-regs=all-gpr
> > -mzero-caller-saved-regs=used, and -mzero-caller-saved-regs=all enabled
> > by default.
> 
> This patch should be completely rewritten to use regsets infrastructure.
> 
> Please note accessible_reg_set global variable that nowadays holds all
> ABI-dependent active registers.
> 
> To ease the review, the patch should be split to at least options
> part, attribute part, infrastructure part and insn pattern change
> part, not to mention separate testsuite part.
> 
> Please also get global reviewer on board. This *IS* a new
> functionality, even if for some reason lives in i386 directory, so it
> requires global functionality review, community consensus, etc...
> *BEFORE* target maintainer reviews the patch from the implementation
> point of view. Asking for a target review at this time is just putting
> the cart before the horse and will get you nowhere.

Agreed.  I also wonder whether the functionality can live in the 
middle-end where possibly global info can be taken into
account, the RA (LRA for the actual transform I guess).

Richard.

> Uros.
> 
> > gcc/
> >
> >         * i386-expand.c (ix86_find_live_outgoing_regs): New function.
> >         (ix86_split_simple_return_pop_internal): Removed.
> >         (ix86_split_simple_return_internal): New function.
> >         * config/i386/i386-options.c (ix86_set_zero_caller_saved_regs_type):
> >         New function.
> >         (ix86_set_current_function): Call ix86_set_zero_caller_saved_regs_type.
> >         (ix86_handle_fndecl_attribute): Support zero_caller_saved_regs
> >         attribute.
> >         (ix86_attribute_table): Add zero_caller_saved_regs.
> >         * config/i386/i386-opts.h (zero_caller_saved_regs): New enum.
> >         * config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):
> >         Renamed to ...
> >         (ix86_split_simple_return_internal): This.
> >         * config/i386/i386.c (ix86_expand_prologue): Replace
> >         gen_prologue_use with gen_pro_epilogue_use.
> >         (ix86_expand_epilogue): Replace gen_simple_return_pop_internal
> >         with ix86_split_simple_return_internal.  Replace
> >         gen_simple_return_internal with ix86_split_simple_return_internal.
> >         * config/i386/i386.h (machine_function): Add
> >         zero_caller_saved_regs_type, live_outgoing_int_regs and
> >         live_outgoing_vector_regs.
> >         (TARGET_POP_SCRATCH_REGISTER): New.
> >         * config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.
> >         (UNSPECV_PROLOGUE_USE): Renamed to ...
> >         (UNSPECV_PRO_EPILOGUE_USE): This.
> >         (prologue_use): Renamed to ...
> >         (pro_epilogue_use): This.
> >         (simple_return_internal): Changed to define_insn_and_split.
> >         (simple_return_internal_1): New pattern.
> >         (simple_return_pop_internal): Replace
> >         ix86_split_simple_return_pop_internal with
> >         ix86_split_simple_return_internal.  Always call
> >         ix86_split_simple_return_internal if epilogue_completed is
> >         true.
> >         (simple_return_pop_internal_1): New pattern.
> >         (Epilogue deallocator to pop peepholes): Enabled only if
> >         TARGET_POP_SCRATCH_REGISTER is true.
> >         * config/i386/i386.opt (mzero-caller-saved-regs=): New option.
> >         * doc/extend.texi: Document zero_caller_saved_regs attribute.
> >         * doc/invoke.texi: Document -mzero-caller-saved-regs=.
> >
> > gcc/testsuite/
> >
> >         * gcc.target/i386/zero-scratch-regs-1.c: New test.
> >         * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> >         * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 1/4] matcher-1.m: Change return type to int
  2020-05-04 19:01 [PATCH 1/4] matcher-1.m: Change return type to int H.J. Lu
                   ` (2 preceding siblings ...)
  2020-05-04 19:01 ` [PATCH 4/4] Update gcc.target/i386/ret-thunk-2[234].c H.J. Lu
@ 2020-05-05 16:29 ` Jeff Law
  3 siblings, 0 replies; 188+ messages in thread
From: Jeff Law @ 2020-05-05 16:29 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches
  Cc: Uros Bizjak, Richard Biener, Jakub Jelinek, Qing Zhao, keescook,
	victor.rodriguez.bahena

On Mon, 2020-05-04 at 12:01 -0700, H.J. Lu wrote:
> my_exception_matcher must return int.  Otherwise, this test fails.
> 
> 	PR testsuite/84324
> 	* objc/execute/exceptions/matcher-1.m (my_exception_matcher):
> 	Change return type to int.
OK
jeff
> 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-05-05  8:20     ` Richard Biener
@ 2020-07-14 14:45       ` Qing Zhao
  2020-07-16 13:17         ` Victor Rodriguez
  2020-07-28 20:05         ` PING " Qing Zhao
  0 siblings, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-07-14 14:45 UTC (permalink / raw)
  To: Richard Biener, Uros Bizjak, H.J. Lu
  Cc: gcc-patches, Jeff Law, Jakub Jelinek, Kees Cook,
	Rodriguez Bahena, Victor

Hi, GCC team,

This patch is a follow-up to the previous patch and the corresponding discussion:
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html

The major issues raised in the previous round of discussion were:

A. The patch should be rewritten to use the regsets infrastructure.
B. The functionality should live in the middle-end instead of the x86 backend.

This new patch is rewritten to address both comments.  The major changes compared to the previous patch are:

1. Change the names of the option and attribute from
-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
to
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all").
Add the new option and the new attribute as general (target-independent) ones.
2. Move the main code generation part from the i386 backend to the middle-end.
3. Add 4 target hooks.
4. Implement these 4 target hooks in the i386 backend.
5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
The patch is as follows:

[PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
command-line option and
zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:

  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

  Don't zero call-used registers upon function return.

  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

  Zero used call-used general purpose registers upon function return.

  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

  Zero all call-used general purpose registers upon function return.

  4. -fzero-call-used-regs=used and zero_call_used_regs("used")

  Zero used call-used registers upon function return.

  5. -fzero-call-used-regs=all and zero_call_used_regs("all")

  Zero all call-used registers upon function return.
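
As a usage illustration, here is a minimal sketch of the attribute as proposed in this patch.  The `ZERO_REGS` macro and the `check_pin` function are hypothetical names chosen for this example; the macro expands to nothing on compilers that do not recognize the attribute.  The attribute is placed on a declaration that precedes the definition, since the handler in this patch rejects setting it after the definition.

```c
/* Hypothetical helper macro: use the attribute only where the compiler
   reports support for it, otherwise expand to nothing.  */
#if defined(__has_attribute)
# if __has_attribute (zero_call_used_regs)
#  define ZERO_REGS(mode) __attribute__ ((zero_call_used_regs (mode)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(mode)
#endif

/* Request zeroing of the used call-used general purpose registers for
   this one function, regardless of the command-line default.  */
extern int check_pin (int) ZERO_REGS ("used-gpr");

int
check_pin (int pin)
{
  /* The intermediate value lives in a scratch register; zeroing on
     return is meant to keep it from leaking to the caller.  */
  int scratch = pin * 31 + 7;
  return scratch == 1247;
}
```

Compiling the whole translation unit with -fzero-call-used-regs=used-gpr would request the same behavior for every function, with the attribute available to override it per function (for example with "skip").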

The feature is implemented in the middle-end, but is currently only supported on x86.

Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
-fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
-fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
by default on x86-64.

Please take a look and let me know if you have any more comments.

thanks.

Qing


====================================

gcc/ChangeLog:

2020-07-13  Qing Zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu  <hjl.tools@gmail.com>

	* common.opt: Add new option -fzero-call-used-regs.
	* config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
	(ix86_zero_call_used_regno_mode): Likewise.
	(ix86_zero_all_vector_registers): Likewise.
	(ix86_expand_prologue): Replace gen_prologue_use with
	gen_pro_epilogue_use.
	(TARGET_ZERO_CALL_USED_REGNO_P): Define.
	(TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
	(TARGET_PRO_EPILOGUE_USE): Define.
	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
	* config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
	with UNSPECV_PRO_EPILOGUE_USE.
	* coretypes.h (enum zero_call_used_regs): New type.
	* doc/extend.texi: Document the new zero_call_used_regs attribute.
	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
	(TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
	(TARGET_PRO_EPILOGUE_USE): Likewise.
	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
	* function.c (is_live_reg_at_exit): New function.
	(gen_call_used_regs_seq): Likewise.
	(make_epilogue_seq): Call gen_call_used_regs_seq.
	* function.h (is_live_reg_at_exit): Declare.
	* target.def (zero_call_used_regno_p): New hook.
	(zero_call_used_regno_mode): Likewise.
	(pro_epilogue_use): Likewise.
	(zero_all_vector_registers): Likewise.
	* targhooks.c (default_zero_call_used_regno_p): New function.
	(default_zero_call_used_regno_mode): Likewise.
	* targhooks.h (default_zero_call_used_regno_p): Declare.
	(default_zero_call_used_regno_mode): Declare.
	* toplev.c (process_options): Issue errors when -fzero-call-used-regs
	is used on targets that do not support it.
	* tree-core.h (struct tree_decl_with_vis): New field 
	zero_call_used_regs_type.
	* tree.h (DECL_ZERO_CALL_USED_REGS): New macro.

gcc/c-family/ChangeLog:

2020-07-13  Qing Zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu  <hjl.tools@gmail.com>

	* c-attribs.c (c_common_attribute_table): Add new attribute
	zero_call_used_regs.
	(handle_zero_call_used_regs_attribute): New function.

gcc/c/ChangeLog:

2020-07-13  Qing Zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu  <hjl.tools@gmail.com>

	* c-decl.c (merge_decls): Merge zero_call_used_regs_type.

gcc/testsuite/ChangeLog:

2020-07-13  Qing Zhao  <qing.zhao@oracle.com>
2020-07-13  H.J. Lu  <hjl.tools@gmail.com>

	* c-c++-common/zero-scratch-regs-1.c: New test.
	* c-c++-common/zero-scratch-regs-2.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-1.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-13.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-14.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-15.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-16.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-17.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-18.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-19.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-20.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-21.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-22.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-23.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.

---
gcc/c-family/c-attribs.c                           |  68 ++++++++++
gcc/c/c-decl.c                                     |   4 +
gcc/common.opt                                     |  23 ++++
gcc/config/i386/i386.c                             |  58 ++++++++-
gcc/config/i386/i386.md                            |   6 +-
gcc/coretypes.h                                    |  10 ++
gcc/doc/extend.texi                                |  11 ++
gcc/doc/invoke.texi                                |  13 +-
gcc/doc/tm.texi                                    |  27 ++++
gcc/doc/tm.texi.in                                 |   8 ++
gcc/function.c                                     | 145 +++++++++++++++++++++
gcc/function.h                                     |   2 +
gcc/target.def                                     |  33 +++++
gcc/targhooks.c                                    |  17 +++
gcc/targhooks.h                                    |   3 +
gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
.../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
.../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
.../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
.../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
.../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
.../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
.../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
.../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
.../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
.../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
.../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
.../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
.../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
.../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
.../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
.../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
.../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
.../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
.../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
.../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
gcc/toplev.c                                       |   9 ++
gcc/tree-core.h                                    |   6 +-
gcc/tree.h                                         |   5 +
43 files changed, 866 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 3721483..cc93d6f 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
static tree ignore_attribute (tree *, tree, tree, int, bool *);
static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
+static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
+						 bool *);
static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
@@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
			      ignore_attribute, NULL },
  { "no_split_stack",	      0, 0, true,  false, false, false,
			      handle_no_split_stack_attribute, NULL },
+  { "zero_call_used_regs",    1, 1, true, false, false, false,
+			      handle_zero_call_used_regs_attribute, NULL },
+
  /* For internal use (marking of builtins and runtime functions) only.
     The name contains space to prevent its usage in source code.  */
  { "fn spec",		      1, 1, false, true, true, false,
@@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
  return NULL_TREE;
}

+/* Handle a "zero_call_used_regs" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
+				      int ARG_UNUSED (flags),
+				      bool *no_add_attris)
+{
+  tree decl = *node;
+  tree id = TREE_VALUE (args);
+  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"%qE attribute applies only to functions", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+  else if (DECL_INITIAL (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"cannot set %qE attribute after definition", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_CODE (id) != STRING_CST)
+    {
+      error ("attribute %qE arguments not a string", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  if (!targetm.calls.pro_epilogue_use)
+    {
+      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
+      return NULL_TREE;
+    }
+
+  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_skip;
+  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
+  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
+  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_used;
+  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
+    zero_call_used_regs_type = zero_call_used_regs_all;
+  else
+    {
+      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
+ 	     name, "skip", "used-gpr", "all-gpr", "used", "all");
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
+
+  return NULL_TREE;
+}
+
/* Handle a "returns_nonnull" attribute; arguments as in
   struct attribute_spec.handler.  */

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 81bd2ee..ded1880 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
	  DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
	}

+      /* Merge the zero_call_used_regs_type information.  */
+      if (TREE_CODE (newdecl) == FUNCTION_DECL)
+	DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
+
      /* Merge the storage class information.  */
      merge_weak (newdecl, olddecl);

diff --git a/gcc/common.opt b/gcc/common.opt
index df8af36..19900f9 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
Common Report Var(flag_zero_initialized_in_bss) Init(1)
Put zero initialized data in the bss section.

+fzero-call-used-regs=
+Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
+Clear call-used registers upon function return.
+
+Enum
+Name(zero_call_used_regs) Type(enum zero_call_used_regs)
+Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
+
+EnumValue
+Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
+
+EnumValue
+Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
+
+EnumValue
+Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
+
+EnumValue
+Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
+
+EnumValue
+Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
+
g
Common Driver RejectNegative JoinedOrMissing
Generate debug information in default format.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5c373c0..fd1aa9c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
  return false;
}

+/* TARGET_ZERO_CALL_USED_REGNO_P.  */
+
+static bool
+ix86_zero_call_used_regno_p (const unsigned int regno,
+			     bool gpr_only)
+{
+  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
+}
+
+/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
+
+static machine_mode
+ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
+{
+  /* NB: We only need to zero the lower 32 bits for integer registers
+     and the lower 128 bits for vector registers since destinations are
+     zero-extended to the full register width.  */
+  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
+}
+
+/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
+
+static rtx
+ix86_zero_all_vector_registers (bool used_only)
+{
+  if (!TARGET_AVX)
+    return NULL;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
+	 || (TARGET_64BIT
+	     && (REX_SSE_REGNO_P (regno)
+		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
+	&& (!this_target_hard_regs->x_call_used_regs[regno]
+	    || fixed_regs[regno]
+	    || is_live_reg_at_exit (regno)
+	    || (used_only && !df_regs_ever_live_p (regno))))
+      return NULL;
+
+  return gen_avx_vzeroall ();
+}
+
/* Define how to find the value returned by a function.
   VALTYPE is the data type of the value (as a tree).
   If the precise function being called is known, FUNC is its FUNCTION_DECL;
@@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
      insn = emit_insn (gen_set_got (pic));
      RTX_FRAME_RELATED_P (insn) = 1;
      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
-      emit_insn (gen_prologue_use (pic));
+      emit_insn (gen_pro_epilogue_use (pic));
      /* Deleting already emmitted SET_GOT if exist and allocated to
	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
      ix86_elim_entry_set_got (pic);
@@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
     Further, prevent alloca modifications to the stack pointer from being
     combined with prologue modifications.  */
  if (TARGET_SEH)
-    emit_insn (gen_prologue_use (stack_pointer_rtx));
+    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
}

/* Emit code to restore REG using a POP insn.  */
@@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
#undef TARGET_FUNCTION_VALUE_REGNO_P
#define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p

+#undef TARGET_ZERO_CALL_USED_REGNO_P
+#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
+
+#undef TARGET_ZERO_CALL_USED_REGNO_MODE
+#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
+
+#undef TARGET_PRO_EPILOGUE_USE
+#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
+
+#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
+#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
+
#undef TARGET_PROMOTE_FUNCTION_MODE
#define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d0ecd9e..e7df59f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -194,7 +194,7 @@
  UNSPECV_STACK_PROBE
  UNSPECV_PROBE_STACK_RANGE
  UNSPECV_ALIGN
-  UNSPECV_PROLOGUE_USE
+  UNSPECV_PRO_EPILOGUE_USE
  UNSPECV_SPLIT_STACK_RETURN
  UNSPECV_CLD
  UNSPECV_NOPS
@@ -13525,8 +13525,8 @@

;; As USE insns aren't meaningful after reload, this is used instead
;; to prevent deleting instructions setting registers for PIC code
-(define_insn "prologue_use"
-  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
+(define_insn "pro_epilogue_use"
+  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
  ""
  ""
  [(set_attr "length" "0")])
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 6b6cfcd..e56d6ec 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -418,6 +418,16 @@ enum symbol_visibility
  VISIBILITY_INTERNAL
};

+/* Zero call-used registers type.  */
+enum zero_call_used_regs {
+  zero_call_used_regs_unset = 0,
+  zero_call_used_regs_skip,
+  zero_call_used_regs_used_gpr,
+  zero_call_used_regs_all_gpr,
+  zero_call_used_regs_used,
+  zero_call_used_regs_all
+};
+
/* enums used by the targetm.excess_precision hook.  */

enum flt_eval_method
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c800b74..b32c55f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
A declaration to which @code{weakref} is attached and that is associated
with a named @code{target} must be @code{static}.

+@item zero_call_used_regs ("@var{choice}")
+@cindex @code{zero_call_used_regs} function attribute
+The @code{zero_call_used_regs} attribute causes the compiler to zero
+call-used registers at function return according to @var{choice}.
+@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
+call-used general purpose registers which are used in function.
+@samp{all-gpr} zeros all call-used general purpose registers.
+@samp{used} zeros call-used registers which are used in function.
+@samp{all} zeros all call-used registers.  The default for the
+attribute is controlled by @option{-fzero-call-used-regs}.
+
@end table

@c This is the end of the target-independent attribute table
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 09bcc5b..da02686 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
-funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
-funsafe-math-optimizations  -funswitch-loops @gol
-fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
--fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
+-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
--param @var{name}=@var{value}
-O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}

@@ -12273,6 +12273,17 @@ int foo (void)

Not all targets support this option.

+@item -fzero-call-used-regs=@var{choice}
+@opindex fzero-call-used-regs
+Zero call-used registers at function return according to
+@var{choice}.  @samp{skip}, which is the default, doesn't zero
+call-used registers.  @samp{used-gpr} zeros call-used general purpose
+registers which are used in function.  @samp{all-gpr} zeros all
+call-used general purpose registers.  @samp{used} zeros call-used
+registers which are used in function.  @samp{all} zeros all call-used
+registers.  You can control this behavior for a specific function by using
+the function attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
+
@item --param @var{name}=@var{value}
@opindex param
In some places, GCC uses various constants to control the amount of
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 6e7d9dc..43dddd3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
@end deftypefn

+@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
+A target hook that returns @code{true} if @var{regno} is the number of a
+call used register.  If @var{general_reg_only_p} is @code{true},
+@var{regno} must be the number of a hard general register.
+
+If this hook is not defined, then default_zero_call_used_regno_p will be used.
+@end deftypefn
+
+@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
+A target hook that returns a mode suitable for zeroing the call used
+register @var{regno} in @var{mode}.
+
+If this hook is not defined, then default_zero_call_used_regno_mode will be
+used.
+@end deftypefn
+
@defmac APPLY_RESULT_SIZE
Define this macro if @samp{untyped_call} and @samp{untyped_return}
need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
@@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
is needed.
@end deftypefn

+@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
+This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
+prevent deleting register setting instructions in prologue and epilogue.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
+This hook should return an rtx to zero all vector registers at function
+exit.  If @var{used_only} is @code{true}, only used vector registers should
+be zeroed.  Return @code{NULL} if not possible.
+@end deftypefn
+
@deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
When optimization is disabled, this hook indicates whether or not
arguments should be allocated to stack slots.  Normally, GCC allocates
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3be984b..bee917a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3430,6 +3430,10 @@ for a new target instead.

@hook TARGET_FUNCTION_VALUE_REGNO_P

+@hook TARGET_ZERO_CALL_USED_REGNO_P
+
+@hook TARGET_ZERO_CALL_USED_REGNO_MODE
+
@defmac APPLY_RESULT_SIZE
Define this macro if @samp{untyped_call} and @samp{untyped_return}
need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
@@ -8109,6 +8113,10 @@ and the associated definitions of those functions.

@hook TARGET_GET_DRAP_RTX

+@hook TARGET_PRO_EPILOGUE_USE
+
+@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
+
@hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS

@hook TARGET_CONST_ANCHOR
diff --git a/gcc/function.c b/gcc/function.c
index 9eee9b5..9908530 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
#include "emit-rtl.h"
#include "recog.h"
#include "rtl-error.h"
+#include "hard-reg-set.h"
#include "alias.h"
#include "fold-const.h"
#include "stor-layout.h"
@@ -5808,6 +5809,147 @@ make_prologue_seq (void)
  return seq;
}

+/* Check whether the hard register REGNO is live at the exit block
+ * of the current routine.  */
+bool
+is_live_reg_at_exit (unsigned int regno)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+    {
+      bitmap live_out = df_get_live_out (e->src);
+      if (REGNO_REG_SET_P (live_out, regno))
+	return true;
+    }
+
+  return false;
+}
+
+/* Emit a sequence of insns to zero the call-used-registers for the current
+ * function.  */
+
+static void
+gen_call_used_regs_seq (void)
+{
+  if (!targetm.calls.pro_epilogue_use)
+    return;
+
+  bool gpr_only = true;
+  bool used_only = true;
+  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
+
+  if (flag_zero_call_used_regs)
+    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
+	== zero_call_used_regs_unset)
+      zero_call_used_regs_type = flag_zero_call_used_regs;
+    else
+      zero_call_used_regs_type
+	= DECL_ZERO_CALL_USED_REGS (current_function_decl);
+  else
+    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
+
+  /* No need to zero call-used-regs when no user request is present.  */
+  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
+    return;
+
+  /* No need to zero call-used-regs in main ().  */
+  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    return;
+
+  /* No need to zero call-used-regs if __builtin_eh_return is called
+     since it isn't a normal function return.  */
+  if (crtl->calls_eh_return)
+    return;
+
+  /* If gpr_only is true, only zero call-used-registers that are
+     general-purpose registers; if used_only is true, only zero
+     call-used-registers that are used in the current function.  */
+  switch (zero_call_used_regs_type)
+    {
+      case zero_call_used_regs_all_gpr:
+	used_only = false;
+	break;
+      case zero_call_used_regs_used:
+	gpr_only = false;
+	break;
+      case zero_call_used_regs_all:
+	gpr_only = false;
+	used_only = false;
+	break;
+      default:
+	break;
+    }
+
+  /* An optimization to use a single hard insn to zero all vector registers on
+     the target that provides such insn.  */
+  if (!gpr_only
+      && targetm.calls.zero_all_vector_registers)
+    {
+      rtx zero_all_vec_insn
+	= targetm.calls.zero_all_vector_registers (used_only);
+      if (zero_all_vec_insn)
+	{
+	  emit_insn (zero_all_vec_insn);
+	  gpr_only = true;
+	}
+    }
+
+  /* For each hard register, zero it only if all of the following hold:
+     1. it is a call-used register;
+     2. it is not a fixed register;
+     3. it is not live at the end of the routine;
+     4. it is a general purpose register, if gpr_only is true;
+     5. it is used in the routine, if used_only is true.
+   */
+
+  /* This array holds the zero rtx with the corresponding machine mode.  */
+  rtx zero_rtx[(int)MAX_MACHINE_MODE];
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    zero_rtx[i] = NULL_RTX;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      if (!this_target_hard_regs->x_call_used_regs[regno])
+	continue;
+      if (fixed_regs[regno])
+	continue;
+      if (is_live_reg_at_exit (regno))
+	continue;
+      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
+	continue;
+      if (used_only && !df_regs_ever_live_p (regno))
+	continue;
+
+      /* Now we can emit insn to zero this register.  */
+      rtx reg, tmp;
+
+      machine_mode mode
+	= targetm.calls.zero_call_used_regno_mode (regno,
+						   reg_raw_mode[regno]);
+      if (mode == VOIDmode)
+	continue;
+      if (!have_regs_of_mode[mode])
+	continue;
+
+      reg = gen_rtx_REG (mode, regno);
+      if (zero_rtx[(int)mode] == NULL_RTX)
+	{
+	  zero_rtx[(int)mode] = reg;
+	  tmp = gen_rtx_SET (reg, CONST0_RTX (mode));
+	  emit_insn (tmp);
+	}
+      else
+	emit_move_insn (reg, zero_rtx[(int)mode]);
+
+      emit_insn (targetm.calls.pro_epilogue_use (reg));
+    }
+
+  return;
+}
+
+
/* Return a sequence to be used as the epilogue for the current function,
   or NULL.  */

@@ -5819,6 +5961,9 @@ make_epilogue_seq (void)

  start_sequence ();
  emit_note (NOTE_INSN_EPILOGUE_BEG);
+
+  gen_call_used_regs_seq ();
+
  rtx_insn *seq = targetm.gen_epilogue ();
  if (seq)
    emit_jump_insn (seq);
diff --git a/gcc/function.h b/gcc/function.h
index d55cbdd..fc36c3e 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -705,4 +705,6 @@ extern const char *current_function_name (void);

extern void used_types_insert (tree);

+extern bool is_live_reg_at_exit (unsigned int);
+
#endif  /* GCC_FUNCTION_H */
diff --git a/gcc/target.def b/gcc/target.def
index 07059a8..8aab63e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
 default_function_value_regno_p)

DEFHOOK
+(zero_call_used_regno_p,
+ "A target hook that returns @code{true} if @var{regno} is the number of a\n\
+call used register.  If @var{general_reg_only_p} is @code{true},\n\
+@var{regno} must be the number of a hard general register.\n\
+\n\
+If this hook is not defined, then default_zero_call_used_regno_p will be used.",
+ bool, (const unsigned int regno, bool general_reg_only_p),
+ default_zero_call_used_regno_p)
+
+DEFHOOK
+(zero_call_used_regno_mode,
+ "A target hook that returns a mode suitable for zeroing the call used\n\
+register @var{regno} in @var{mode}.\n\
+\n\
+If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
+used.",
+ machine_mode, (const unsigned int regno, machine_mode mode),
+ default_zero_call_used_regno_mode)
+
+DEFHOOK
(fntype_abi,
 "Return the ABI used by a function with type @var{type}; see the\n\
definition of @code{predefined_function_abi} for details of the ABI\n\
@@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
is needed.",
 rtx, (void), NULL)

+DEFHOOK
+(pro_epilogue_use,
+ "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
+prevent deleting register setting instructions in prologue and epilogue.",
+ rtx, (rtx reg), NULL)
+
+DEFHOOK
+(zero_all_vector_registers,
+ "This hook should return an rtx to zero all vector registers at function\n\
+exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
+be zeroed.  Return @code{NULL} if not possible.",
+ rtx, (bool used_only), NULL)
+
/* Return true if all function parameters should be spilled to the
   stack.  */
DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 0113c7b..ed02173 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
#endif
}

+/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
+
+bool
+default_zero_call_used_regno_p (const unsigned int,
+				bool)
+{
+  return false;
+}
+
+/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
+
+machine_mode
+default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
+{
+  return mode;
+}
+
rtx
default_internal_arg_pointer (void)
{
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index b572a36..370df19 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
extern rtx default_function_value (const_tree, const_tree, bool);
extern rtx default_libcall_value (machine_mode, const_rtx);
extern bool default_function_value_regno_p (const unsigned int);
+extern bool default_zero_call_used_regno_p (const unsigned int, bool);
+extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
+						       machine_mode);
extern rtx default_internal_arg_pointer (void);
extern rtx default_static_chain (const_tree, bool);
extern void default_trampoline_init (rtx, tree, rtx);
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
new file mode 100644
index 0000000..3c2ac72
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
new file mode 100644
index 0000000..acf48c4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
new file mode 100644
index 0000000..9f61dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
new file mode 100644
index 0000000..09048e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
new file mode 100644
index 0000000..4862688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
new file mode 100644
index 0000000..500251b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
new file mode 100644
index 0000000..8b058e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
new file mode 100644
index 0000000..d4eaaf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
new file mode 100644
index 0000000..dd3bb90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
new file mode 100644
index 0000000..e2274f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
new file mode 100644
index 0000000..7f5d153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
new file mode 100644
index 0000000..fe13d2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
new file mode 100644
index 0000000..205a532
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
new file mode 100644
index 0000000..e046684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
new file mode 100644
index 0000000..4be8ff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
new file mode 100644
index 0000000..0eb34e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
+
+__attribute__ ((zero_call_used_regs("used")))
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
new file mode 100644
index 0000000..cbb63a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
new file mode 100644
index 0000000..7573197
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
new file mode 100644
index 0000000..de71223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
new file mode 100644
index 0000000..ccfa441
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
new file mode 100644
index 0000000..6b46ca3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+__attribute__ ((zero_call_used_regs("all-gpr")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
new file mode 100644
index 0000000..0680f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
new file mode 100644
index 0000000..534defa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
new file mode 100644
index 0000000..477bb19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
new file mode 100644
index 0000000..a305a60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 95eea63..01a1f24 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1464,6 +1464,15 @@ process_options (void)
	}
    }

+  if (flag_zero_call_used_regs != zero_call_used_regs_skip
+      && !targetm.calls.pro_epilogue_use)
+    {
+      error_at (UNKNOWN_LOCATION,
+		"%<-fzero-call-used-regs=%> is not supported for this "
+		"target");
+      flag_zero_call_used_regs = zero_call_used_regs_skip;
+    }
+
  /* One region RA really helps to decrease the code size.  */
  if (flag_ira_region == IRA_REGION_AUTODETECT)
    flag_ira_region
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8c5a2e3..71badbd 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
 unsigned final : 1;
 /* Belong to FUNCTION_DECL exclusively.  */
 unsigned regdecl_flag : 1;
- /* 14 unused bits. */
+
+ /* How to clear call-used registers upon function return.  */
+ ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
+
+ /* 11 unused bits.  */
};

struct GTY(()) tree_var_decl {
diff --git a/gcc/tree.h b/gcc/tree.h
index cf546ed..d378a88 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
#define DECL_VISIBILITY(NODE) \
  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)

+/* For a FUNCTION_DECL, how the call-used registers are zeroed
+   upon return from the function.  */
+#define DECL_ZERO_CALL_USED_REGS(NODE) \
+  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
+
/* Nonzero means that the decl (or an enclosing scope) had its
   visibility specified rather than being inferred.  */
#define DECL_VISIBILITY_SPECIFIED(NODE) \
-- 
1.9.1

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-07-14 14:45       ` [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] Qing Zhao
@ 2020-07-16 13:17         ` Victor Rodriguez
  2020-07-28 20:05         ` PING " Qing Zhao
  1 sibling, 0 replies; 188+ messages in thread
From: Victor Rodriguez @ 2020-07-16 13:17 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, Uros Bizjak, H.J. Lu, Jakub Jelinek, gcc-patches,
	Kees Cook, Rodriguez Bahena, Victor

On Tue, Jul 14, 2020 at 9:52 AM Qing Zhao via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi, Gcc team,
>
> This patch is a follow-up on the previous patch and corresponding discussion:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
>
> From the previous round of discussion, the major issues raised were:
>
> A. The patch should be rewritten to use the regsets infrastructure.
> B. The patch should go into the middle end instead of the x86 backend.
>
> This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
>
> 1. Change the names of the option and attribute from
> -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> to
> -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all").
> Add the new option and attribute as general (target-independent) ones.
> 2. Move the main code generation part from the i386 backend to the middle end.
> 3. Add 4 target hooks.
> 4. Implement these 4 target hooks in the i386 backend.
> 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
>
> The patch is as follows:
>
> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> command-line option and
> zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
>
>   1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>
>   Don't zero call-used registers upon function return.
>
>   2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
>
>   Zero used call-used general purpose registers upon function return.
>
>   3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
>
>   Zero all call-used general purpose registers upon function return.
>
>   4. -fzero-call-used-regs=used and zero_call_used_regs("used")
>
>   Zero used call-used registers upon function return.
>
>   5. -fzero-call-used-regs=all and zero_call_used_regs("all")
>
>   Zero all call-used registers upon function return.
>
> The feature is implemented in the middle end, but is currently only supported on x86.
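
As a rough illustration of the five choices listed above, here is a minimal sketch of how the attribute would be used from C.  It assumes a compiler that ships the attribute under the name proposed in this patch; ZERO_REGS, add_key and wipe_all are hypothetical names made up for the example, and the __has_attribute guard is only there so the file still builds where the attribute is unknown.

```c
/* Hypothetical helper macro: expands to the proposed attribute when the
   compiler recognizes it, and to nothing otherwise.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(choice) __attribute__ ((zero_call_used_regs (choice)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(choice)
#endif

/* "used-gpr": on return, zero only the call-used general purpose
   registers that this function actually used.  The register holding
   the return value is live at exit, so it is not cleared.  */
ZERO_REGS ("used-gpr")
int
add_key (int key, int salt)
{
  return key + salt;
}

/* "all": on return, zero every call-used register, used or not.  */
ZERO_REGS ("all")
void
wipe_all (void)
{
}
```

Building such a file with -fzero-call-used-regs=skip would leave unannotated functions untouched, so only add_key and wipe_all would pay for the extra zeroing instructions.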
>
> Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
> -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
> -fzero-call-used-regs=used, and -fzero-call-used-regs=all each enabled
> by default on x86-64.
>
> Please take a look and let me know if you have any further comments.
>
> thanks.
>
> Qing
>
>
> ====================================
>
> gcc/ChangeLog:
>
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
>
>         * common.opt: Add new option -fzero-call-used-regs.
>         * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
>         (ix86_zero_call_used_regno_mode): Likewise.
>         (ix86_zero_all_vector_registers): Likewise.
>         (ix86_expand_prologue): Replace gen_prologue_use with
>         gen_pro_epilogue_use.
>         (TARGET_ZERO_CALL_USED_REGNO_P): Define.
>         (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
>         (TARGET_PRO_EPILOGUE_USE): Define.
>         (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
>         * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
>         with UNSPECV_PRO_EPILOGUE_USE.
>         * coretypes.h (enum zero_call_used_regs): New type.
>         * doc/extend.texi: Document the new zero_call_used_regs attribute.
>         * doc/invoke.texi: Document the new -fzero-call-used-regs option.
>         * doc/tm.texi: Regenerate.
>         * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
>         (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
>         (TARGET_PRO_EPILOGUE_USE): Likewise.
>         (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
>         * function.c (is_live_reg_at_exit): New function.
>         (gen_call_used_regs_seq): Likewise.
>         (make_epilogue_seq): Call gen_call_used_regs_seq.
>         * function.h (is_live_reg_at_exit): Declare.
>         * target.def (zero_call_used_regno_p): New hook.
>         (zero_call_used_regno_mode): Likewise.
>         (pro_epilogue_use): Likewise.
>         (zero_all_vector_registers): Likewise.
>         * targhooks.c (default_zero_call_used_regno_p): New function.
>         (default_zero_call_used_regno_mode): Likewise.
>         * targhooks.h (default_zero_call_used_regno_p): Declare.
>         (default_zero_call_used_regno_mode): Declare.
>         * toplev.c (process_options): Issue errors when -fzero-call-used-regs
>         is used on targets that do not support it.
>         * tree-core.h (struct tree_decl_with_vis): New field
>         zero_call_used_regs_type.
>         * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
>
> gcc/c-family/ChangeLog:
>
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
>
>         * c-attribs.c (c_common_attribute_table): Add new attribute
>         zero_call_used_regs.
>         (handle_zero_call_used_regs_attribute): New function.
>
> gcc/c/ChangeLog:
>
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
>
>         * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
>
> gcc/testsuite/ChangeLog:
>
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
>
>         * c-c++-common/zero-scratch-regs-1.c: New test.
>         * c-c++-common/zero-scratch-regs-2.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
>         * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
>
> ---
> gcc/c-family/c-attribs.c                           |  68 ++++++++++
> gcc/c/c-decl.c                                     |   4 +
> gcc/common.opt                                     |  23 ++++
> gcc/config/i386/i386.c                             |  58 ++++++++-
> gcc/config/i386/i386.md                            |   6 +-
> gcc/coretypes.h                                    |  10 ++
> gcc/doc/extend.texi                                |  11 ++
> gcc/doc/invoke.texi                                |  13 +-
> gcc/doc/tm.texi                                    |  27 ++++
> gcc/doc/tm.texi.in                                 |   8 ++
> gcc/function.c                                     | 145 +++++++++++++++++++++
> gcc/function.h                                     |   2 +
> gcc/target.def                                     |  33 +++++
> gcc/targhooks.c                                    |  17 +++
> gcc/targhooks.h                                    |   3 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> gcc/toplev.c                                       |   9 ++
> gcc/tree-core.h                                    |   6 +-
> gcc/tree.h                                         |   5 +
> 43 files changed, 866 insertions(+), 7 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
>
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 3721483..cc93d6f 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> static tree ignore_attribute (tree *, tree, tree, int, bool *);
> static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +                                                bool *);
> static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
>                               ignore_attribute, NULL },
>   { "no_split_stack",         0, 0, true,  false, false, false,
>                               handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +                             handle_zero_call_used_regs_attribute, NULL },
> +
>   /* For internal use (marking of builtins and runtime functions) only.
>      The name contains space to prevent its usage in source code.  */
>   { "fn spec",                1, 1, false, true, true, false,
> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>   return NULL_TREE;
> }
>
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +                                     int ARG_UNUSED (flags),
> +                                     bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +               "%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +  else if (DECL_INITIAL (decl))
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +               "cannot set %qE attribute after definition", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (!targetm.calls.pro_epilogue_use)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> +      return NULL_TREE;
> +    }
> +
> +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_skip;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all;
> +  else
> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> +            name, "skip", "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>    struct attribute_spec.handler.  */
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 81bd2ee..ded1880 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
>           DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
>         }
>
> +      /* Merge the zero_call_used_regs_type information.  */
> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> +       DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> +
>       /* Merge the storage class information.  */
>       merge_weak (newdecl, olddecl);
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index df8af36..19900f9 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> Common Report Var(flag_zero_initialized_in_bss) Init(1)
> Put zero initialized data in the bss section.
>
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> +Clear call-used registers upon function return.
> +
> +Enum
> +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> +
> g
> Common Driver RejectNegative JoinedOrMissing
> Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 5c373c0..fd1aa9c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
>   return false;
> }
>
> +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +static bool
> +ix86_zero_call_used_regno_p (const unsigned int regno,
> +                            bool gpr_only)
> +{
> +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +static machine_mode
> +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destinations are
> +     zero-extended to the full register width.  */
> +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> +}
> +
> +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> +
> +static rtx
> +ix86_zero_all_vector_registers (bool used_only)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +        || (TARGET_64BIT
> +            && (REX_SSE_REGNO_P (regno)
> +                || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> +       && (!this_target_hard_regs->x_call_used_regs[regno]
> +           || fixed_regs[regno]
> +           || is_live_reg_at_exit (regno)
> +           || (used_only && !df_regs_ever_live_p (regno))))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> /* Define how to find the value returned by a function.
>    VALTYPE is the data type of the value (as a tree).
>    If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
>       insn = emit_insn (gen_set_got (pic));
>       RTX_FRAME_RELATED_P (insn) = 1;
>       add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> -      emit_insn (gen_prologue_use (pic));
> +      emit_insn (gen_pro_epilogue_use (pic));
>       /* Deleting already emmitted SET_GOT if exist and allocated to
>          REAL_PIC_OFFSET_TABLE_REGNUM.  */
>       ix86_elim_entry_set_got (pic);
> @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
>      Further, prevent alloca modifications to the stack pointer from being
>      combined with prologue modifications.  */
>   if (TARGET_SEH)
> -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> }
>
> /* Emit code to restore REG using a POP insn.  */
> @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
>
> +#undef TARGET_ZERO_CALL_USED_REGNO_P
> +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> +
> +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> +
> +#undef TARGET_PRO_EPILOGUE_USE
> +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> +
> +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> +
> #undef TARGET_PROMOTE_FUNCTION_MODE
> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d0ecd9e..e7df59f 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -194,7 +194,7 @@
>   UNSPECV_STACK_PROBE
>   UNSPECV_PROBE_STACK_RANGE
>   UNSPECV_ALIGN
> -  UNSPECV_PROLOGUE_USE
> +  UNSPECV_PRO_EPILOGUE_USE
>   UNSPECV_SPLIT_STACK_RETURN
>   UNSPECV_CLD
>   UNSPECV_NOPS
> @@ -13525,8 +13525,8 @@
>
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> -(define_insn "prologue_use"
> -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> +(define_insn "pro_epilogue_use"
> +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>   ""
>   ""
>   [(set_attr "length" "0")])
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..e56d6ec 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,16 @@ enum symbol_visibility
>   VISIBILITY_INTERNAL
> };
>
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};
> +
> /* enums used by the targetm.excess_precision hook.  */
>
> enum flt_eval_method
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c800b74..b32c55f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
>
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> +call-used general purpose registers which are used in function.
> +@samp{all-gpr} zeros all call-used general purpose registers.
> +@samp{used} zeros call-used registers which are used in function.
> +@samp{all} zeros all call-used registers.  The default for the
> +attribute is controlled by @option{-fzero-call-used-regs}.
> +
> @end table
>
> @c This is the end of the target-independent attribute table
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 09bcc5b..da02686 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> --param @var{name}=@var{value}
> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
>
> @@ -12273,6 +12273,17 @@ int foo (void)
>
> Not all targets support this option.
>
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return according to
> +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> +registers which are used in function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used registers which
> +are used in function.  @samp{all} zeros all call-used registers.  You
> +can control this behavior for a specific function by using the function
> +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
> @item --param @var{name}=@var{value}
> @opindex param
> In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 6e7d9dc..43dddd3 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> +A target hook that returns @code{true} if @var{regno} is the number of a
> +call used register.  If @var{general_reg_only_p} is @code{true},
> +@var{regno} must be the number of a hard general register.
> +
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> +A target hook that returns a mode suitable for zeroing the call used
> +register @var{regno} in @var{mode}.
> +
> +If this hook is not defined, then default_zero_call_used_regno_mode will be
> +used.
> +@end deftypefn
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> is needed.
> @end deftypefn
>
> +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> +prevent deleting register setting instructions in the prologue and epilogue.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> +This hook should return an rtx to zero all vector registers at function
> +exit.  If @var{used_only} is @code{true}, only used vector registers should
> +be zeroed.  Return @code{NULL} if this is not possible.
> +@end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> When optimization is disabled, this hook indicates whether or not
> arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 3be984b..bee917a 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -3430,6 +3430,10 @@ for a new target instead.
>
> @hook TARGET_FUNCTION_VALUE_REGNO_P
>
> +@hook TARGET_ZERO_CALL_USED_REGNO_P
> +
> +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
>
> @hook TARGET_GET_DRAP_RTX
>
> +@hook TARGET_PRO_EPILOGUE_USE
> +
> +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> +
> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
>
> @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/function.c b/gcc/function.c
> index 9eee9b5..9908530 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
>   return seq;
> }
>
> +/* Check whether the hard register REGNO is live at the exit block
> + * of the current routine.  */
> +bool
> +is_live_reg_at_exit (unsigned int regno)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      bitmap live_out = df_get_live_out (e->src);
> +      if (REGNO_REG_SET_P (live_out, regno))
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Emit a sequence of insns to zero the call-used-registers for the current
> + * function.  */
> +
> +static void
> +gen_call_used_regs_seq (void)
> +{
> +  if (!targetm.calls.pro_epilogue_use)
> +    return;
> +
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (flag_zero_call_used_regs)
> +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> +       == zero_call_used_regs_unset)
> +      zero_call_used_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_call_used_regs_type
> +       = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +  else
> +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +  switch (zero_call_used_regs_type)
> +    {
> +      case zero_call_used_regs_all_gpr:
> +       used_only = false;
> +       break;
> +      case zero_call_used_regs_used:
> +       gpr_only = false;
> +       break;
> +      case zero_call_used_regs_all:
> +       gpr_only = false;
> +       used_only = false;
> +       break;
> +      default:
> +       break;
> +    }
> +
> +  /* An optimization to use a single hard insn to zero all vector registers on
> +     the target that provides such insn.  */
> +  if (!gpr_only
> +      && targetm.calls.zero_all_vector_registers)
> +    {
> +      rtx zero_all_vec_insn
> +       = targetm.calls.zero_all_vector_registers (used_only);
> +      if (zero_all_vec_insn)
> +       {
> +         emit_insn (zero_all_vec_insn);
> +         gpr_only = true;
> +       }
> +    }
> +
> +  /* For each of the hard registers, zero it if:
> +     1. it is a call-used register;
> +     and 2. it is not a fixed register;
> +     and 3. it is not live at the end of the routine;
> +     and 4. it is a general purpose register if gpr_only is true;
> +     and 5. it is used in the routine if used_only is true.
> +   */
> +
> +  /* This array holds the zero rtx with the corresponding machine mode.  */
> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +    zero_rtx[i] = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> +       continue;
> +      if (fixed_regs[regno])
> +       continue;
> +      if (is_live_reg_at_exit (regno))
> +       continue;
> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> +       continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +       continue;
> +
> +      /* Now we can emit insn to zero this register.  */
> +      rtx reg, tmp;
> +
> +      machine_mode mode
> +       = targetm.calls.zero_call_used_regno_mode (regno,
> +                                                  reg_raw_mode[regno]);
> +      if (mode == VOIDmode)
> +       continue;
> +      if (!have_regs_of_mode[mode])
> +       continue;
> +
> +      reg = gen_rtx_REG (mode, regno);
> +      if (zero_rtx[(int)mode] == NULL_RTX)
> +       {
> +         zero_rtx[(int)mode] = reg;
> +         tmp = gen_rtx_SET (reg, const0_rtx);
> +         emit_insn (tmp);
> +       }
> +      else
> +       emit_move_insn (reg, zero_rtx[(int)mode]);
> +
> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> +    }
> +
> +  return;
> +}
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>    or NULL.  */
>
> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
>
>   start_sequence ();
>   emit_note (NOTE_INSN_EPILOGUE_BEG);
> +
> +  gen_call_used_regs_seq ();
> +
>   rtx_insn *seq = targetm.gen_epilogue ();
>   if (seq)
>     emit_jump_insn (seq);
> diff --git a/gcc/function.h b/gcc/function.h
> index d55cbdd..fc36c3e 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
>
> extern void used_types_insert (tree);
>
> +extern bool is_live_reg_at_exit (unsigned int);
> +
> #endif  /* GCC_FUNCTION_H */
> diff --git a/gcc/target.def b/gcc/target.def
> index 07059a8..8aab63e 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
>  default_function_value_regno_p)
>
> DEFHOOK
> +(zero_call_used_regno_p,
> + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> +@var{regno} must be the number of a hard general register.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> + bool, (const unsigned int regno, bool general_reg_only_p),
> + default_zero_call_used_regno_p)
> +
> +DEFHOOK
> +(zero_call_used_regno_mode,
> + "A target hook that returns a mode suitable for zeroing the call used\n\
> +register @var{regno} in @var{mode}.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> +used.",
> + machine_mode, (const unsigned int regno, machine_mode mode),
> + default_zero_call_used_regno_mode)
> +
> +DEFHOOK
> (fntype_abi,
>  "Return the ABI used by a function with type @var{type}; see the\n\
> definition of @code{predefined_function_abi} for details of the ABI\n\
> @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
>  rtx, (void), NULL)
>
> +DEFHOOK
> +(pro_epilogue_use,
> + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> +prevent deleting register setting instructions in the prologue and epilogue.",
> + rtx, (rtx reg), NULL)
> +
> +DEFHOOK
> +(zero_all_vector_registers,
> + "This hook should return an rtx to zero all vector registers at function\n\
> +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> +be zeroed.  Return @code{NULL} if this is not possible.",
> + rtx, (bool used_only), NULL)
> +
> /* Return true if all function parameters should be spilled to the
>    stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 0113c7b..ed02173 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
>
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +bool
> +default_zero_call_used_regno_p (const unsigned int,
> +                               bool)
> +{
> +  return false;
> +}
> +
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +machine_mode
> +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> +{
> +  return mode;
> +}
> +
> rtx
> default_internal_arg_pointer (void)
> {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index b572a36..370df19 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> extern rtx default_function_value (const_tree, const_tree, bool);
> extern rtx default_libcall_value (machine_mode, const_rtx);
> extern bool default_function_value_regno_p (const unsigned int);
> +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> +                                                      machine_mode);
> extern rtx default_internal_arg_pointer (void);
> extern rtx default_static_chain (const_tree, bool);
> extern void default_trampoline_init (rtx, tree, rtx);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..3c2ac72
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,3 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..acf48c4
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..cbb63a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..7573197
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 95eea63..01a1f24 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1464,6 +1464,15 @@ process_options (void)
>         }
>     }
>
> +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> +      && !targetm.calls.pro_epilogue_use)
> +    {
> +      error_at (UNKNOWN_LOCATION,
> +               "%<-fzero-call-used-regs=%> is not supported for this "
> +               "target");
> +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> +    }
> +
>   /* One region RA really helps to decrease the code size.  */
>   if (flag_ira_region == IRA_REGION_AUTODETECT)
>     flag_ira_region
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 8c5a2e3..71badbd 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
>  unsigned final : 1;
>  /* Belong to FUNCTION_DECL exclusively.  */
>  unsigned regdecl_flag : 1;
> - /* 14 unused bits. */
> +
> + /* How to clear call-used registers upon function return.  */
> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> +
> + /* 11 unused bits.  */
> };
>
> struct GTY(()) tree_var_decl {
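The tree-core.h hunk above takes 3 of the 14 spare bits for the new field, which leaves 11. A small sketch of why 3 bits are enough for the six enumerators the patch defines in coretypes.h follows. `struct decl_flags_model` and `store_and_load` are hypothetical, simplified stand-ins and not GCC code.

```c
/* Same enumerators as the enum the patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical, much reduced model of the flag bits in
   struct tree_decl_with_vis.  */
struct decl_flags_model {
  unsigned regdecl_flag : 1;
  unsigned zero_call_used_regs_type : 3;
};

/* The largest enumerator is 5, so it fits in 3 bits (values 0..7).  */
_Static_assert (zero_call_used_regs_all < (1 << 3),
		"enum zero_call_used_regs must fit in a 3-bit field");

/* Store a choice in the bit-field and read it back.  */
enum zero_call_used_regs
store_and_load (enum zero_call_used_regs v)
{
  struct decl_flags_model d = { 0, 0 };
  d.zero_call_used_regs_type = v;
  return (enum zero_call_used_regs) d.zero_call_used_regs_type;
}
```

Two more enumerators could be added before the field would need to grow.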
> diff --git a/gcc/tree.h b/gcc/tree.h
> index cf546ed..d378a88 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> #define DECL_VISIBILITY(NODE) \
>   (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
>
> +/* Value of the function decl's type of zeroing the call used
> +   registers upon return from function.  */
> +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> +
> /* Nonzero means that the decl (or an enclosing scope) had its
>    visibility specified rather than being inferred.  */
> #define DECL_VISIBILITY_SPECIFIED(NODE) \
> --
> 1.9.1

+1. Tested on x86

^ permalink raw reply	[flat|nested] 188+ messages in thread

* PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-07-14 14:45       ` [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] Qing Zhao
  2020-07-16 13:17         ` Victor Rodriguez
@ 2020-07-28 20:05         ` Qing Zhao
  2020-07-31 17:57           ` Uros Bizjak
  2020-08-07 13:20           ` Alexandre Oliva
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-07-28 20:05 UTC (permalink / raw)
  To: Richard Biener, Uros Bizjak
  Cc: H.J. Lu, Jakub Jelinek, gcc-patches, Kees Cook, Rodriguez Bahena, Victor


Richard and Uros,

Could you please review the change that H.J. and I rewrote based on your comments in the previous round of discussion?

This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.  

Thanks a lot for your time.

Qing

> On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hi, GCC team,
> 
> This patch is a follow-up on the previous patch and corresponding discussion:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> 
> From the previous round of discussion, the major issues raised were:
> 
> A. The patch should be rewritten to use the regsets infrastructure.
> B. The patch should go into the middle-end instead of the x86 backend.
> 
> This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> 
> 1. Change the names of the option and attribute from
> -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> to:
> -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all"),
> and add the new option and the new attribute as general (target-independent) ones.
> 2. Move the main code generation part from the i386 backend to the middle-end.
> 3. Add 4 target hooks.
> 4. Implement these 4 target hooks in the i386 backend.
> 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
> 
> The patch is as follows:
> 
> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> command-line option and
> zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> 
>  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> 
>  Don't zero call-used registers upon function return.
> 
>  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> 
>  Zero used call-used general purpose registers upon function return.
> 
>  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> 
>  Zero all call-used general purpose registers upon function return.
> 
>  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> 
>  Zero used call-used registers upon function return.
> 
>  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> 
>  Zero all call-used registers upon function return.
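As an illustration of the five choices listed above, here is a minimal sketch of how the option and the attribute might be combined in user code. The function names `check_pin` and `fast_path` and the `ZERO_REGS` macro are hypothetical. The `__has_attribute` guard is an assumption added so the file still builds with a compiler that lacks the attribute. The attribute goes on a declaration that precedes the definition, because the handler in this patch rejects it once the function has a body.

```c
/* Hypothetical example; build with, e.g.,
   gcc -O2 -fzero-call-used-regs=used-gpr example.c  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

#if __has_attribute(zero_call_used_regs)
# define ZERO_REGS(choice) __attribute__ ((zero_call_used_regs (choice)))
#else
# define ZERO_REGS(choice)
#endif

/* Per-function choices override the command-line default.  */
extern int check_pin (int) ZERO_REGS ("all-gpr");
extern int fast_path (int) ZERO_REGS ("skip");

/* Sensitive function: ask for all call-used GPRs to be zeroed.  */
int
check_pin (int pin)
{
  return pin == 1234;
}

/* Hot function: opt out of zeroing even if the option enables it.  */
int
fast_path (int x)
{
  return x + 1;
}
```

The return value is unaffected in either case; only registers that do not carry the result are cleared.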
> 
> The feature is implemented in the middle-end, but is currently only valid on x86.
> 
> Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
> -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
> -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> by default on x86-64.
> 
> Please take a look and let me know if you have any more comments.
> 
> thanks.
> 
> Qing
> 
> 
> ====================================
> 
> gcc/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* common.opt: Add new option -fzero-call-used-regs.
> 	* config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> 	(ix86_zero_call_used_regno_mode): Likewise.
> 	(ix86_zero_all_vector_registers): Likewise.
> 	(ix86_expand_prologue): Replace gen_prologue_use with
> 	gen_pro_epilogue_use.
> 	(TARGET_ZERO_CALL_USED_REGNO_P): Define.
> 	(TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> 	(TARGET_PRO_EPILOGUE_USE): Define.
> 	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> 	* config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> 	with UNSPECV_PRO_EPILOGUE_USE.
> 	* coretypes.h (enum zero_call_used_regs): New type.
> 	* doc/extend.texi: Document the new zero_call_used_regs attribute.
> 	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
> 	* doc/tm.texi: Regenerate.
> 	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> 	(TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> 	(TARGET_PRO_EPILOGUE_USE): Likewise.
> 	(TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> 	* function.c (is_live_reg_at_exit): New function.
> 	(gen_call_used_regs_seq): Likewise.
> 	(make_epilogue_seq): Call gen_call_used_regs_seq.
> 	* function.h (is_live_reg_at_exit): Declare.
> 	* target.def (zero_call_used_regno_p): New hook.
> 	(zero_call_used_regno_mode): Likewise.
> 	(pro_epilogue_use): Likewise.
> 	(zero_all_vector_registers): Likewise.
> 	* targhooks.c (default_zero_call_used_regno_p): New function.
> 	(default_zero_call_used_regno_mode): Likewise.
> 	* targhooks.h (default_zero_call_used_regno_p): Declare.
> 	(default_zero_call_used_regno_mode): Declare.
> 	* toplev.c (process_options): Issue errors when -fzero-call-used-regs
> 	is used on targets that do not support it.
> 	* tree-core.h (struct tree_decl_with_vis): New field 
> 	zero_call_used_regs_type.
> 	* tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> 
> gcc/c-family/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-attribs.c (c_common_attribute_table): Add new attribute
> 	zero_call_used_regs.
> 	(handle_zero_call_used_regs_attribute): New function.
> 
> gcc/c/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-c++-common/zero-scratch-regs-1.c: New test.
> 	* c-c++-common/zero-scratch-regs-2.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> 	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> 
> ---
> gcc/c-family/c-attribs.c                           |  68 ++++++++++
> gcc/c/c-decl.c                                     |   4 +
> gcc/common.opt                                     |  23 ++++
> gcc/config/i386/i386.c                             |  58 ++++++++-
> gcc/config/i386/i386.md                            |   6 +-
> gcc/coretypes.h                                    |  10 ++
> gcc/doc/extend.texi                                |  11 ++
> gcc/doc/invoke.texi                                |  13 +-
> gcc/doc/tm.texi                                    |  27 ++++
> gcc/doc/tm.texi.in                                 |   8 ++
> gcc/function.c                                     | 145 +++++++++++++++++++++
> gcc/function.h                                     |   2 +
> gcc/target.def                                     |  33 +++++
> gcc/targhooks.c                                    |  17 +++
> gcc/targhooks.h                                    |   3 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> gcc/toplev.c                                       |   9 ++
> gcc/tree-core.h                                    |   6 +-
> gcc/tree.h                                         |   5 +
> 43 files changed, 866 insertions(+), 7 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 3721483..cc93d6f 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> static tree ignore_attribute (tree *, tree, tree, int, bool *);
> static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +						 bool *);
> static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> 			      ignore_attribute, NULL },
>  { "no_split_stack",	      0, 0, true,  false, false, false,
> 			      handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +			      handle_zero_call_used_regs_attribute, NULL },
> +
>  /* For internal use (marking of builtins and runtime functions) only.
>     The name contains space to prevent its usage in source code.  */
>  { "fn spec",		      1, 1, false, true, true, false,
> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>  return NULL_TREE;
> }
> 
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +				      int ARG_UNUSED (flags),
> +				      bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +		"%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +  else if (DECL_INITIAL (decl))
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +		"cannot set %qE attribute after definition", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (!targetm.calls.pro_epilogue_use)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> +      return NULL_TREE;
> +    }
> +
> +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_skip;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_used;
> +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> +    zero_call_used_regs_type = zero_call_used_regs_all;
> +  else
> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> + 	     name, "skip", "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>   struct attribute_spec.handler.  */
> 
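The handler above maps the attribute string to an enum value with a chain of strcmp calls. The same mapping can be sketched as a lookup table, shown here as a standalone alternative formulation and not as what the patch does. `parse_zero_call_used_regs` and the `choices` table are hypothetical names. Returning zero_call_used_regs_unset for an unknown string is an assumption of this sketch, whereas the handler reports an error.

```c
#include <stddef.h>
#include <string.h>

/* Same enumerators as the enum the patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Accepted attribute arguments and their enum values.  */
static const struct {
  const char *name;
  enum zero_call_used_regs value;
} choices[] = {
  { "skip",     zero_call_used_regs_skip },
  { "used-gpr", zero_call_used_regs_used_gpr },
  { "all-gpr",  zero_call_used_regs_all_gpr },
  { "used",     zero_call_used_regs_used },
  { "all",      zero_call_used_regs_all }
};

/* Return the enum value for S, or zero_call_used_regs_unset when S is
   not one of the accepted strings.  */
enum zero_call_used_regs
parse_zero_call_used_regs (const char *s)
{
  for (size_t i = 0; i < sizeof choices / sizeof choices[0]; i++)
    if (strcmp (s, choices[i].name) == 0)
      return choices[i].value;
  return zero_call_used_regs_unset;
}
```

A table keeps the accepted strings in one place, which could also feed the "must be one of" diagnostic.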
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 81bd2ee..ded1880 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> 	  DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> 	}
> 
> +      /* Merge the zero_call_used_regs_type information.  */
> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> +	DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> +
>      /* Merge the storage class information.  */
>      merge_weak (newdecl, olddecl);
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index df8af36..19900f9 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> Common Report Var(flag_zero_initialized_in_bss) Init(1)
> Put zero initialized data in the bss section.
> 
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> +Clear call-used registers upon function return.
> +
> +Enum
> +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> +
> g
> Common Driver RejectNegative JoinedOrMissing
> Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 5c373c0..fd1aa9c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
>  return false;
> }
> 
> +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +static bool
> +ix86_zero_call_used_regno_p (const unsigned int regno,
> +			     bool gpr_only)
> +{
> +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +static machine_mode
> +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destination are
> +     zero-extended to the full register width.  */
> +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> +}
> +
> +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> +
> +static rtx
> +ix86_zero_all_vector_registers (bool used_only)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +	 || (TARGET_64BIT
> +	     && (REX_SSE_REGNO_P (regno)
> +		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> +	&& (!this_target_hard_regs->x_call_used_regs[regno]
> +	    || fixed_regs[regno]
> +	    || is_live_reg_at_exit (regno)
> +	    || (used_only && !df_regs_ever_live_p (regno))))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> /* Define how to find the value returned by a function.
>   VALTYPE is the data type of the value (as a tree).
>   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
>      insn = emit_insn (gen_set_got (pic));
>      RTX_FRAME_RELATED_P (insn) = 1;
>      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> -      emit_insn (gen_prologue_use (pic));
> +      emit_insn (gen_pro_epilogue_use (pic));
>      /* Deleting already emmitted SET_GOT if exist and allocated to
> 	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
>      ix86_elim_entry_set_got (pic);
> @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
>     Further, prevent alloca modifications to the stack pointer from being
>     combined with prologue modifications.  */
>  if (TARGET_SEH)
> -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> }
> 
> /* Emit code to restore REG using a POP insn.  */
> @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> 
> +#undef TARGET_ZERO_CALL_USED_REGNO_P
> +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> +
> +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> +
> +#undef TARGET_PRO_EPILOGUE_USE
> +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> +
> +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> +
> #undef TARGET_PROMOTE_FUNCTION_MODE
> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> 
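ix86_zero_all_vector_registers above only emits vzeroall when every vector register may be cleared. One register that is not call-used, is fixed, is live at exit, or (for the "used" variants) was never live makes it return NULL. A minimal standalone model of that all-or-nothing test follows. `struct vec_reg_state` and `can_use_vzeroall` are hypothetical names, and the TARGET_AVX check and the register-range selection are left out.

```c
#include <stdbool.h>

/* Hypothetical per-register facts that the hook reads from the
   call-used set, fixed_regs, is_live_reg_at_exit and
   df_regs_ever_live_p.  */
struct vec_reg_state {
  bool call_used;
  bool fixed;
  bool live_at_exit;
  bool ever_live;
};

/* Return true if a single "zero everything" instruction is safe, i.e.
   no register in REGS[0..N-1] has to keep its value.  */
bool
can_use_vzeroall (const struct vec_reg_state regs[], int n, bool used_only)
{
  for (int i = 0; i < n; i++)
    if (!regs[i].call_used
	|| regs[i].fixed
	|| regs[i].live_at_exit
	|| (used_only && !regs[i].ever_live))
      return false;
  return true;
}
```

When this returns false, the register-by-register zeroing in the middle-end would be used instead.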
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d0ecd9e..e7df59f 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -194,7 +194,7 @@
>  UNSPECV_STACK_PROBE
>  UNSPECV_PROBE_STACK_RANGE
>  UNSPECV_ALIGN
> -  UNSPECV_PROLOGUE_USE
> +  UNSPECV_PRO_EPILOGUE_USE
>  UNSPECV_SPLIT_STACK_RETURN
>  UNSPECV_CLD
>  UNSPECV_NOPS
> @@ -13525,8 +13525,8 @@
> 
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> -(define_insn "prologue_use"
> -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> +(define_insn "pro_epilogue_use"
> +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>  ""
>  ""
>  [(set_attr "length" "0")])
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..e56d6ec 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,16 @@ enum symbol_visibility
>  VISIBILITY_INTERNAL
> };
> 
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};
> +
> /* enums used by the targetm.excess_precision hook.  */
> 
> enum flt_eval_method
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c800b74..b32c55f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
> 
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> +call-used general purpose registers which are used in the function.
> +@samp{all-gpr} zeros all call-used general purpose registers.
> +@samp{used} zeros call-used registers which are used in the function.
> +@samp{all} zeros all call-used registers.  The default for the
> +attribute is controlled by @option{-fzero-call-used-regs}.
> +
> @end table
> 
> @c This is the end of the target-independent attribute table
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 09bcc5b..da02686 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> --param @var{name}=@var{value}
> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> 
> @@ -12273,6 +12273,17 @@ int foo (void)
> 
> Not all targets support this option.
> 
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return according to
> +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> +registers which are used in function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used
> +registers which are used in function.  @samp{all} zeros all call-used
> +registers.  You can control this behavior for a specific function by
> +using the function attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
> @item --param @var{name}=@var{value}
> @opindex param
> In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 6e7d9dc..43dddd3 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> @end deftypefn
> 
> +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> +A target hook that returns @code{true} if @var{regno} is the number of a
> +call used register.  If @var{general_reg_only_p} is @code{true},
> +@var{regno} must be the number of a hard general register.
> +
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> +A target hook that returns a mode suitable for zeroing the
> +call used register @var{regno} in @var{mode}.
> +
> +If this hook is not defined, then default_zero_call_used_regno_mode will be
> +used.
> +@end deftypefn
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> is needed.
> @end deftypefn
> 
> +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> +prevent deleting register setting instructions in prologue and epilogue.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> +This hook should return an rtx to zero all vector registers at function
> +exit.  If @var{used_only} is @code{true}, only used vector registers should
> +be zeroed.  Return @code{NULL} if that is not possible.
> +@end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> When optimization is disabled, this hook indicates whether or not
> arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 3be984b..bee917a 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -3430,6 +3430,10 @@ for a new target instead.
> 
> @hook TARGET_FUNCTION_VALUE_REGNO_P
> 
> +@hook TARGET_ZERO_CALL_USED_REGNO_P
> +
> +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> +
> @defmac APPLY_RESULT_SIZE
> Define this macro if @samp{untyped_call} and @samp{untyped_return}
> need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> 
> @hook TARGET_GET_DRAP_RTX
> 
> +@hook TARGET_PRO_EPILOGUE_USE
> +
> +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> +
> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> 
> @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/function.c b/gcc/function.c
> index 9eee9b5..9908530 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
>  return seq;
> }
> 
> +/* Check whether the hard register REGNO is live at the exit block
> + * of the current routine.  */
> +bool
> +is_live_reg_at_exit (unsigned int regno)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      bitmap live_out = df_get_live_out (e->src);
> +      if (REGNO_REG_SET_P (live_out, regno))
> +	return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Emit a sequence of insns to zero the call-used-registers for the current
> + * function.  */
> +
> +static void
> +gen_call_used_regs_seq (void)
> +{
> +  if (!targetm.calls.pro_epilogue_use)
> +    return;
> +
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> +
> +  if (flag_zero_call_used_regs)
> +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> +	== zero_call_used_regs_unset)
> +      zero_call_used_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_call_used_regs_type
> +	= DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +  else
> +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +  switch (zero_call_used_regs_type)
> +    {
> +      case zero_call_used_regs_all_gpr:
> +	used_only = false;
> +	break;
> +      case zero_call_used_regs_used:
> +	gpr_only = false;
> +	break;
> +      case zero_call_used_regs_all:
> +	gpr_only = false;
> +	used_only = false;
> +	break;
> +      default:
> +	break;
> +    }
> +
> +  /* An optimization to use a single hard insn to zero all vector registers on
> +     the target that provides such insn.  */
> +  if (!gpr_only
> +      && targetm.calls.zero_all_vector_registers)
> +    {
> +      rtx zero_all_vec_insn
> +	= targetm.calls.zero_all_vector_registers (used_only);
> +      if (zero_all_vec_insn)
> +	{
> +	  emit_insn (zero_all_vec_insn);
> +	  gpr_only = true;
> +	}
> +    }
> +
> +  /* Zero a hard register only if all of the following hold:
> +     1. it is a call-used register;
> +     2. it is not a fixed register;
> +     3. it is not live at the end of the routine;
> +     4. it is a general purpose register if gpr_only is true;
> +     5. it is used in the routine if used_only is true.
> +   */
> +
> +  /* This array holds the zero rtx with the corresponding machine mode.  */
> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +    zero_rtx[i] = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> +	continue;
> +      if (fixed_regs[regno])
> +	continue;
> +      if (is_live_reg_at_exit (regno))
> +	continue;
> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> +	continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +	continue;
> +
> +      /* Now we can emit insn to zero this register.  */
> +      rtx reg, tmp;
> +
> +      machine_mode mode
> +	= targetm.calls.zero_call_used_regno_mode (regno,
> +						   reg_raw_mode[regno]);
> +      if (mode == VOIDmode)
> +	continue;
> +      if (!have_regs_of_mode[mode])
> +	continue;
> +
> +      reg = gen_rtx_REG (mode, regno);
> +      if (zero_rtx[(int)mode] == NULL_RTX)
> +	{
> +	  zero_rtx[(int)mode] = reg;
> +	  tmp = gen_rtx_SET (reg, const0_rtx);
> +	  emit_insn (tmp);
> +	}
> +      else
> +	emit_move_insn (reg, zero_rtx[(int)mode]);
> +
> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> +    }
> +
> +  return;
> +}
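
The selection logic at the top of gen_call_used_regs_seq can be read
as two steps: pick the effective choice (the attribute wins whenever it
is set, otherwise the command-line value applies), then decode it into
the gpr_only and used_only flags. The standalone sketch below restates
that reading outside of GCC. The enum is abbreviated, and the names
zero_plan, effective_choice and make_plan are hypothetical, not part of
the patch.

```c
#include <stdbool.h>

/* Mirrors the order of the enum added by the patch; names shortened.  */
enum zero_choice { Z_UNSET = 0, Z_SKIP, Z_USED_GPR, Z_ALL_GPR, Z_USED, Z_ALL };

/* Hypothetical summary of what the epilogue code decides.  */
struct zero_plan { bool enabled; bool gpr_only; bool used_only; };

/* The attribute value wins whenever it is set; otherwise fall back to
   the command-line value (which may itself be unset).  */
enum zero_choice
effective_choice (enum zero_choice flag, enum zero_choice attr)
{
  return attr != Z_UNSET ? attr : flag;
}

/* Decode the effective choice into the two restriction flags.  */
struct zero_plan
make_plan (enum zero_choice flag, enum zero_choice attr)
{
  struct zero_plan plan = { false, true, true };
  enum zero_choice choice = effective_choice (flag, attr);

  /* Unset and skip both mean that nothing is zeroed.  */
  if (choice <= Z_SKIP)
    return plan;

  plan.enabled = true;
  switch (choice)
    {
    case Z_ALL_GPR:
      plan.used_only = false;
      break;
    case Z_USED:
      plan.gpr_only = false;
      break;
    case Z_ALL:
      plan.gpr_only = false;
      plan.used_only = false;
      break;
    default:  /* Z_USED_GPR keeps both restrictions.  */
      break;
    }
  return plan;
}
```

Under this reading, make_plan (Z_ALL, Z_SKIP) leaves zeroing disabled,
which matches what zero-scratch-regs-16.c expects, and
make_plan (Z_SKIP, Z_ALL_GPR) enables GPR-only zeroing of all call-used
registers, as in zero-scratch-regs-10.c.
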
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>   or NULL.  */
> 
> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> 
>  start_sequence ();
>  emit_note (NOTE_INSN_EPILOGUE_BEG);
> +
> +  gen_call_used_regs_seq ();
> +
>  rtx_insn *seq = targetm.gen_epilogue ();
>  if (seq)
>    emit_jump_insn (seq);
> diff --git a/gcc/function.h b/gcc/function.h
> index d55cbdd..fc36c3e 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> 
> extern void used_types_insert (tree);
> 
> +extern bool is_live_reg_at_exit (unsigned int);
> +
> #endif  /* GCC_FUNCTION_H */
> diff --git a/gcc/target.def b/gcc/target.def
> index 07059a8..8aab63e 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> default_function_value_regno_p)
> 
> DEFHOOK
> +(zero_call_used_regno_p,
> + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> +@var{regno} must be the number of a hard general register.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> + bool, (const unsigned int regno, bool general_reg_only_p),
> + default_zero_call_used_regno_p)
> +
> +DEFHOOK
> +(zero_call_used_regno_mode,
> + "A target hook that returns a mode suitable for zeroing the\n\
> +call used register @var{regno} in @var{mode}.\n\
> +\n\
> +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> +used.",
> + machine_mode, (const unsigned int regno, machine_mode mode),
> + default_zero_call_used_regno_mode)
> +
> +DEFHOOK
> (fntype_abi,
> "Return the ABI used by a function with type @var{type}; see the\n\
> definition of @code{predefined_function_abi} for details of the ABI\n\
> @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
> rtx, (void), NULL)
> 
> +DEFHOOK
> +(pro_epilogue_use,
> + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> +prevent deleting register setting instructions in prologue and epilogue.",
> + rtx, (rtx reg), NULL)
> +
> +DEFHOOK
> +(zero_all_vector_registers,
> + "This hook should return an rtx to zero all vector registers at function\n\
> +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> +be zeroed.  Return @code{NULL} if that is not possible.",
> + rtx, (bool used_only), NULL)
> +
> /* Return true if all function parameters should be spilled to the
>   stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 0113c7b..ed02173 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
> 
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> +
> +bool
> +default_zero_call_used_regno_p (const unsigned int,
> +				bool)
> +{
> +  return false;
> +}
> +
> +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> +
> +machine_mode
> +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> +{
> +  return mode;
> +}
> +
> rtx
> default_internal_arg_pointer (void)
> {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index b572a36..370df19 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> extern rtx default_function_value (const_tree, const_tree, bool);
> extern rtx default_libcall_value (machine_mode, const_rtx);
> extern bool default_function_value_regno_p (const unsigned int);
> +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> +						       machine_mode);
> extern rtx default_internal_arg_pointer (void);
> extern rtx default_static_chain (const_tree, bool);
> extern void default_trampoline_init (rtx, tree, rtx);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..3c2ac72
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,3 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..acf48c4
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..cbb63a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..7573197
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 95eea63..01a1f24 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1464,6 +1464,15 @@ process_options (void)
> 	}
>    }
> 
> +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> +      && !targetm.calls.pro_epilogue_use)
> +    {
> +      error_at (UNKNOWN_LOCATION,
> +		"%<-fzero-call-used-regs=%> is not supported for this "
> +		"target");
> +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> +    }
> +
>  /* One region RA really helps to decrease the code size.  */
>  if (flag_ira_region == IRA_REGION_AUTODETECT)
>    flag_ira_region
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 8c5a2e3..71badbd 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> unsigned final : 1;
> /* Belong to FUNCTION_DECL exclusively.  */
> unsigned regdecl_flag : 1;
> - /* 14 unused bits. */
> +
> + /* How to clear call-used registers upon function return.  */
> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> +
> + /* 11 unused bits.  */
> };
> 
> struct GTY(()) tree_var_decl {
> diff --git a/gcc/tree.h b/gcc/tree.h
> index cf546ed..d378a88 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> #define DECL_VISIBILITY(NODE) \
>  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> 
> +/* Value of the function decl's type of zeroing the call used
> +   registers upon return from function.  */
> +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> +
> /* Nonzero means that the decl (or an enclosing scope) had its
>   visibility specified rather than being inferred.  */
> #define DECL_VISIBILITY_SPECIFIED(NODE) \
> -- 
> 1.9.1
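
As a reading aid for the tree-core.h hunk above: the new 3-bit field has to hold all six enumerators from the coretypes.h hunk, and the tests that combine an attribute with a different command-line setting suggest that a per-function value takes precedence over the option. The sketch below is only an illustration under those assumptions. `struct decl_bits` and `effective_zero_regs` are hypothetical names, not part of the patch, and the real resolution logic lives in function.c.

```c
/* Mirrors the enum added by the patch's coretypes.h hunk.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Six enumerators (0..5) must fit in the 3-bit field.  */
_Static_assert (zero_call_used_regs_all < (1 << 3),
		"zero_call_used_regs must fit in 3 bits");

/* Hypothetical stand-in for the relevant part of tree_decl_with_vis.  */
struct decl_bits {
  unsigned zero_call_used_regs_type : 3;
};

/* Assumed rule: a value set by the attribute wins; an unset field
   falls back to the -fzero-call-used-regs= setting in FLAG.  */
static enum zero_call_used_regs
effective_zero_regs (const struct decl_bits *decl,
		     enum zero_call_used_regs flag)
{
  if (decl->zero_call_used_regs_type != zero_call_used_regs_unset)
    return (enum zero_call_used_regs) decl->zero_call_used_regs_type;
  return flag;
}
```

Under this reading, a function marked "skip" and compiled with -fzero-call-used-regs=all-gpr resolves to skip, which is what zero-scratch-regs-6.c expects.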


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-07-28 20:05         ` PING " Qing Zhao
@ 2020-07-31 17:57           ` Uros Bizjak
  2020-08-03 15:42             ` Qing Zhao
  2020-08-07 13:20           ` Alexandre Oliva
  1 sibling, 1 reply; 188+ messages in thread
From: Uros Bizjak @ 2020-07-31 17:57 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, H. J. Lu, Jakub Jelinek, gcc-patches, Kees Cook,
	Rodriguez Bahena, Victor

On Tue, 28 Jul 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
> Richard and Uros,
>
> Could you please review the change that H.J. and I rewrote based on your
> comments in the previous round of discussion?
>
> This patch is a nice security enhancement for GCC that has been requested
> by security people for quite some time.
>
> Thanks a lot for your time.

I'll be away from the keyboard for the next week, but the patch needs a
middle end approval first.

That said, the x86 parts look OK.

Uros.
> Qing
>
> > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi, Gcc team,
> >
> > This patch is a follow-up on the previous patch and corresponding
> > discussion:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> >
> > From the previous round of discussion, the major issues raised were:
> >
> > A. The patch should be rewritten to use the regsets infrastructure.
> > B. The patch should go into the middle-end instead of the x86 backend.
> >
> > This new patch is rewritten based on the above 2 comments.  The major
> > changes compared to the previous patch are:
> >
> > 1. Change the names of the option and attribute from
> > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and
> > zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > to:
> > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and
> > zero_call_used_regs("skip|used-gpr|all-gpr|used|all"),
> > and add them as a general (target-independent) option and attribute.
> > 2. The main code generation part is moved from the i386 backend to the
> > middle-end;
> > 3. Add 4 target hooks;
> > 4. Implement these 4 target hooks on the i386 backend.
> > 5. On a target that does not implement the target hook, issue an error
> > for the new option and a warning for the new attribute.
> >
> > The patch is as follows:
> >
> > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > command-line option and
> > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function
> > attribute:
> >
> >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> >
> >  Don't zero call-used registers upon function return.
> >
> >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> >
> >  Zero used call-used general purpose registers upon function return.
> >
> >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> >
> >  Zero all call-used general purpose registers upon function return.
> >
> >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> >
> >  Zero used call-used registers upon function return.
> >
> >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> >
> >  Zero all call-used registers upon function return.
> >
> > The feature is implemented in the middle-end, but is currently only
> > supported on x86.
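
For readers following the thread, a minimal sketch of how the attribute described above might be applied to a single function. The `ZERO_REGS` macro and `check_pin` are hypothetical names used only for illustration. The `__has_attribute` guard is an assumption about how one could keep the code compiling with compilers that lack the attribute, and is not something the patch provides.

```c
/* Hypothetical portability macro (not part of the patch): expands to the
   attribute only when the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(choice) __attribute__ ((zero_call_used_regs (choice)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(choice)
#endif

/* Request that all call-used general purpose registers be cleared when
   this function returns, whatever -fzero-call-used-regs= default is in
   effect for the rest of the translation unit.  */
ZERO_REGS ("all-gpr") int
check_pin (int pin)
{
  return pin == 1234;
}
```

The attribute only changes which registers are cleared in the epilogue. The return value is unaffected, so callers see the same result with or without it.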
> >
> > Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
> > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
> > -fzero-call-used-regs=used, and -fzero-call-used-regs=all each enabled
> > by default on x86-64.
> >
> > Please take a look and let me know if you have any more comments.
> >
> > thanks.
> >
> > Qing
> >
> >
> > ====================================
> >
> > gcc/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * common.opt: Add new option -fzero-call-used-regs.
> >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> >       (ix86_zero_call_used_regno_mode): Likewise.
> >       (ix86_zero_all_vector_registers): Likewise.
> >       (ix86_expand_prologue): Replace gen_prologue_use with
> >       gen_pro_epilogue_use.
> >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> >       (TARGET_PRO_EPILOGUE_USE): Define.
> >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> >       with UNSPECV_PRO_EPILOGUE_USE.
> >       * coretypes.h (enum zero_call_used_regs): New type.
> >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> >       * doc/tm.texi: Regenerate.
> >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> >       * function.c (is_live_reg_at_exit): New function.
> >       (gen_call_used_regs_seq): Likewise.
> >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> >       * function.h (is_live_reg_at_exit): Declare.
> >       * target.def (zero_call_used_regno_p): New hook.
> >       (zero_call_used_regno_mode): Likewise.
> >       (pro_epilogue_use): Likewise.
> >       (zero_all_vector_registers): Likewise.
> >       * targhooks.c (default_zero_call_used_regno_p): New function.
> >       (default_zero_call_used_regno_mode): Likewise.
> >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> >       (default_zero_call_used_regno_mode): Declare.
> >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> >       is used on targets that do not support it.
> >       * tree-core.h (struct tree_decl_with_vis): New field
> >       zero_call_used_regs_type.
> >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> >
> > gcc/c-family/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * c-attribs.c (c_common_attribute_table): Add new attribute
> >       zero_call_used_regs.
> >       (handle_zero_call_used_regs_attribute): New function.
> >
> > gcc/c/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> >
> >       * c-c++-common/zero-scratch-regs-1.c: New test.
> >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> >
> > ---
> > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > gcc/c/c-decl.c                                     |   4 +
> > gcc/common.opt                                     |  23 ++++
> > gcc/config/i386/i386.c                             |  58 ++++++++-
> > gcc/config/i386/i386.md                            |   6 +-
> > gcc/coretypes.h                                    |  10 ++
> > gcc/doc/extend.texi                                |  11 ++
> > gcc/doc/invoke.texi                                |  13 +-
> > gcc/doc/tm.texi                                    |  27 ++++
> > gcc/doc/tm.texi.in                                 |   8 ++
> > gcc/function.c                                     | 145 +++++++++++++++++++++
> > gcc/function.h                                     |   2 +
> > gcc/target.def                                     |  33 +++++
> > gcc/targhooks.c                                    |  17 +++
> > gcc/targhooks.h                                    |   3 +
> > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > gcc/toplev.c                                       |   9 ++
> > gcc/tree-core.h                                    |   6 +-
> > gcc/tree.h                                         |   5 +
> > 43 files changed, 866 insertions(+), 7 deletions(-)
> > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> >
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index 3721483..cc93d6f 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > +                                              bool *);
> > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> >                             ignore_attribute, NULL },
> >  { "no_split_stack",        0, 0, true,  false, false, false,
> >                             handle_no_split_stack_attribute, NULL },
> > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > +                           handle_zero_call_used_regs_attribute, NULL },
> > +
> >  /* For internal use (marking of builtins and runtime functions) only.
> >     The name contains space to prevent its usage in source code.  */
> >  { "fn spec",               1, 1, false, true, true, false,
> > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> >  return NULL_TREE;
> > }
> >
> > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > +                                   int ARG_UNUSED (flags),
> > +                                   bool *no_add_attris)
> > +{
> > +  tree decl = *node;
> > +  tree id = TREE_VALUE (args);
> > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > +
> > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (decl),
> > +             "%qE attribute applies only to functions", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +  else if (DECL_INITIAL (decl))
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (decl),
> > +             "cannot set %qE attribute after definition", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (TREE_CODE (id) != STRING_CST)
> > +    {
> > +      error ("attribute %qE arguments not a string", name);
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (!targetm.calls.pro_epilogue_use)
> > +    {
> > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > +      return NULL_TREE;
> > +    }
> > +
> > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > +  else
> > +    {
> > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > +      *no_add_attris = true;
> > +      return NULL_TREE;
> > +    }
> > +
> > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > +
> > +  return NULL_TREE;
> > +}
> > +
> > /* Handle a "returns_nonnull" attribute; arguments as in
> >   struct attribute_spec.handler.  */
> >
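
The if/else chain in handle_zero_call_used_regs_attribute above maps the five accepted strings to enumerators. As a hedged illustration only, the same mapping can be written as a table lookup. `zero_regs_choices` and `parse_zero_regs_choice` are hypothetical names, not part of the patch, and the enum is copied from the coretypes.h hunk further down so that the sketch stands alone.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the enum added by the patch's coretypes.h hunk.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical lookup table; the patch itself uses an if/else chain.  */
static const struct {
  const char *name;
  enum zero_call_used_regs value;
} zero_regs_choices[] = {
  { "skip", zero_call_used_regs_skip },
  { "used-gpr", zero_call_used_regs_used_gpr },
  { "all-gpr", zero_call_used_regs_all_gpr },
  { "used", zero_call_used_regs_used },
  { "all", zero_call_used_regs_all },
};

/* Return the enumerator for STR, or zero_call_used_regs_unset when STR
   is not one of the five accepted choices.  */
static enum zero_call_used_regs
parse_zero_regs_choice (const char *str)
{
  size_t n = sizeof zero_regs_choices / sizeof zero_regs_choices[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (str, zero_regs_choices[i].name) == 0)
      return zero_regs_choices[i].value;
  return zero_call_used_regs_unset;
}
```

Returning the unset value for an unknown string leaves it to the caller to emit the diagnostic, which in the handler above corresponds to the final else branch.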
> > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > index 81bd2ee..ded1880 100644
> > --- a/gcc/c/c-decl.c
> > +++ b/gcc/c/c-decl.c
> > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> >       }
> >
> > +      /* Merge the zero_call_used_regs_type information.  */
> > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > +
> >      /* Merge the storage class information.  */
> >      merge_weak (newdecl, olddecl);
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index df8af36..19900f9 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > Put zero initialized data in the bss section.
> >
> > +fzero-call-used-regs=
> > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > +Clear call-used registers upon function return.
> > +
> > +Enum
> > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > +
> > +EnumValue
> > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > +
> > g
> > Common Driver RejectNegative JoinedOrMissing
> > Generate debug information in default format.
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 5c373c0..fd1aa9c 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> >  return false;
> > }
> >
> > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > +
> > +static bool
> > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > +                          bool gpr_only)
> > +{
> > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > +}
> > +
> > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > +
> > +static machine_mode
> > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > +{
> > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > +     and the lower 128 bits for vector registers since destination are
> > +     zero-extended to the full register width.  */
> > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > +}
> > +
> > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > +
> > +static rtx
> > +ix86_zero_all_vector_registers (bool used_only)
> > +{
> > +  if (!TARGET_AVX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > +      || (TARGET_64BIT
> > +          && (REX_SSE_REGNO_P (regno)
> > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > +         || fixed_regs[regno]
> > +         || is_live_reg_at_exit (regno)
> > +         || (used_only && !df_regs_ever_live_p (regno))))
> > +      return NULL;
> > +
> > +  return gen_avx_vzeroall ();
> > +}
> > +
> > /* Define how to find the value returned by a function.
> >   VALTYPE is the data type of the value (as a tree).
> >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> >      insn = emit_insn (gen_set_got (pic));
> >      RTX_FRAME_RELATED_P (insn) = 1;
> >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > -      emit_insn (gen_prologue_use (pic));
> > +      emit_insn (gen_pro_epilogue_use (pic));
> >      /* Deleting already emmitted SET_GOT if exist and allocated to
> >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> >      ix86_elim_entry_set_got (pic);
> > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> >     Further, prevent alloca modifications to the stack pointer from being
> >     combined with prologue modifications.  */
> >  if (TARGET_SEH)
> > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > }
> >
> > /* Emit code to restore REG using a POP insn.  */
> > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> >
> > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > +
> > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > +
> > +#undef TARGET_PRO_EPILOGUE_USE
> > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > +
> > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > +
> > #undef TARGET_PROMOTE_FUNCTION_MODE
> > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index d0ecd9e..e7df59f 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -194,7 +194,7 @@
> >  UNSPECV_STACK_PROBE
> >  UNSPECV_PROBE_STACK_RANGE
> >  UNSPECV_ALIGN
> > -  UNSPECV_PROLOGUE_USE
> > +  UNSPECV_PRO_EPILOGUE_USE
> >  UNSPECV_SPLIT_STACK_RETURN
> >  UNSPECV_CLD
> >  UNSPECV_NOPS
> > @@ -13525,8 +13525,8 @@
> >
> > ;; As USE insns aren't meaningful after reload, this is used instead
> > ;; to prevent deleting instructions setting registers for PIC code
> > -(define_insn "prologue_use"
> > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > +(define_insn "pro_epilogue_use"
> > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> >  ""
> >  ""
> >  [(set_attr "length" "0")])
> > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > index 6b6cfcd..e56d6ec 100644
> > --- a/gcc/coretypes.h
> > +++ b/gcc/coretypes.h
> > @@ -418,6 +418,16 @@ enum symbol_visibility
> >  VISIBILITY_INTERNAL
> > };
> >
> > +/* Zero call-used registers type.  */
> > +enum zero_call_used_regs {
> > +  zero_call_used_regs_unset = 0,
> > +  zero_call_used_regs_skip,
> > +  zero_call_used_regs_used_gpr,
> > +  zero_call_used_regs_all_gpr,
> > +  zero_call_used_regs_used,
> > +  zero_call_used_regs_all
> > +};
> > +
> > /* enums used by the targetm.excess_precision hook.  */
> >
> > enum flt_eval_method
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index c800b74..b32c55f 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > A declaration to which @code{weakref} is attached and that is associated
> > with a named @code{target} must be @code{static}.
> >
> > +@item zero_call_used_regs ("@var{choice}")
> > +@cindex @code{zero_call_used_regs} function attribute
> > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > +call-used registers at function return according to @var{choice}.
> > +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> > +call-used general purpose registers which are used in function.
> > +@samp{all-gpr} zeros all call-used general purpose registers.
> > +@samp{used} zeros call-used registers which are used in function.
> > +@samp{all} zeros all call-used registers.  The default for the
> > +attribute is controlled by @option{-fzero-call-used-regs}.
> > +
> > @end table
> >
> > @c This is the end of the target-independent attribute table
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 09bcc5b..da02686 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > -funsafe-math-optimizations  -funswitch-loops @gol
> > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > --param @var{name}=@var{value}
> > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> >
> > @@ -12273,6 +12273,17 @@ int foo (void)
> >
> > Not all targets support this option.
> >
> > +@item -fzero-call-used-regs=@var{choice}
> > +@opindex fzero-call-used-regs
> > +Zero call-used registers at function return according to
> > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > +registers which are used in function.  @samp{all-gpr} zeros all
> > +call-used general purpose registers.  @samp{used} zeros call-used registers which
> > +are used in function.  @samp{all} zeros all call-used registers.  You
> > +can control this behavior for a specific function by using the function
> > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > +
> > @item --param @var{name}=@var{value}
> > @opindex param
> > In some places, GCC uses various constants to control the amount of
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index 6e7d9dc..43dddd3 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > @end deftypefn
> >
> > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > +A target hook that returns @code{true} if @var{regno} is the number of a
> > +call used register.  If @var{general_reg_only_p} is @code{true},
> > +@var{regno} must be the number of a hard general register.
> > +
> > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > +@end deftypefn
> > +
> > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > +A target hook that returns a mode suitable for zeroing the
> > +call used register @var{regno} in @var{mode}.
> > +
> > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > +used.
> > +@end deftypefn
> > +
> > @defmac APPLY_RESULT_SIZE
> > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > is needed.
> > @end deftypefn
> >
> > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> > +prevent deleting register setting instructions in prologue and epilogue.
> > +@end deftypefn
> > +
> > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > +This hook should return an rtx to zero all vector registers at function
> > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > +be zeroed.  Return @code{NULL} if this is not possible.
> > +@end deftypefn
> > +
> > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > When optimization is disabled, this hook indicates whether or not
> > arguments should be allocated to stack slots.  Normally, GCC allocates
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 3be984b..bee917a 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -3430,6 +3430,10 @@ for a new target instead.
> >
> > @hook TARGET_FUNCTION_VALUE_REGNO_P
> >
> > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > +
> > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > +
> > @defmac APPLY_RESULT_SIZE
> > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> >
> > @hook TARGET_GET_DRAP_RTX
> >
> > +@hook TARGET_PRO_EPILOGUE_USE
> > +
> > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > +
> > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> >
> > @hook TARGET_CONST_ANCHOR
> > diff --git a/gcc/function.c b/gcc/function.c
> > index 9eee9b5..9908530 100644
> > --- a/gcc/function.c
> > +++ b/gcc/function.c
> > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > #include "emit-rtl.h"
> > #include "recog.h"
> > #include "rtl-error.h"
> > +#include "hard-reg-set.h"
> > #include "alias.h"
> > #include "fold-const.h"
> > #include "stor-layout.h"
> > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> >  return seq;
> > }
> >
> > +/* Check whether the hard register REGNO is live at the exit block
> > + * of the current routine.  */
> > +bool
> > +is_live_reg_at_exit (unsigned int regno)
> > +{
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > +    {
> > +      bitmap live_out = df_get_live_out (e->src);
> > +      if (REGNO_REG_SET_P (live_out, regno))
> > +     return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > + * function.  */
> > +
> > +static void
> > +gen_call_used_regs_seq (void)
> > +{
> > +  if (!targetm.calls.pro_epilogue_use)
> > +    return;
> > +
> > +  bool gpr_only = true;
> > +  bool used_only = true;
> > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > +
> > +  if (flag_zero_call_used_regs)
> > +    {
> > +      if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > +          == zero_call_used_regs_unset)
> > +        zero_call_used_regs_type = flag_zero_call_used_regs;
> > +      else
> > +        zero_call_used_regs_type
> > +          = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > +    }
> > +  else
> > +    zero_call_used_regs_type
> > +      = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > +
> > +  /* No need to zero call-used-regs when no user request is present.  */
> > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > +    return;
> > +
> > +  /* No need to zero call-used-regs in main ().  */
> > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > +    return;
> > +
> > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > +     since it isn't a normal function return.  */
> > +  if (crtl->calls_eh_return)
> > +    return;
> > +
> > +  /* If gpr_only is true, only zero call-used-registers that are
> > +     general-purpose registers; if used_only is true, only zero
> > +     call-used-registers that are used in the current function.  */
> > +  switch (zero_call_used_regs_type)
> > +    {
> > +      case zero_call_used_regs_all_gpr:
> > +     used_only = false;
> > +     break;
> > +      case zero_call_used_regs_used:
> > +     gpr_only = false;
> > +     break;
> > +      case zero_call_used_regs_all:
> > +     gpr_only = false;
> > +     used_only = false;
> > +     break;
> > +      default:
> > +     break;
> > +    }
> > +
> > +  /* An optimization to use a single hard insn to zero all vector registers on
> > +     the target that provides such insn.  */
> > +  if (!gpr_only
> > +      && targetm.calls.zero_all_vector_registers)
> > +    {
> > +      rtx zero_all_vec_insn
> > +     = targetm.calls.zero_all_vector_registers (used_only);
> > +      if (zero_all_vec_insn)
> > +     {
> > +       emit_insn (zero_all_vec_insn);
> > +       gpr_only = true;
> > +     }
> > +    }
> > +
> > +  /* For each of the hard registers, zero it if:
> > +     1. it is a call-used register;
> > + and 2. it is not a fixed register;
> > + and 3. it is not live at the end of the routine;
> > + and 4. it is a general purpose register if gpr_only is true;
> > + and 5. it is used in the routine if used_only is true.
> > +   */
> > +
> > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > +    zero_rtx[i] = NULL_RTX;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    {
> > +      if (!this_target_hard_regs->x_call_used_regs[regno])
> > +     continue;
> > +      if (fixed_regs[regno])
> > +     continue;
> > +      if (is_live_reg_at_exit (regno))
> > +     continue;
> > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > +     continue;
> > +      if (used_only && !df_regs_ever_live_p (regno))
> > +     continue;
> > +
> > +      /* Now we can emit insn to zero this register.  */
> > +      rtx reg, tmp;
> > +
> > +      machine_mode mode
> > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > +                                                reg_raw_mode[regno]);
> > +      if (mode == VOIDmode)
> > +     continue;
> > +      if (!have_regs_of_mode[mode])
> > +     continue;
> > +
> > +      reg = gen_rtx_REG (mode, regno);
> > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > +     {
> > +       zero_rtx[(int)mode] = reg;
> > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > +       emit_insn (tmp);
> > +     }
> > +      else
> > +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > +
> > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > +    }
> > +
> > +  return;
> > +}
> > +
> > +
> > /* Return a sequence to be used as the epilogue for the current function,
> >   or NULL.  */
> >
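
To make the precedence rule in gen_call_used_regs_seq easier to review, here
is a standalone sketch of how I read it: the function attribute wins when it
is present, otherwise the command-line value applies, and the resulting choice
is decoded into the gpr_only/used_only pair.  The enum zero_choice, struct
zero_plan and both function names are hypothetical stand-ins, and the enum
ordering is an assumption modelled on the comparison against
zero_call_used_regs_skip in the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the patch's enum; the ordering is an assumption.  */
enum zero_choice
{
  ZC_UNSET, ZC_SKIP, ZC_USED_GPR, ZC_ALL_GPR, ZC_USED, ZC_ALL
};

struct zero_plan
{
  bool enabled;    /* false: emit no zeroing insns at all */
  bool gpr_only;   /* true: restrict to general purpose registers */
  bool used_only;  /* true: restrict to registers the function used */
};

/* The attribute overrides the command-line option when it is present.  */
static enum zero_choice
effective_choice (enum zero_choice flag, enum zero_choice attr)
{
  return attr != ZC_UNSET ? attr : flag;
}

/* Decode the effective choice the same way the switch in the patch does.  */
static struct zero_plan
make_plan (enum zero_choice flag, enum zero_choice attr)
{
  struct zero_plan plan = { false, true, true };
  enum zero_choice choice = effective_choice (flag, attr);

  if (choice <= ZC_SKIP)
    return plan;

  plan.enabled = true;
  switch (choice)
    {
    case ZC_ALL_GPR:
      plan.used_only = false;
      break;
    case ZC_USED:
      plan.gpr_only = false;
      break;
    case ZC_ALL:
      plan.gpr_only = false;
      plan.used_only = false;
      break;
    default:
      break;
    }
  return plan;
}
```

Under this reading, -fzero-call-used-regs=skip combined with an "all-gpr"
attribute still zeroes registers, and -fzero-call-used-regs=all combined with
a "skip" attribute zeroes nothing, which is what zero-scratch-regs-10.c and
zero-scratch-regs-16.c appear to test.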
> > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> >
> >  start_sequence ();
> >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > +
> > +  gen_call_used_regs_seq ();
> > +
> >  rtx_insn *seq = targetm.gen_epilogue ();
> >  if (seq)
> >    emit_jump_insn (seq);
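
The zero_rtx[] cache in the loop above is what produces the "xorl %eax, %eax"
followed by "movl %eax, %edx" pattern the tests scan for: the first register
seen in a given mode is set to zero directly, and later registers of the same
mode are copied from it.  A small standalone model of that idea, with no GCC
internals, could look like this.  The mode constants, struct hard_reg and
emit_zeroing are hypothetical names, and the "zero"/"copy" text stands in for
real insns.

```c
#include <stdio.h>

/* Hypothetical stand-ins for machine modes.  */
enum { MODE_SI, MODE_V4SF, MODE_COUNT };

struct hard_reg
{
  const char *name;
  int mode;
};

/* Append one line per register to OUT: "zero R" for the first register
   seen in each mode, "copy S -> R" for later registers of that mode,
   where S is the register that was zeroed first.  Returns the number of
   registers handled, or -1 if OUT is too small.  */
static int
emit_zeroing (const struct hard_reg *regs, int nregs, char *out, size_t outsz)
{
  const char *zero_src[MODE_COUNT] = { NULL };
  size_t len = 0;

  if (outsz == 0)
    return -1;
  out[0] = '\0';

  for (int i = 0; i < nregs; i++)
    {
      const char **src = &zero_src[regs[i].mode];
      int n;

      if (*src == NULL)
        {
          /* First register in this mode: remember it as the zero source.  */
          *src = regs[i].name;
          n = snprintf (out + len, outsz - len, "zero %s\n", regs[i].name);
        }
      else
        n = snprintf (out + len, outsz - len, "copy %s -> %s\n",
                      *src, regs[i].name);

      if (n < 0 || (size_t) n >= outsz - len)
        return -1;
      len += (size_t) n;
    }
  return nregs;
}
```

This is only a model of the caching; whether a register-to-register move is
preferable to a second xor on a given microarchitecture is a separate
question.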
> > diff --git a/gcc/function.h b/gcc/function.h
> > index d55cbdd..fc36c3e 100644
> > --- a/gcc/function.h
> > +++ b/gcc/function.h
> > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> >
> > extern void used_types_insert (tree);
> >
> > +extern bool is_live_reg_at_exit (unsigned int);
> > +
> > #endif  /* GCC_FUNCTION_H */
> > diff --git a/gcc/target.def b/gcc/target.def
> > index 07059a8..8aab63e 100644
> > --- a/gcc/target.def
> > +++ b/gcc/target.def
> > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > default_function_value_regno_p)
> >
> > DEFHOOK
> > +(zero_call_used_regno_p,
> > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > +@var{regno} must be the number of a hard general register.\n\
> > +\n\
> > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > + bool, (const unsigned int regno, bool general_reg_only_p),
> > + default_zero_call_used_regno_p)
> > +
> > +DEFHOOK
> > +(zero_call_used_regno_mode,
> > + "A target hook that returns a mode suitable for zeroing the\n\
> > +call used register @var{regno} in @var{mode}.\n\
> > +\n\
> > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > +used.",
> > + machine_mode, (const unsigned int regno, machine_mode mode),
> > + default_zero_call_used_regno_mode)
> > +
> > +DEFHOOK
> > (fntype_abi,
> > "Return the ABI used by a function with type @var{type}; see the\n\
> > definition of @code{predefined_function_abi} for details of the ABI\n\
> > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > is needed.",
> > rtx, (void), NULL)
> >
> > +DEFHOOK
> > +(pro_epilogue_use,
> > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > +prevent deleting register setting instructions in prologue and epilogue.",
> > + rtx, (rtx reg), NULL)
> > +
> > +DEFHOOK
> > +(zero_all_vector_registers,
> > + "This hook should return an rtx to zero all vector registers at function\n\
> > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > +be zeroed.  Return @code{NULL} if this is not possible.",
> > + rtx, (bool used_only), NULL)
> > +
> > /* Return true if all function parameters should be spilled to the
> >   stack.  */
> > DEFHOOK
> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > index 0113c7b..ed02173 100644
> > --- a/gcc/targhooks.c
> > +++ b/gcc/targhooks.c
> > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > #endif
> > }
> >
> > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > +
> > +bool
> > +default_zero_call_used_regno_p (const unsigned int,
> > +                             bool)
> > +{
> > +  return false;
> > +}
> > +
> > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > +
> > +machine_mode
> > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > +{
> > +  return mode;
> > +}
> > +
> > rtx
> > default_internal_arg_pointer (void)
> > {
> > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > index b572a36..370df19 100644
> > --- a/gcc/targhooks.h
> > +++ b/gcc/targhooks.h
> > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > extern rtx default_function_value (const_tree, const_tree, bool);
> > extern rtx default_libcall_value (machine_mode, const_rtx);
> > extern bool default_function_value_regno_p (const unsigned int);
> > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > +                                                    machine_mode);
> > extern rtx default_internal_arg_pointer (void);
> > extern rtx default_static_chain (const_tree, bool);
> > extern void default_trampoline_init (rtx, tree, rtx);
> > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > new file mode 100644
> > index 0000000..3c2ac72
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > @@ -0,0 +1,3 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > new file mode 100644
> > index 0000000..acf48c4
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > @@ -0,0 +1,4 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2" } */
> > +
> > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > new file mode 100644
> > index 0000000..9f61dc4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > new file mode 100644
> > index 0000000..09048e5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > new file mode 100644
> > index 0000000..4862688
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > @@ -0,0 +1,39 @@
> > +/* { dg-do run { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > +
> > +struct S { int i; };
> > +__attribute__((const, noinline, noclone))
> > +struct S foo (int x)
> > +{
> > +  struct S s;
> > +  s.i = x;
> > +  return s;
> > +}
> > +
> > +int a[2048], b[2048], c[2048], d[2048];
> > +struct S e[2048];
> > +
> > +__attribute__((noinline, noclone)) void
> > +bar (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      e[i] = foo (i);
> > +      a[i+2] = a[i] + a[i+1];
> > +      b[10] = b[10] + i;
> > +      c[i] = c[2047 - i];
> > +      d[i] = d[i + 1];
> > +    }
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  int i;
> > +  bar ();
> > +  for (i = 0; i < 1024; i++)
> > +    if (e[i].i != i)
> > +      __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > new file mode 100644
> > index 0000000..500251b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > @@ -0,0 +1,39 @@
> > +/* { dg-do run { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +struct S { int i; };
> > +__attribute__((const, noinline, noclone))
> > +struct S foo (int x)
> > +{
> > +  struct S s;
> > +  s.i = x;
> > +  return s;
> > +}
> > +
> > +int a[2048], b[2048], c[2048], d[2048];
> > +struct S e[2048];
> > +
> > +__attribute__((noinline, noclone)) void
> > +bar (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      e[i] = foo (i);
> > +      a[i+2] = a[i] + a[i+1];
> > +      b[10] = b[10] + i;
> > +      c[i] = c[2047 - i];
> > +      d[i] = d[i + 1];
> > +    }
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  int i;
> > +  bar ();
> > +  for (i = 0; i < 1024; i++)
> > +    if (e[i].i != i)
> > +      __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > new file mode 100644
> > index 0000000..8b058e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > new file mode 100644
> > index 0000000..d4eaaf7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > new file mode 100644
> > index 0000000..dd3bb90
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > new file mode 100644
> > index 0000000..e2274f6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > new file mode 100644
> > index 0000000..7f5d153
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > new file mode 100644
> > index 0000000..fe13d2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > +
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x + y;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > new file mode 100644
> > index 0000000..205a532
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > +
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > new file mode 100644
> > index 0000000..e046684
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > new file mode 100644
> > index 0000000..4be8ff6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > +
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x + y;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > new file mode 100644
> > index 0000000..0eb34e0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > +
> > +__attribute__ ((zero_call_used_regs("used")))
> > +float
> > +foo (float z, float y, float x)
> > +{
> > +  return x + y;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > new file mode 100644
> > index 0000000..cbb63a4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > new file mode 100644
> > index 0000000..7573197
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > new file mode 100644
> > index 0000000..de71223
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > new file mode 100644
> > index 0000000..ccfa441
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern void foo (void) __attribute__
((zero_call_used_regs("used-gpr")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > new file mode 100644
> > index 0000000..6b46ca3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > new file mode 100644
> > index 0000000..0680f38
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > +
> > +void
> > +foo (void)
> > +{
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > new file mode 100644
> > index 0000000..534defa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } }
*/
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > new file mode 100644
> > index 0000000..477bb19
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { !
ia32 } } } } */
> > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { !
ia32 } } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > new file mode 100644
> > index 0000000..a305a60
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target *-*-linux* } } */
> > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > +
> > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > +
> > +int
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } }
*/
> > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { !
ia32 } } } } */
> > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > index 95eea63..01a1f24 100644
> > --- a/gcc/toplev.c
> > +++ b/gcc/toplev.c
> > @@ -1464,6 +1464,15 @@ process_options (void)
> >       }
> >    }
> >
> > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > +      && !targetm.calls.pro_epilogue_use)
> > +    {
> > +      error_at (UNKNOWN_LOCATION,
> > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > +             "target");
> > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > +    }
> > +
> >  /* One region RA really helps to decrease the code size.  */
> >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> >    flag_ira_region
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index 8c5a2e3..71badbd 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > unsigned final : 1;
> > /* Belong to FUNCTION_DECL exclusively.  */
> > unsigned regdecl_flag : 1;
> > - /* 14 unused bits. */
> > +
> > + /* How to clear call-used registers upon function return.  */
> > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > +
> > + /* 11 unused bits.  */
> > };
> >
> > struct GTY(()) tree_var_decl {
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index cf546ed..d378a88 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > #define DECL_VISIBILITY(NODE) \
> >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> >
> > +/* Value of the function decl's type of zeroing the call used
> > +   registers upon return from function.  */
> > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > +
> > /* Nonzero means that the decl (or an enclosing scope) had its
> >   visibility specified rather than being inferred.  */
> > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > --
> > 1.9.1
>
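The quoted tests suggest that a per-function attribute takes precedence over the command-line setting: zero-scratch-regs-6.c expects no zeroing when the attribute is "skip" and the option is all-gpr, and zero-scratch-regs-9.c expects zeroing when the attribute is "used-gpr" and the option is skip. The sketch below models that resolution rule together with the 3-bit field added to tree_decl_with_vis. The names zero_mode, fn_decl_bits and effective_mode are hypothetical, and the order of the enumerators after "skip" is an assumption; this is not the code from function.c.

```c
#include <assert.h>

/* Hypothetical stand-in for enum zero_call_used_regs.  Six values fit
   in the 3-bit field the patch reserves; 0 means "no attribute".  */
enum zero_mode {
  ZM_UNSET = 0,
  ZM_SKIP,
  ZM_USED_GPR,
  ZM_ALL_GPR,
  ZM_USED,
  ZM_ALL
};

/* Hypothetical stand-in for the per-declaration storage.  */
struct fn_decl_bits {
  unsigned zero_mode : 3;
};

/* Return the mode that applies to one function: the attribute value if
   one was recorded, otherwise the command-line default.  */
enum zero_mode effective_mode(struct fn_decl_bits decl, enum zero_mode flag)
{
  /* The option is initialised to "skip", so it is never unset.  */
  assert(flag != ZM_UNSET);
  return decl.zero_mode != ZM_UNSET ? (enum zero_mode) decl.zero_mode : flag;
}
```

Keeping "unset" as value 0 means a zero-initialised declaration falls back to the command-line setting without any extra flag bit.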

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-07-31 17:57           ` Uros Bizjak
@ 2020-08-03 15:42             ` Qing Zhao
  2020-08-04  7:35               ` Richard Biener
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-03 15:42 UTC (permalink / raw)
  To: Uros Bizjak, Richard Biener
  Cc: H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

Hi, Uros,

Thanks a lot for your review of the x86 parts.

Hi, Richard,

Could you please take a look at the middle-end part to see whether the rewrite addresses your previous concerns?

Thanks a lot.

Qing


> On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> 
> On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> >
> >
> > Richard and Uros,
> >
> > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> >
> > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.  
> >
> > Thanks a lot for your time.
> 
> I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> 
> That said, the x86 parts look OK.
> 
> 

> Uros.
> > Qing
> >
> > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > 
> > > Hi, Gcc team,
> > > 
> > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > 
> > > From the previous round of discussion, the major issues raised were:
> > > 
> > > A. The patch should be rewritten using the regsets infrastructure.
> > > B. The patch should go into the middle-end instead of the x86 backend.
> > > 
> > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > 
> > > 1. Change the names of the option and attribute from
> > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > to:
> > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all").
> > > The new option and attribute are added as general ones rather than x86-specific ones.
> > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > 3. Add 4 target-hooks;
> > > 4. Implement these 4 target-hooks on i386 backend. 
> > > 5. On a target that does not implement the target hooks, issue an error for the new option and a warning for the new attribute.
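Items 1 and 5 above can be sketched as a standalone C fragment: a lookup from the mode string to an enum value, and the rule for targets without the hooks (error for the option unless it is "skip", warning for the attribute). The names zero_mode, parse_zero_mode, request_result and check_request are hypothetical, and the order of the enumerators after "skip" is an assumption; the real patch does this inside handle_zero_call_used_regs_attribute and process_options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for enum zero_call_used_regs.  */
enum zero_mode {
  ZM_UNSET = 0,
  ZM_SKIP,
  ZM_USED_GPR,
  ZM_ALL_GPR,
  ZM_USED,
  ZM_ALL
};

/* Map a mode string to its enum value; ZM_UNSET means "not recognised".  */
enum zero_mode parse_zero_mode(const char *s)
{
  static const struct { const char *name; enum zero_mode mode; } table[] = {
    { "skip",     ZM_SKIP },
    { "used-gpr", ZM_USED_GPR },
    { "all-gpr",  ZM_ALL_GPR },
    { "used",     ZM_USED },
    { "all",      ZM_ALL },
  };

  assert(s != NULL);
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp(s, table[i].name) == 0)
      return table[i].mode;
  return ZM_UNSET;
}

enum request_result { REQ_ACCEPT, REQ_WARN_IGNORED, REQ_ERROR };

/* Decide what to do with a request on a target that may lack the hooks.  */
enum request_result check_request(int target_has_hooks, int from_attribute,
                                  enum zero_mode mode)
{
  if (target_has_hooks)
    return REQ_ACCEPT;
  if (from_attribute)
    return REQ_WARN_IGNORED;          /* attribute: warn and ignore */
  return mode == ZM_SKIP ? REQ_ACCEPT /* option "skip" needs no support */
                         : REQ_ERROR; /* any other option value: error */
}
```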
> > > 
> > > The patch is as follows:
> > > 
> > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > command-line option and
> > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > 
> > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > 
> > >  Don't zero call-used registers upon function return.
> > > 
> > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > 
> > >  Zero used call-used general purpose registers upon function return.
> > > 
> > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > 
> > >  Zero all call-used general purpose registers upon function return.
> > > 
> > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > 
> > >  Zero used call-used registers upon function return.
> > > 
> > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > 
> > >  Zero all call-used registers upon function return.
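As an illustration of the attribute form of the five modes above, the following sketch marks one function with "used-gpr". It assumes a compiler that provides __has_attribute; the ZERO_REGS macro and the sum_key_bytes function are hypothetical names introduced only for this example. The attribute is placed on a declaration that precedes the definition, since the handler in this patch rejects it once the function has been defined.

```c
#include <assert.h>
#include <stddef.h>

/* Expand to the attribute only when the compiler knows it, so the file
   still builds with compilers that lack zero_call_used_regs.  */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_REGS(mode) __attribute__((zero_call_used_regs(mode)))
#  endif
#endif
#ifndef ZERO_REGS
#  define ZERO_REGS(mode)
#endif

/* Request zeroing of the call-used general purpose registers that this
   function actually uses.  */
ZERO_REGS("used-gpr") int sum_key_bytes(const unsigned char *key, size_t len);

int sum_key_bytes(const unsigned char *key, size_t len)
{
  assert(key != NULL);
  int sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += key[i];
  return sum;
}
```

The attribute does not change the value the function returns; it only affects which registers are cleared on return.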
> > > 
> > > The feature is implemented in the middle-end, but is currently only supported on x86.
> > > 
> > > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > by default on x86-64.
> > > 
> > > Please take a look and let me know if you have any more comments.
> > > 
> > > thanks.
> > > 
> > > Qing
> > > 
> > > 
> > > ====================================
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > 
> > >       * common.opt: Add new option -fzero-call-used-regs.
> > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > >       (ix86_zero_call_used_regno_mode): Likewise.
> > >       (ix86_zero_all_vector_registers): Likewise.
> > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > >       gen_pro_epilogue_use.
> > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > >       with UNSPECV_PRO_EPILOGUE_USE.
> > >       * coretypes.h (enum zero_call_used_regs): New type.
> > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > >       * doc/tm.texi: Regenerate.
> > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > >       * function.c (is_live_reg_at_exit): New function.
> > >       (gen_call_used_regs_seq): Likewise.
> > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > >       * function.h (is_live_reg_at_exit): Declare.
> > >       * target.def (zero_call_used_regno_p): New hook.
> > >       (zero_call_used_regno_mode): Likewise.
> > >       (pro_epilogue_use): Likewise.
> > >       (zero_all_vector_registers): Likewise.
> > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > >       (default_zero_call_used_regno_mode): Likewise.
> > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > >       (default_zero_call_used_regno_mode): Declare.
> > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > >       is used on targets that do not support it.
> > >       * tree-core.h (struct tree_decl_with_vis): New field 
> > >       zero_call_used_regs_type.
> > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > 
> > > gcc/c-family/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > 
> > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > >       zero_call_used_regs.
> > >       (handle_zero_call_used_regs_attribute): New function.
> > > 
> > > gcc/c/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > 
> > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > 
> > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > 
> > > ---
> > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > gcc/c/c-decl.c                                     |   4 +
> > > gcc/common.opt                                     |  23 ++++
> > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > gcc/config/i386/i386.md                            |   6 +-
> > > gcc/coretypes.h                                    |  10 ++
> > > gcc/doc/extend.texi                                |  11 ++
> > > gcc/doc/invoke.texi                                |  13 +-
> > > gcc/doc/tm.texi                                    |  27 ++++
> > > gcc/doc/tm.texi.in                                 |   8 ++
> > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > gcc/function.h                                     |   2 +
> > > gcc/target.def                                     |  33 +++++
> > > gcc/targhooks.c                                    |  17 +++
> > > gcc/targhooks.h                                    |   3 +
> > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > gcc/toplev.c                                       |   9 ++
> > > gcc/tree-core.h                                    |   6 +-
> > > gcc/tree.h                                         |   5 +
> > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > 
> > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > index 3721483..cc93d6f 100644
> > > --- a/gcc/c-family/c-attribs.c
> > > +++ b/gcc/c-family/c-attribs.c
> > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > +                                              bool *);
> > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > >                             ignore_attribute, NULL },
> > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > >                             handle_no_split_stack_attribute, NULL },
> > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > +
> > >  /* For internal use (marking of builtins and runtime functions) only.
> > >     The name contains space to prevent its usage in source code.  */
> > >  { "fn spec",               1, 1, false, true, true, false,
> > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > >  return NULL_TREE;
> > > }
> > > 
> > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > +   struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > +                                   int ARG_UNUSED (flags),
> > > +                                   bool *no_add_attris)
> > > +{
> > > +  tree decl = *node;
> > > +  tree id = TREE_VALUE (args);
> > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > +
> > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > +    {
> > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > +             "%qE attribute applies only to functions", name);
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +  else if (DECL_INITIAL (decl))
> > > +    {
> > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > +             "cannot set %qE attribute after definition", name);
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  if (TREE_CODE (id) != STRING_CST)
> > > +    {
> > > +      error ("attribute %qE arguments not a string", name);
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  if (!targetm.calls.pro_epilogue_use)
> > > +    {
> > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > +  else
> > > +    {
> > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > +      *no_add_attris = true;
> > > +      return NULL_TREE;
> > > +    }
> > > +
> > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > +
> > > +  return NULL_TREE;
> > > +}
> > > +
> > > /* Handle a "returns_nonnull" attribute; arguments as in
> > >   struct attribute_spec.handler.  */
> > > 
> > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > index 81bd2ee..ded1880 100644
> > > --- a/gcc/c/c-decl.c
> > > +++ b/gcc/c/c-decl.c
> > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > >       }
> > > 
> > > +      /* Merge the zero_call_used_regs_type information.  */
> > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > +
> > >      /* Merge the storage class information.  */
> > >      merge_weak (newdecl, olddecl);
> > > 
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index df8af36..19900f9 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > Put zero initialized data in the bss section.
> > > 
> > > +fzero-call-used-regs=
> > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > +Clear call-used registers upon function return.
> > > +
> > > +Enum
> > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > +
> > > +EnumValue
> > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > +
> > > g
> > > Common Driver RejectNegative JoinedOrMissing
> > > Generate debug information in default format.
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 5c373c0..fd1aa9c 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > >  return false;
> > > }
> > > 
> > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > +
> > > +static bool
> > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > +                          bool gpr_only)
> > > +{
> > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > +}
> > > +
> > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > +
> > > +static machine_mode
> > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > +{
> > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > +     and the lower 128 bits for vector registers since destination are
> > > +     zero-extended to the full register width.  */
> > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > +}
> > > +
> > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > +
> > > +static rtx
> > > +ix86_zero_all_vector_registers (bool used_only)
> > > +{
> > > +  if (!TARGET_AVX)
> > > +    return NULL;
> > > +
> > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > +      || (TARGET_64BIT
> > > +          && (REX_SSE_REGNO_P (regno)
> > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > +         || fixed_regs[regno]
> > > +         || is_live_reg_at_exit (regno)
> > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > +      return NULL;
> > > +
> > > +  return gen_avx_vzeroall ();
> > > +}
> > > +
> > > /* Define how to find the value returned by a function.
> > >   VALTYPE is the data type of the value (as a tree).
> > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > >      insn = emit_insn (gen_set_got (pic));
> > >      RTX_FRAME_RELATED_P (insn) = 1;
> > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > -      emit_insn (gen_prologue_use (pic));
> > > +      emit_insn (gen_pro_epilogue_use (pic));
> > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > >      ix86_elim_entry_set_got (pic);
> > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > >     Further, prevent alloca modifications to the stack pointer from being
> > >     combined with prologue modifications.  */
> > >  if (TARGET_SEH)
> > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > }
> > > 
> > > /* Emit code to restore REG using a POP insn.  */
> > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > 
> > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > +
> > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > +
> > > +#undef TARGET_PRO_EPILOGUE_USE
> > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > +
> > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > +
> > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > 
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index d0ecd9e..e7df59f 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -194,7 +194,7 @@
> > >  UNSPECV_STACK_PROBE
> > >  UNSPECV_PROBE_STACK_RANGE
> > >  UNSPECV_ALIGN
> > > -  UNSPECV_PROLOGUE_USE
> > > +  UNSPECV_PRO_EPILOGUE_USE
> > >  UNSPECV_SPLIT_STACK_RETURN
> > >  UNSPECV_CLD
> > >  UNSPECV_NOPS
> > > @@ -13525,8 +13525,8 @@
> > > 
> > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > ;; to prevent deleting instructions setting registers for PIC code
> > > -(define_insn "prologue_use"
> > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > +(define_insn "pro_epilogue_use"
> > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > >  ""
> > >  ""
> > >  [(set_attr "length" "0")])
> > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > index 6b6cfcd..e56d6ec 100644
> > > --- a/gcc/coretypes.h
> > > +++ b/gcc/coretypes.h
> > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > >  VISIBILITY_INTERNAL
> > > };
> > > 
> > > +/* Zero call-used registers type.  */
> > > +enum zero_call_used_regs {
> > > +  zero_call_used_regs_unset = 0,
> > > +  zero_call_used_regs_skip,
> > > +  zero_call_used_regs_used_gpr,
> > > +  zero_call_used_regs_all_gpr,
> > > +  zero_call_used_regs_used,
> > > +  zero_call_used_regs_all
> > > +};
> > > +
> > > /* enums used by the targetm.excess_precision hook.  */
> > > 
> > > enum flt_eval_method
> > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > index c800b74..b32c55f 100644
> > > --- a/gcc/doc/extend.texi
> > > +++ b/gcc/doc/extend.texi
> > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > A declaration to which @code{weakref} is attached and that is associated
> > > with a named @code{target} must be @code{static}.
> > > 
> > > +@item zero_call_used_regs ("@var{choice}")
> > > +@cindex @code{zero_call_used_regs} function attribute
> > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > +call-used registers at function return according to @var{choice}.
> > > +@samp{skip} doesn't zero call-used registers.  @samp{used-gpr} zeros
> > > +call-used general purpose registers which are used in function.
> > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > +@samp{used} zeros call-used registers which are used in function.
> > > +@samp{all} zeros all call-used registers.  The default for the
> > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > +
> > > @end table
> > > 
> > > @c This is the end of the target-independent attribute table
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index 09bcc5b..da02686 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > --param @var{name}=@var{value}
> > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > 
> > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > 
> > > Not all targets support this option.
> > > 
> > > +@item -fzero-call-used-regs=@var{choice}
> > > +@opindex fzero-call-used-regs
> > > +Zero call-used registers at function return according to
> > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > +registers which are used in function.  @samp{all-gpr} zeros all
> > > +call-used general purpose registers.  @samp{used} zeros call-used registers which
> > > +are used in function.  @samp{all} zeros all call-used registers.  You
> > > +can control this behavior for a specific function by using the function
> > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > +
> > > @item --param @var{name}=@var{value}
> > > @opindex param
> > > In some places, GCC uses various constants to control the amount of
> > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > index 6e7d9dc..43dddd3 100644
> > > --- a/gcc/doc/tm.texi
> > > +++ b/gcc/doc/tm.texi
> > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > @end deftypefn
> > > 
> > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > +@var{regno} must be the number of a hard general register.
> > > +
> > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > +@end deftypefn
> > > +
> > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > +A target hook that returns a mode suitable for zeroing the call-used
> > > +register @var{regno} in @var{mode}.
> > > +
> > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > +used.
> > > +@end deftypefn
> > > +
> > > @defmac APPLY_RESULT_SIZE
> > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > is needed.
> > > @end deftypefn
> > > 
> > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > +This hook should return an UNSPEC_VOLATILE rtx to mark a register as in use to
> > > +prevent deleting register-setting instructions in the prologue and epilogue.
> > > +@end deftypefn
> > > +
> > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > +This hook should return an rtx to zero all vector registers at function
> > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > +be zeroed.  Return @code{NULL} if this is not possible.
> > > +@end deftypefn
> > > +
> > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > When optimization is disabled, this hook indicates whether or not
> > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > index 3be984b..bee917a 100644
> > > --- a/gcc/doc/tm.texi.in
> > > +++ b/gcc/doc/tm.texi.in
> > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > 
> > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > 
> > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > +
> > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > +
> > > @defmac APPLY_RESULT_SIZE
> > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > 
> > > @hook TARGET_GET_DRAP_RTX
> > > 
> > > +@hook TARGET_PRO_EPILOGUE_USE
> > > +
> > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > +
> > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > 
> > > @hook TARGET_CONST_ANCHOR
> > > diff --git a/gcc/function.c b/gcc/function.c
> > > index 9eee9b5..9908530 100644
> > > --- a/gcc/function.c
> > > +++ b/gcc/function.c
> > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > #include "emit-rtl.h"
> > > #include "recog.h"
> > > #include "rtl-error.h"
> > > +#include "hard-reg-set.h"
> > > #include "alias.h"
> > > #include "fold-const.h"
> > > #include "stor-layout.h"
> > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > >  return seq;
> > > }
> > > 
> > > +/* Check whether the hard register REGNO is live at the exit block
> > > + * of the current routine.  */
> > > +bool
> > > +is_live_reg_at_exit (unsigned int regno)
> > > +{
> > > +  edge e;
> > > +  edge_iterator ei;
> > > +
> > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > +    {
> > > +      bitmap live_out = df_get_live_out (e->src);
> > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > +     return true;
> > > +    }
> > > +
> > > +  return false;
> > > +}
> > > +
> > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > + * function.  */
> > > +
> > > +static void
> > > +gen_call_used_regs_seq (void)
> > > +{
> > > +  if (!targetm.calls.pro_epilogue_use)
> > > +    return;
> > > +
> > > +  bool gpr_only = true;
> > > +  bool used_only = true;
> > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > +
> > > +  if (flag_zero_call_used_regs)
> > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > +     == zero_call_used_regs_unset)
> > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > +    else
> > > +      zero_call_used_regs_type
> > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > +  else
> > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > +
> > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > +    return;
> > > +
> > > +  /* No need to zero call-used-regs in main ().  */
> > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > +    return;
> > > +
> > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > +     since it isn't a normal function return.  */
> > > +  if (crtl->calls_eh_return)
> > > +    return;
> > > +
> > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > +     general-purpose registers; if used_only is true, only zero
> > > +     call-used-registers that are used in the current function.  */
> > > +  switch (zero_call_used_regs_type)
> > > +    {
> > > +      case zero_call_used_regs_all_gpr:
> > > +     used_only = false;
> > > +     break;
> > > +      case zero_call_used_regs_used:
> > > +     gpr_only = false;
> > > +     break;
> > > +      case zero_call_used_regs_all:
> > > +     gpr_only = false;
> > > +     used_only = false;
> > > +     break;
> > > +      default:
> > > +     break;
> > > +    }
> > > +
> > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > +     the target that provides such insn.  */
> > > +  if (!gpr_only
> > > +      && targetm.calls.zero_all_vector_registers)
> > > +    {
> > > +      rtx zero_all_vec_insn
> > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > +      if (zero_all_vec_insn)
> > > +     {
> > > +       emit_insn (zero_all_vec_insn);
> > > +       gpr_only = true;
> > > +     }
> > > +    }
> > > +
> > > +  /* For each hard register, zero it only if all of the following hold:
> > > +     1. it is a call-used register;
> > > + and 2. it is not a fixed register;
> > > + and 3. it is not live at the end of the routine;
> > > + and 4. it is a general-purpose register if gpr_only is true;
> > > + and 5. it is used in the routine if used_only is true.
> > > +   */
> > > +
> > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > +    zero_rtx[i] = NULL_RTX;
> > > +
> > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > +    {
> > > +      if (!this_target_hard_regs->x_call_used_regs[regno])
> > > +     continue;
> > > +      if (fixed_regs[regno])
> > > +     continue;
> > > +      if (is_live_reg_at_exit (regno))
> > > +     continue;
> > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > +     continue;
> > > +      if (used_only && !df_regs_ever_live_p (regno))
> > > +     continue;
> > > +
> > > +      /* Now we can emit insn to zero this register.  */
> > > +      rtx reg, tmp;
> > > +
> > > +      machine_mode mode
> > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > +                                                reg_raw_mode[regno]);
> > > +      if (mode == VOIDmode)
> > > +     continue;
> > > +      if (!have_regs_of_mode[mode])
> > > +     continue;
> > > +
> > > +      reg = gen_rtx_REG (mode, regno);
> > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > +     {
> > > +       zero_rtx[(int)mode] = reg;
> > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > +       emit_insn (tmp);
> > > +     }
> > > +      else
> > > +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > > +
> > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > +    }
> > > +
> > > +  return;
> > > +}
> > > +
> > > +
> > > /* Return a sequence to be used as the epilogue for the current function,
> > >   or NULL.  */
> > > 
> > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > 
> > >  start_sequence ();
> > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > +
> > > +  gen_call_used_regs_seq ();
> > > +
> > >  rtx_insn *seq = targetm.gen_epilogue ();
> > >  if (seq)
> > >    emit_jump_insn (seq);
> > > diff --git a/gcc/function.h b/gcc/function.h
> > > index d55cbdd..fc36c3e 100644
> > > --- a/gcc/function.h
> > > +++ b/gcc/function.h
> > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > 
> > > extern void used_types_insert (tree);
> > > 
> > > +extern bool is_live_reg_at_exit (unsigned int);
> > > +
> > > #endif  /* GCC_FUNCTION_H */
> > > diff --git a/gcc/target.def b/gcc/target.def
> > > index 07059a8..8aab63e 100644
> > > --- a/gcc/target.def
> > > +++ b/gcc/target.def
> > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > default_function_value_regno_p)
> > > 
> > > DEFHOOK
> > > +(zero_call_used_regno_p,
> > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > +@var{regno} must be the number of a hard general register.\n\
> > > +\n\
> > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > + default_zero_call_used_regno_p)
> > > +
> > > +DEFHOOK
> > > +(zero_call_used_regno_mode,
> > > + "A target hook that returns a mode suitable for zeroing the call-used\n\
> > > +register @var{regno} in @var{mode}.\n\
> > > +\n\
> > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > +used.",
> > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > + default_zero_call_used_regno_mode)
> > > +
> > > +DEFHOOK
> > > (fntype_abi,
> > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > is needed.",
> > > rtx, (void), NULL)
> > > 
> > > +DEFHOOK
> > > +(pro_epilogue_use,
> > > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register as in use to\n\
> > > +prevent deleting register-setting instructions in the prologue and epilogue.",
> > > + rtx, (rtx reg), NULL)
> > > +
> > > +DEFHOOK
> > > +(zero_all_vector_registers,
> > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > +be zeroed.  Return @code{NULL} if this is not possible.",
> > > + rtx, (bool used_only), NULL)
> > > +
> > > /* Return true if all function parameters should be spilled to the
> > >   stack.  */
> > > DEFHOOK
> > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > index 0113c7b..ed02173 100644
> > > --- a/gcc/targhooks.c
> > > +++ b/gcc/targhooks.c
> > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > #endif
> > > }
> > > 
> > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > +
> > > +bool
> > > +default_zero_call_used_regno_p (const unsigned int,
> > > +                             bool)
> > > +{
> > > +  return false;
> > > +}
> > > +
> > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > +
> > > +machine_mode
> > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > +{
> > > +  return mode;
> > > +}
> > > +
> > > rtx
> > > default_internal_arg_pointer (void)
> > > {
> > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > index b572a36..370df19 100644
> > > --- a/gcc/targhooks.h
> > > +++ b/gcc/targhooks.h
> > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > extern bool default_function_value_regno_p (const unsigned int);
> > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > +                                                    machine_mode);
> > > extern rtx default_internal_arg_pointer (void);
> > > extern rtx default_static_chain (const_tree, bool);
> > > extern void default_trampoline_init (rtx, tree, rtx);
> > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > new file mode 100644
> > > index 0000000..3c2ac72
> > > --- /dev/null
> > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > @@ -0,0 +1,3 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > new file mode 100644
> > > index 0000000..acf48c4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > new file mode 100644
> > > index 0000000..9f61dc4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > new file mode 100644
> > > index 0000000..09048e5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > new file mode 100644
> > > index 0000000..4862688
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > @@ -0,0 +1,39 @@
> > > +/* { dg-do run { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > +
> > > +struct S { int i; };
> > > +__attribute__((const, noinline, noclone))
> > > +struct S foo (int x)
> > > +{
> > > +  struct S s;
> > > +  s.i = x;
> > > +  return s;
> > > +}
> > > +
> > > +int a[2048], b[2048], c[2048], d[2048];
> > > +struct S e[2048];
> > > +
> > > +__attribute__((noinline, noclone)) void
> > > +bar (void)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < 1024; i++)
> > > +    {
> > > +      e[i] = foo (i);
> > > +      a[i+2] = a[i] + a[i+1];
> > > +      b[10] = b[10] + i;
> > > +      c[i] = c[2047 - i];
> > > +      d[i] = d[i + 1];
> > > +    }
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  int i;
> > > +  bar ();
> > > +  for (i = 0; i < 1024; i++)
> > > +    if (e[i].i != i)
> > > +      __builtin_abort ();
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > new file mode 100644
> > > index 0000000..500251b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > @@ -0,0 +1,39 @@
> > > +/* { dg-do run { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +struct S { int i; };
> > > +__attribute__((const, noinline, noclone))
> > > +struct S foo (int x)
> > > +{
> > > +  struct S s;
> > > +  s.i = x;
> > > +  return s;
> > > +}
> > > +
> > > +int a[2048], b[2048], c[2048], d[2048];
> > > +struct S e[2048];
> > > +
> > > +__attribute__((noinline, noclone)) void
> > > +bar (void)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < 1024; i++)
> > > +    {
> > > +      e[i] = foo (i);
> > > +      a[i+2] = a[i] + a[i+1];
> > > +      b[10] = b[10] + i;
> > > +      c[i] = c[2047 - i];
> > > +      d[i] = d[i + 1];
> > > +    }
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  int i;
> > > +  bar ();
> > > +  for (i = 0; i < 1024; i++)
> > > +    if (e[i].i != i)
> > > +      __builtin_abort ();
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > new file mode 100644
> > > index 0000000..8b058e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > new file mode 100644
> > > index 0000000..d4eaaf7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > new file mode 100644
> > > index 0000000..dd3bb90
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > new file mode 100644
> > > index 0000000..e2274f6
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > new file mode 100644
> > > index 0000000..7f5d153
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > new file mode 100644
> > > index 0000000..fe13d2b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > +
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x + y;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > new file mode 100644
> > > index 0000000..205a532
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > +
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > new file mode 100644
> > > index 0000000..e046684
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > new file mode 100644
> > > index 0000000..4be8ff6
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > +
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x + y;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > new file mode 100644
> > > index 0000000..0eb34e0
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > +
> > > +__attribute__ ((zero_call_used_regs("used")))
> > > +float
> > > +foo (float z, float y, float x)
> > > +{
> > > +  return x + y;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > new file mode 100644
> > > index 0000000..cbb63a4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > new file mode 100644
> > > index 0000000..7573197
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > new file mode 100644
> > > index 0000000..de71223
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > new file mode 100644
> > > index 0000000..ccfa441
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > new file mode 100644
> > > index 0000000..6b46ca3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > new file mode 100644
> > > index 0000000..0680f38
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > +
> > > +void
> > > +foo (void)
> > > +{
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > new file mode 100644
> > > index 0000000..534defa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > new file mode 100644
> > > index 0000000..477bb19
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > new file mode 100644
> > > index 0000000..a305a60
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile { target *-*-linux* } } */
> > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > +
> > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > +
> > > +int
> > > +foo (int x)
> > > +{
> > > +  return x;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > index 95eea63..01a1f24 100644
> > > --- a/gcc/toplev.c
> > > +++ b/gcc/toplev.c
> > > @@ -1464,6 +1464,15 @@ process_options (void)
> > >       }
> > >    }
> > > 
> > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > +      && !targetm.calls.pro_epilogue_use)
> > > +    {
> > > +      error_at (UNKNOWN_LOCATION,
> > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > +             "target");
> > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > +    }
> > > +
> > >  /* One region RA really helps to decrease the code size.  */
> > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > >    flag_ira_region
> > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > index 8c5a2e3..71badbd 100644
> > > --- a/gcc/tree-core.h
> > > +++ b/gcc/tree-core.h
> > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > unsigned final : 1;
> > > /* Belong to FUNCTION_DECL exclusively.  */
> > > unsigned regdecl_flag : 1;
> > > - /* 14 unused bits. */
> > > +
> > > + /* How to clear call-used registers upon function return.  */
> > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > +
> > > + /* 11 unused bits.  */
> > > };
> > > 
> > > struct GTY(()) tree_var_decl {
> > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > index cf546ed..d378a88 100644
> > > --- a/gcc/tree.h
> > > +++ b/gcc/tree.h
> > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > #define DECL_VISIBILITY(NODE) \
> > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > 
> > > +/* Value of the function decl's type of zeroing the call used
> > > +   registers upon return from function.  */
> > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > +
> > > /* Nonzero means that the decl (or an enclosing scope) had its
> > >   visibility specified rather than being inferred.  */
> > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > -- 
> > > 1.9.1
> >
> 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-03 15:42             ` Qing Zhao
@ 2020-08-04  7:35               ` Richard Biener
  2020-08-04 18:23                 ` H.J. Lu
  2020-08-05 21:35                 ` Qing Zhao
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Biener @ 2020-08-04  7:35 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Uros Bizjak, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Mon, 3 Aug 2020, Qing Zhao wrote:

> Hi, Uros,
> 
> Thanks a lot for your review on X86 parts.
> 
> Hi, Richard,
> 
> Could you please take a look at the middle-end part to see whether the 
> rewrite addresses your previous concerns?

I have a few comments below - I'm not sure I'm qualified to fully
review the rest though.

> Thanks a lot.
> 
> Qing
> 
> 
> > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > 
> > 
> > > On Tue, 28 Jul 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > >
> > >
> > > Richard and Uros,
> > >
> > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > >
> > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.  
> > >
> > > Thanks a lot for your time.
> > 
> > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > 
> > > That said, the x86 parts look OK.
> > 
> > 
> 
> > Uros.
> > > Qing
> > >
> > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > 
> > > > Hi, Gcc team,
> > > > 
> > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > 
> > > > From the previous round of discussion, the major issues raised were:
> > > > 
> > > > A. should be rewritten by using regsets infrastructure.  
> > > > B. Put the patch into middle-end instead of x86 backend. 
> > > > 
> > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > 
> > > > > 1. Change the names of the option and attribute from 
> > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > to:
> > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all"),
> > > > > and add the new option and the new attribute as general (target-independent) features.
> > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > 3. Add 4 target-hooks;
> > > > 4. Implement these 4 target-hooks on i386 backend. 
> > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > 
> > > > > The patch is as follows:
> > > > > 
> > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > command-line option and
> > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > 
> > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > 
> > > >  Don't zero call-used registers upon function return.

Does a return via EH unwinding also constitute a function return?  I
think you may want to have a finally handler or support in the unwinder
for this?  Then there's abnormal return via longjmp & friends, I guess
there's nothing that can be done there besides patching glibc?
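
As an illustration of the longjmp case, here is a small sketch that uses
GCC's cleanup variable attribute as a stand-in for code the compiler
places on the normal return path.  That substitution is an assumption
made for illustration only: the patch emits zeroing instructions in the
epilogue, not a cleanup handler.  All function names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf env;
static int cleanup_runs;	/* How often the scope-exit hook ran.  */

/* Stand-in for work done on the normal return path.  */
static void
on_scope_exit (int *unused)
{
  (void) unused;
  cleanup_runs++;
}

/* Leaves through an ordinary return, so the hook runs.  */
static void
leave_by_return (void)
{
  __attribute__ ((cleanup (on_scope_exit))) int guard = 0;
  (void) guard;
}

/* Leaves through longjmp, which jumps straight back to the setjmp
   point without executing the rest of this function.  */
static void
leave_by_longjmp (void)
{
  __attribute__ ((cleanup (on_scope_exit))) int guard = 0;
  (void) guard;
  longjmp (env, 1);
}

/* Perform one normal return and one longjmp exit, then report how
   many times the hook ran.  */
int
count_cleanup_runs (void)
{
  cleanup_runs = 0;
  leave_by_return ();
  if (setjmp (env) == 0)
    leave_by_longjmp ();
  return cleanup_runs;
}
```

With a longjmp that does not unwind (as in glibc), the hook is expected
to run for the ordinary return only.  An epilogue zeroing sequence would
be bypassed in the same way.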

In general I am missing reasoning in the documentation as to why one
would use -fzero-call-used-regs=, that is, what is the threat model and
what are the guarantees?  Is there any point in zeroing registers when
spill slots are left populated with stale register contents?  How do I
(and why would I want to?) ensure that there's no information leak from
the implementation of 'foo' to its callers?  Do I need to compile all
of 'foo' and the functions called from 'foo' with -fzero-call-used-regs=
or is it enough to annotate the API boundaries I want to protect with
zero_call_used_regs("...")?

Again - what's the intended use (and how does it fulfil anything useful
for that case)?
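
To make the API-boundary question concrete, a minimal sketch of
annotating a single entry point could look like the following.  The
function check_tag and the ZERO_REGS macro are hypothetical and not part
of the patch.  The attribute spelling and the "used-gpr" argument are
those proposed in the quoted patch, and the __has_attribute guard only
keeps the sketch compiling on compilers that lack the attribute.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Expand to the attribute where the compiler knows it, else to nothing.  */
#if __has_attribute (zero_call_used_regs)
# define ZERO_REGS(mode) __attribute__ ((zero_call_used_regs (mode)))
#else
# define ZERO_REGS(mode)
#endif

/* Hypothetical API boundary that handles secret data.  Only this
   function is annotated; anything it calls is not covered.  */
ZERO_REGS ("used-gpr")
int
check_tag (const uint8_t *a, const uint8_t *b, size_t n)
{
  uint8_t diff = 0;

  /* Accumulate differences without an early exit.  */
  for (size_t i = 0; i < n; i++)
    diff |= (uint8_t) (a[i] ^ b[i]);

  /* Return 1 when the buffers match, 0 otherwise.  */
  return diff == 0;
}
```

Whether annotating just this function is sufficient is exactly what the
documentation should state: the sketch clears nothing in callees and
says nothing about stale values left in spill slots.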

> > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > 
> > > >  Zero used call-used general purpose registers upon function return.
> > > > 
> > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > 
> > > >  Zero all call-used general purpose registers upon function return.
> > > > 
> > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > 
> > > >  Zero used call-used registers upon function return.
> > > > 
> > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > 
> > > >  Zero all call-used registers upon function return.
> > > > 
> > > > > The feature is implemented in the middle-end, but is currently only supported on x86.
> > > > 
> > > > > Tested on x86-64 and aarch64 by bootstrapping GCC trunk, with
> > > > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr,
> > > > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > > > by default on x86-64.
> > > > > 
> > > > > Please take a look and let me know if you have any further comments.
> > > > 
> > > > thanks.
> > > > 
> > > > Qing
> > > > 
> > > > 
> > > > ====================================
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > 
> > > >       * common.opt: Add new option -fzero-call-used-regs.
> > > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > > >       (ix86_zero_call_used_regno_mode): Likewise.
> > > >       (ix86_zero_all_vector_registers): Likewise.
> > > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > > >       gen_pro_epilogue_use.
> > > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > > >       with UNSPECV_PRO_EPILOGUE_USE.
> > > >       * coretypes.h (enum zero_call_used_regs): New type.
> > > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > > >       * doc/tm.texi: Regenerate.
> > > > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > > >       * function.c (is_live_reg_at_exit): New function.
> > > >       (gen_call_used_regs_seq): Likewise.
> > > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > > >       * function.h (is_live_reg_at_exit): Declare.
> > > >       * target.def (zero_call_used_regno_p): New hook.
> > > >       (zero_call_used_regno_mode): Likewise.
> > > >       (pro_epilogue_use): Likewise.
> > > >       (zero_all_vector_registers): Likewise.
> > > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > > >       (default_zero_call_used_regno_mode): Likewise.
> > > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > > >       (default_zero_call_used_regno_mode): Declare.
> > > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > > >       is used on targets that do not support it.
> > > >       * tree-core.h (struct tree_decl_with_vis): New field 
> > > >       zero_call_used_regs_type.
> > > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > > 
> > > > gcc/c-family/ChangeLog:
> > > > 
> > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > 
> > > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > > >       zero_call_used_regs.
> > > >       (handle_zero_call_used_regs_attribute): New function.
> > > > 
> > > > gcc/c/ChangeLog:
> > > > 
> > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > 
> > > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > 2020-07-13  H.J. Lu <hjl.tools@gmail.com>
> > > > 
> > > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > > 
> > > > ---
> > > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > > gcc/c/c-decl.c                                     |   4 +
> > > > gcc/common.opt                                     |  23 ++++
> > > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > > gcc/config/i386/i386.md                            |   6 +-
> > > > gcc/coretypes.h                                    |  10 ++
> > > > gcc/doc/extend.texi                                |  11 ++
> > > > gcc/doc/invoke.texi                                |  13 +-
> > > > gcc/doc/tm.texi                                    |  27 ++++
> > > > > gcc/doc/tm.texi.in                                 |   8 ++
> > > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > > gcc/function.h                                     |   2 +
> > > > gcc/target.def                                     |  33 +++++
> > > > gcc/targhooks.c                                    |  17 +++
> > > > gcc/targhooks.h                                    |   3 +
> > > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > > gcc/toplev.c                                       |   9 ++
> > > > gcc/tree-core.h                                    |   6 +-
> > > > gcc/tree.h                                         |   5 +
> > > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > 
> > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > index 3721483..cc93d6f 100644
> > > > --- a/gcc/c-family/c-attribs.c
> > > > +++ b/gcc/c-family/c-attribs.c
> > > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > > +                                              bool *);
> > > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > > >                             ignore_attribute, NULL },
> > > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > > >                             handle_no_split_stack_attribute, NULL },
> > > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > > +
> > > >  /* For internal use (marking of builtins and runtime functions) only.
> > > >     The name contains space to prevent its usage in source code.  */
> > > >  { "fn spec",               1, 1, false, true, true, false,
> > > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > > >  return NULL_TREE;
> > > > }
> > > > 
> > > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > > +   struct attribute_spec.handler.  */
> > > > +
> > > > +static tree
> > > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > > +                                   int ARG_UNUSED (flags),
> > > > +                                   bool *no_add_attris)
> > > > +{
> > > > +  tree decl = *node;
> > > > +  tree id = TREE_VALUE (args);
> > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > +
> > > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > > +    {
> > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > +             "%qE attribute applies only to functions", name);
> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +  else if (DECL_INITIAL (decl))
> > > > +    {
> > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > +             "cannot set %qE attribute after definition", name);

Why's that?

> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (TREE_CODE (id) != STRING_CST)
> > > > +    {
> > > > +      error ("attribute %qE arguments not a string", name);
> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > +    {
> > > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > > +  else
> > > > +    {
> > > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > > +      *no_add_attris = true;
> > > > +      return NULL_TREE;
> > > > +    }
> > > > +
> > > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > > +
> > > > +  return NULL_TREE;
> > > > +}
> > > > +
> > > > /* Handle a "returns_nonnull" attribute; arguments as in
> > > >   struct attribute_spec.handler.  */
> > > > 
> > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > index 81bd2ee..ded1880 100644
> > > > --- a/gcc/c/c-decl.c
> > > > +++ b/gcc/c/c-decl.c
> > > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > > >       }
> > > > 
> > > > +      /* Merge the zero_call_used_regs_type information.  */
> > > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > > +

If you need this (see below) then cp/* likely needs a similar adjustment,
and so do other places in the middle-end (function cloning, etc.).

> > > >      /* Merge the storage class information.  */
> > > >      merge_weak (newdecl, olddecl);
> > > > 
> > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > index df8af36..19900f9 100644
> > > > --- a/gcc/common.opt
> > > > +++ b/gcc/common.opt
> > > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > > Put zero initialized data in the bss section.
> > > > 
> > > > +fzero-call-used-regs=
> > > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > > +Clear call-used registers upon function return.
> > > > +
> > > > +Enum
> > > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > > +
> > > > +EnumValue
> > > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > > +
> > > > g
> > > > Common Driver RejectNegative JoinedOrMissing
> > > > Generate debug information in default format.
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index 5c373c0..fd1aa9c 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > > >  return false;
> > > > }
> > > > 
> > > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > +
> > > > +static bool
> > > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > > +                          bool gpr_only)
> > > > +{
> > > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > > +}
> > > > +
> > > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > +
> > > > +static machine_mode
> > > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > > +{
> > > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > > +     and the lower 128 bits for vector registers since destination are
> > > > +     zero-extended to the full register width.  */
> > > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > > +}
> > > > +
> > > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > > +
> > > > +static rtx
> > > > +ix86_zero_all_vector_registers (bool used_only)
> > > > +{
> > > > +  if (!TARGET_AVX)
> > > > +    return NULL;
> > > > +
> > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > > +      || (TARGET_64BIT
> > > > +          && (REX_SSE_REGNO_P (regno)
> > > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > > +         || fixed_regs[regno]
> > > > +         || is_live_reg_at_exit (regno)
> > > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > > +      return NULL;
> > > > +
> > > > +  return gen_avx_vzeroall ();
> > > > +}
> > > > +
> > > > /* Define how to find the value returned by a function.
> > > >   VALTYPE is the data type of the value (as a tree).
> > > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > > >      insn = emit_insn (gen_set_got (pic));
> > > >      RTX_FRAME_RELATED_P (insn) = 1;
> > > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > > -      emit_insn (gen_prologue_use (pic));
> > > > +      emit_insn (gen_pro_epilogue_use (pic));
> > > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > > >      ix86_elim_entry_set_got (pic);
> > > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > > >     Further, prevent alloca modifications to the stack pointer from being
> > > >     combined with prologue modifications.  */
> > > >  if (TARGET_SEH)
> > > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > > }
> > > > 
> > > > /* Emit code to restore REG using a POP insn.  */
> > > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > > 
> > > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > > +
> > > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > > +
> > > > +#undef TARGET_PRO_EPILOGUE_USE
> > > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > > +
> > > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > > +
> > > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > > 
> > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > index d0ecd9e..e7df59f 100644
> > > > --- a/gcc/config/i386/i386.md
> > > > +++ b/gcc/config/i386/i386.md
> > > > @@ -194,7 +194,7 @@
> > > >  UNSPECV_STACK_PROBE
> > > >  UNSPECV_PROBE_STACK_RANGE
> > > >  UNSPECV_ALIGN
> > > > -  UNSPECV_PROLOGUE_USE
> > > > +  UNSPECV_PRO_EPILOGUE_USE
> > > >  UNSPECV_SPLIT_STACK_RETURN
> > > >  UNSPECV_CLD
> > > >  UNSPECV_NOPS
> > > > @@ -13525,8 +13525,8 @@
> > > > 
> > > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > > ;; to prevent deleting instructions setting registers for PIC code
> > > > -(define_insn "prologue_use"
> > > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > > +(define_insn "pro_epilogue_use"
> > > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > > >  ""
> > > >  ""
> > > >  [(set_attr "length" "0")])
> > > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > > index 6b6cfcd..e56d6ec 100644
> > > > --- a/gcc/coretypes.h
> > > > +++ b/gcc/coretypes.h
> > > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > > >  VISIBILITY_INTERNAL
> > > > };
> > > > 
> > > > +/* Zero call-used registers type.  */
> > > > +enum zero_call_used_regs {
> > > > +  zero_call_used_regs_unset = 0,
> > > > +  zero_call_used_regs_skip,
> > > > +  zero_call_used_regs_used_gpr,
> > > > +  zero_call_used_regs_all_gpr,
> > > > +  zero_call_used_regs_used,
> > > > +  zero_call_used_regs_all
> > > > +};
> > > > +
> > > > /* enums used by the targetm.excess_precision hook.  */
> > > > 
> > > > enum flt_eval_method
> > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > index c800b74..b32c55f 100644
> > > > --- a/gcc/doc/extend.texi
> > > > +++ b/gcc/doc/extend.texi
> > > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > > A declaration to which @code{weakref} is attached and that is associated
> > > > with a named @code{target} must be @code{static}.
> > > > 
> > > > +@item zero_call_used_regs ("@var{choice}")
> > > > +@cindex @code{zero_call_used_regs} function attribute
> > > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > > +call-used registers at function return according to @var{choice}.
> > > > +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> > > > +call-used general purpose registers which are used in funciton.
> > > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > > +@samp{used} zeros call-used registers which are used in function.
> > > > +@samp{all} zeros all call-used registers.  The default for the
> > > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > > +
> > > > @end table
> > > > 
> > > > @c This is the end of the target-independent attribute table
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index 09bcc5b..da02686 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > > --param @var{name}=@var{value}
> > > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > > 
> > > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > > 
> > > > Not all targets support this option.
> > > > 
> > > > +@item -fzero-call-used-regs=@var{choice}
> > > > +@opindex fzero-call-used-regs
> > > > +Zero call-used registers at function return according to
> > > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > > +registers which are used in function.  @samp{all-gpr} zeros all
> > > > +call-used registers.  @samp{used} zeros call-used registers which
> > > > +are used in function.  @samp{all} zeros all call-used registers.  You
> > > > +can control this behavior for a specific function by using the function
> > > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > > +
> > > > @item --param @var{name}=@var{value}
> > > > @opindex param
> > > > In some places, GCC uses various constants to control the amount of
> > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > index 6e7d9dc..43dddd3 100644
> > > > --- a/gcc/doc/tm.texi
> > > > +++ b/gcc/doc/tm.texi
> > > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > > @end deftypefn
> > > > 
> > > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > > +@var{regno} must be the number of a hard general register.
> > > > +
> > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > > +@end deftypefn
> > > > +
> > > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > > +A target hook that returns a mode of suitable to zero the register for the
> > > > +call used register @var{regno} in @var{mode}.
> > > > +
> > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > > +used.
> > > > +@end deftypefn
> > > > +
> > > > @defmac APPLY_RESULT_SIZE
> > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > > is needed.
> > > > @end deftypefn
> > > > 
> > > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > > +This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
> > > > +prevent deleting register setting instructions in proprologue and epilogue.
> > > > +@end deftypefn
> > > > +
> > > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > > +This hook should return an rtx to zero all vector registers at function
> > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > > +be zeroed.  Return @code{NULL} if possible
> > > > +@end deftypefn
> > > > +
> > > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > > When optimization is disabled, this hook indicates whether or not
> > > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > index 3be984b..bee917a 100644
> > > > --- a/gcc/doc/tm.texi.in
> > > > +++ b/gcc/doc/tm.texi.in
> > > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > > 
> > > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > > 
> > > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > > +
> > > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > +
> > > > @defmac APPLY_RESULT_SIZE
> > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > > 
> > > > @hook TARGET_GET_DRAP_RTX
> > > > 
> > > > +@hook TARGET_PRO_EPILOGUE_USE
> > > > +
> > > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > +
> > > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > > 
> > > > @hook TARGET_CONST_ANCHOR
> > > > diff --git a/gcc/function.c b/gcc/function.c
> > > > index 9eee9b5..9908530 100644
> > > > --- a/gcc/function.c
> > > > +++ b/gcc/function.c
> > > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > #include "emit-rtl.h"
> > > > #include "recog.h"
> > > > #include "rtl-error.h"
> > > > +#include "hard-reg-set.h"
> > > > #include "alias.h"
> > > > #include "fold-const.h"
> > > > #include "stor-layout.h"
> > > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > > >  return seq;
> > > > }
> > > > 
> > > > +/* Check whether the hard register REGNO is live at the exit block
> > > > + * of the current routine.  */
> > > > +bool
> > > > +is_live_reg_at_exit (unsigned int regno)
> > > > +{
> > > > +  edge e;
> > > > +  edge_iterator ei;
> > > > +
> > > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > > +    {
> > > > +      bitmap live_out = df_get_live_out (e->src);
> > > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > > +     return true;
> > > > +    }
> > > > +
> > > > +  return false;
> > > > +}
> > > > +
> > > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > > + * function.  */

No '*' on the continuation line

> > > > +
> > > > +static void
> > > > +gen_call_used_regs_seq (void)
> > > > +{
> > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > +    return;
> > > > +
> > > > +  bool gpr_only = true;
> > > > +  bool used_only = true;
> > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > +
> > > > +  if (flag_zero_call_used_regs)
> > > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > > +     == zero_call_used_regs_unset)
> > > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > > +    else
> > > > +      zero_call_used_regs_type
> > > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > +  else
> > > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > +
> > > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > > +    return;
> > > > +
> > > > +  /* No need to zero call-used-regs in main ().  */
> > > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > > +    return;
> > > > +
> > > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > > +     since it isn't a normal function return.  */
> > > > +  if (crtl->calls_eh_return)
> > > > +    return;
> > > > +
> > > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > > +     general-purpose registers; if used_only is true, only zero
> > > > +     call-used-registers that are used in the current function.  */
> > > > +  switch (zero_call_used_regs_type)
> > > > +    {
> > > > +      case zero_call_used_regs_all_gpr:
> > > > +     used_only = false;
> > > > +     break;
> > > > +      case zero_call_used_regs_used:
> > > > +     gpr_only = false;
> > > > +     break;
> > > > +      case zero_call_used_regs_all:
> > > > +     gpr_only = false;
> > > > +     used_only = false;
> > > > +     break;
> > > > +      default:
> > > > +     break;
> > > > +    }
> > > > +
> > > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > > +     the target that provides such insn.  */
> > > > +  if (!gpr_only
> > > > +      && targetm.calls.zero_all_vector_registers)
> > > > +    {
> > > > +      rtx zero_all_vec_insn
> > > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > > +      if (zero_all_vec_insn)
> > > > +     {
> > > > +       emit_insn (zero_all_vec_insn);
> > > > +       gpr_only = true;
> > > > +     }
> > > > +    }
> > > > +
> > > > +  /* For each of the hard registers, check to see whether we should zero it if:
> > > > +     1. it is a call-used-registers;
> > > > + and 2. it is not a fixed-registers;
> > > > + and 3. it is not live at the end of the routine;
> > > > + and 4. it is general purpose register if gpr_only is true;
> > > > + and 5. it is used in the routine if used_only is true;
> > > > +   */
> > > > +
> > > > +  /* This array holds the zero rtx with the correponding machine mode.  */
> > > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > > +    zero_rtx[i] = NULL_RTX;
> > > > +
> > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > +    {
> > > > +      if (!this_target_hard_regs->x_call_used_regs[regno])

Use if (!call_used_regs[regno])

> > > > +     continue;
> > > > +      if (fixed_regs[regno])
> > > > +     continue;
> > > > +      if (is_live_reg_at_exit (regno))
> > > > +     continue;

How can a call-used reg be live at exit?

> > > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > > +     continue;

Why does the target need some extra say here?

> > > > +      if (used_only && !df_regs_ever_live_p (regno))

So I suppose this does not include uses by callees of this function?

> > > > +     continue;
> > > > +
> > > > +      /* Now we can emit insn to zero this register.  */
> > > > +      rtx reg, tmp;
> > > > +
> > > > +      machine_mode mode
> > > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > > +                                                reg_raw_mode[regno]);

In what case does the target ever need to adjust this (we're dealing
with hard-regs only?)?

> > > > +      if (mode == VOIDmode)
> > > > +     continue;
> > > > +      if (!have_regs_of_mode[mode])
> > > > +     continue;

When does this happen?

> > > > +
> > > > +      reg = gen_rtx_REG (mode, regno);
> > > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > > +     {
> > > > +       zero_rtx[(int)mode] = reg;
> > > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > > +       emit_insn (tmp);
> > > > +     }
> > > > +      else
> > > > +     emit_move_insn (reg, zero_rtx[(int)mode]);

Not sure but I think the canonical zero to use is CONST0_RTX (mode)
but I may be wrong.  I'd rather have the target be able to specify
some special instruction for zeroing here.  Some may have
multi-reg set instructions for example.  That said, can't we
defer the actual zeroing to the target in full and only compute
a hard-reg-set of to-be-zeroed registers here and pass that
to a target hook?
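
To make the suggested split concrete, here is a rough, self-contained
sketch.  Everything in it is hypothetical: the tiny register file, the
reg_info table and the names compute_zero_set and
sketch_target_zero_regs are made up for illustration and are not GCC
interfaces.  The generic side only decides which registers to zero and
builds a bit mask; the target side decides how to zero them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_HARD_REGS 8   /* hypothetical tiny register file */

/* Hypothetical per-register facts.  In the compiler these would come
   from the call-used and fixed register tables and from liveness
   information; here they are just constants.  */
struct reg_info
{
  bool call_used;
  bool fixed;
  bool live_at_exit;
  bool ever_live;
  bool gpr;
};

static const struct reg_info regs[NUM_HARD_REGS] = {
  /* call_used fixed live_at_exit ever_live gpr */
  { true,  false, true,  true,  true  },  /* r0: return value, live at exit */
  { true,  false, false, true,  true  },  /* r1: scratch GPR, used */
  { true,  false, false, false, true  },  /* r2: scratch GPR, unused */
  { false, false, false, true,  true  },  /* r3: callee-saved GPR */
  { true,  true,  false, true,  true  },  /* r4: fixed register */
  { true,  false, false, true,  false },  /* v0: scratch vector, used */
  { true,  false, false, false, false },  /* v1: scratch vector, unused */
  { false, false, false, false, false },  /* v2: callee-saved vector */
};

/* Generic part: decide WHICH registers to zero, as a bit mask with
   bit N set when register N should be cleared.  */
static uint32_t
compute_zero_set (bool gpr_only, bool used_only)
{
  uint32_t mask = 0;
  for (unsigned regno = 0; regno < NUM_HARD_REGS; regno++)
    {
      const struct reg_info *r = &regs[regno];
      if (!r->call_used || r->fixed || r->live_at_exit)
        continue;
      if (gpr_only && !r->gpr)
        continue;
      if (used_only && !r->ever_live)
        continue;
      mask |= UINT32_C (1) << regno;
    }
  return mask;
}

/* Target part: decide HOW to zero them.  This stand-in only counts the
   instructions a naive one-insn-per-register target would emit; a real
   target could pick a multi-register instruction instead.  */
static unsigned
sketch_target_zero_regs (uint32_t mask)
{
  assert ((mask >> NUM_HARD_REGS) == 0);
  unsigned insns = 0;
  for (; mask != 0; mask &= mask - 1)
    insns++;
  return insns;
}
```

With this shape the generic code would no longer need separate hooks
for the register predicate, the zeroing mode and the all-vector
shortcut; the target would see the whole set at once and could choose
the cheapest sequence itself.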

> > > > +
> > > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > > +    }
> > > > +
> > > > +  return;
> > > > +}
> > > > +
> > > > +
> > > > /* Return a sequence to be used as the epilogue for the current function,
> > > >   or NULL.  */
> > > > 
> > > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > > 
> > > >  start_sequence ();
> > > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > > +
> > > > +  gen_call_used_regs_seq ();
> > > > +

The caller eventually performs shrink-wrapping - are you sure that
doesn't mess up things?

> > > >  rtx_insn *seq = targetm.gen_epilogue ();
> > > >  if (seq)
> > > >    emit_jump_insn (seq);
> > > > diff --git a/gcc/function.h b/gcc/function.h
> > > > index d55cbdd..fc36c3e 100644
> > > > --- a/gcc/function.h
> > > > +++ b/gcc/function.h
> > > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > > 
> > > > extern void used_types_insert (tree);
> > > > 
> > > > +extern bool is_live_reg_at_exit (unsigned int);
> > > > +
> > > > #endif  /* GCC_FUNCTION_H */
> > > > diff --git a/gcc/target.def b/gcc/target.def
> > > > index 07059a8..8aab63e 100644
> > > > --- a/gcc/target.def
> > > > +++ b/gcc/target.def
> > > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > > default_function_value_regno_p)
> > > > 
> > > > DEFHOOK
> > > > +(zero_call_used_regno_p,
> > > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > > +@var{regno} must be the number of a hard general register.\n\
> > > > +\n\
> > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > > + default_zero_call_used_regno_p)
> > > > +
> > > > +DEFHOOK
> > > > +(zero_call_used_regno_mode,
> > > > + "A target hook that returns a mode of suitable to zero the register for the\n\
> > > > +call used register @var{regno} in @var{mode}.\n\
> > > > +\n\
> > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > > +used.",
> > > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > > + default_zero_call_used_regno_mode)
> > > > +
> > > > +DEFHOOK
> > > > (fntype_abi,
> > > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > > is needed.",
> > > > rtx, (void), NULL)
> > > > 
> > > > +DEFHOOK
> > > > +(pro_epilogue_use,
> > > > + "This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > > +prevent deleting register setting instructions in proprologue and epilogue.",
> > > > + rtx, (rtx reg), NULL)
> > > > +
> > > > +DEFHOOK
> > > > +(zero_all_vector_registers,
> > > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > > +be zeroed.  Return @code{NULL} if possible",
> > > > + rtx, (bool used_only), NULL)
> > > > +
> > > > /* Return true if all function parameters should be spilled to the
> > > >   stack.  */
> > > > DEFHOOK
> > > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > > index 0113c7b..ed02173 100644
> > > > --- a/gcc/targhooks.c
> > > > +++ b/gcc/targhooks.c
> > > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > > #endif
> > > > }
> > > > 
> > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > +
> > > > +bool
> > > > +default_zero_call_used_regno_p (const unsigned int,
> > > > +                             bool)
> > > > +{
> > > > +  return false;
> > > > +}
> > > > +
> > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > +
> > > > +machine_mode
> > > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > > +{
> > > > +  return mode;
> > > > +}
> > > > +
> > > > rtx
> > > > default_internal_arg_pointer (void)
> > > > {
> > > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > > index b572a36..370df19 100644
> > > > --- a/gcc/targhooks.h
> > > > +++ b/gcc/targhooks.h
> > > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > > extern bool default_function_value_regno_p (const unsigned int);
> > > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > > +                                                    machine_mode);
> > > > extern rtx default_internal_arg_pointer (void);
> > > > extern rtx default_static_chain (const_tree, bool);
> > > > extern void default_trampoline_init (rtx, tree, rtx);
> > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > new file mode 100644
> > > > index 0000000..3c2ac72
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > @@ -0,0 +1,3 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > new file mode 100644
> > > > index 0000000..acf48c4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > @@ -0,0 +1,4 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2" } */
> > > > +
> > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > new file mode 100644
> > > > index 0000000..9f61dc4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > new file mode 100644
> > > > index 0000000..09048e5
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > new file mode 100644
> > > > index 0000000..4862688
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > @@ -0,0 +1,39 @@
> > > > +/* { dg-do run { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > +
> > > > +struct S { int i; };
> > > > +__attribute__((const, noinline, noclone))
> > > > +struct S foo (int x)
> > > > +{
> > > > +  struct S s;
> > > > +  s.i = x;
> > > > +  return s;
> > > > +}
> > > > +
> > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > +struct S e[2048];
> > > > +
> > > > +__attribute__((noinline, noclone)) void
> > > > +bar (void)
> > > > +{
> > > > +  int i;
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    {
> > > > +      e[i] = foo (i);
> > > > +      a[i+2] = a[i] + a[i+1];
> > > > +      b[10] = b[10] + i;
> > > > +      c[i] = c[2047 - i];
> > > > +      d[i] = d[i + 1];
> > > > +    }
> > > > +}
> > > > +
> > > > +int
> > > > +main ()
> > > > +{
> > > > +  int i;
> > > > +  bar ();
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    if (e[i].i != i)
> > > > +      __builtin_abort ();
> > > > +  return 0;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > new file mode 100644
> > > > index 0000000..500251b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > @@ -0,0 +1,39 @@
> > > > +/* { dg-do run { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +struct S { int i; };
> > > > +__attribute__((const, noinline, noclone))
> > > > +struct S foo (int x)
> > > > +{
> > > > +  struct S s;
> > > > +  s.i = x;
> > > > +  return s;
> > > > +}
> > > > +
> > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > +struct S e[2048];
> > > > +
> > > > +__attribute__((noinline, noclone)) void
> > > > +bar (void)
> > > > +{
> > > > +  int i;
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    {
> > > > +      e[i] = foo (i);
> > > > +      a[i+2] = a[i] + a[i+1];
> > > > +      b[10] = b[10] + i;
> > > > +      c[i] = c[2047 - i];
> > > > +      d[i] = d[i + 1];
> > > > +    }
> > > > +}
> > > > +
> > > > +int
> > > > +main ()
> > > > +{
> > > > +  int i;
> > > > +  bar ();
> > > > +  for (i = 0; i < 1024; i++)
> > > > +    if (e[i].i != i)
> > > > +      __builtin_abort ();
> > > > +  return 0;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > new file mode 100644
> > > > index 0000000..8b058e3
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > new file mode 100644
> > > > index 0000000..d4eaaf7
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > new file mode 100644
> > > > index 0000000..dd3bb90
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > new file mode 100644
> > > > index 0000000..e2274f6
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > new file mode 100644
> > > > index 0000000..7f5d153
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > new file mode 100644
> > > > index 0000000..fe13d2b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > +
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x + y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > new file mode 100644
> > > > index 0000000..205a532
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > +
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > new file mode 100644
> > > > index 0000000..e046684
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > new file mode 100644
> > > > index 0000000..4be8ff6
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > +
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x + y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > new file mode 100644
> > > > index 0000000..0eb34e0
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > > +
> > > > +__attribute__ ((zero_call_used_regs("used")))
> > > > +float
> > > > +foo (float z, float y, float x)
> > > > +{
> > > > +  return x + y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > new file mode 100644
> > > > index 0000000..cbb63a4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > new file mode 100644
> > > > index 0000000..7573197
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > new file mode 100644
> > > > index 0000000..de71223
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > new file mode 100644
> > > > index 0000000..ccfa441
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > new file mode 100644
> > > > index 0000000..6b46ca3
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > new file mode 100644
> > > > index 0000000..0680f38
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > +
> > > > +void
> > > > +foo (void)
> > > > +{
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > new file mode 100644
> > > > index 0000000..534defa
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > @@ -0,0 +1,13 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > new file mode 100644
> > > > index 0000000..477bb19
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > new file mode 100644
> > > > index 0000000..a305a60
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > +
> > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > +
> > > > +int
> > > > +foo (int x)
> > > > +{
> > > > +  return x;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > > index 95eea63..01a1f24 100644
> > > > --- a/gcc/toplev.c
> > > > +++ b/gcc/toplev.c
> > > > @@ -1464,6 +1464,15 @@ process_options (void)
> > > >       }
> > > >    }
> > > > 
> > > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > > +      && !targetm.calls.pro_epilogue_use)
> > > > +    {
> > > > +      error_at (UNKNOWN_LOCATION,
> > > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > > +             "target");
> > > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > > +    }
> > > > +
> > > >  /* One region RA really helps to decrease the code size.  */
> > > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > > >    flag_ira_region
> > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > index 8c5a2e3..71badbd 100644
> > > > --- a/gcc/tree-core.h
> > > > +++ b/gcc/tree-core.h
> > > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > > unsigned final : 1;
> > > > /* Belong to FUNCTION_DECL exclusively.  */
> > > > unsigned regdecl_flag : 1;
> > > > - /* 14 unused bits. */
> > > > +
> > > > + /* How to clear call-used registers upon function return.  */
> > > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > > +
> > > > + /* 11 unused bits.  */

So instead of wasting "precious" bits please use lookup_attribute
in the single place you query this value (which is once per function).
There's no need to complicate matters by trying to maintain the above.
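
To illustrate the suggestion, here is a minimal self-contained sketch of "look the attribute up once per function and fall back to the command-line default". It is a toy model, not GCC code: `struct attr`, `find_attr` and `zero_mode_for_function` are hypothetical names, and every enumerator other than `zero_call_used_regs_skip` is assumed. The "attribute wins over the flag" rule matches what the zero-scratch-regs-10.c and zero-scratch-regs-16.c tests above expect.

```c
#include <stddef.h>
#include <string.h>

/* Same enum name as the patch adds to coretypes.h; only
   zero_call_used_regs_skip appears in the quoted diff, the other
   enumerator names are assumed.  */
enum zero_call_used_regs
{
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical stand-in for a declaration's attribute list.  */
struct attr
{
  const char *name;
  const char *arg;
  const struct attr *next;
};

/* Toy attribute lookup: return the first entry called NAME, or NULL.  */
static const struct attr *
find_attr (const char *name, const struct attr *list)
{
  for (; list; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}

/* Decide the zeroing mode once per function: the attribute wins if it
   is present, otherwise the command-line default applies.  */
enum zero_call_used_regs
zero_mode_for_function (const struct attr *attrs,
                        enum zero_call_used_regs flag_default)
{
  /* Listed in the same order as the enumerators above.  */
  static const char *const names[]
    = { "skip", "used-gpr", "all-gpr", "used", "all" };
  const struct attr *a = find_attr ("zero_call_used_regs", attrs);
  size_t i;

  if (a == NULL)
    return flag_default;
  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    if (strcmp (a->arg, names[i]) == 0)
      return (enum zero_call_used_regs) i;
  return flag_default;  /* Unrecognized argument: keep the default.  */
}
```

Because the result is computed on demand from the attribute list, nothing has to be stored in the decl itself.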

> > > > };
> > > > 
> > > > struct GTY(()) tree_var_decl {
> > > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > > index cf546ed..d378a88 100644
> > > > --- a/gcc/tree.h
> > > > +++ b/gcc/tree.h
> > > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > > #define DECL_VISIBILITY(NODE) \
> > > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > > 
> > > > +/* Value of the function decl's type of zeroing the call used
> > > > +   registers upon return from function.  */
> > > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > > +
> > > > /* Nonzero means that the decl (or an enclosing scope) had its
> > > >   visibility specified rather than being inferred.  */
> > > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > > -- 
> > > > 1.9.1
> > >
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-04  7:35               ` Richard Biener
@ 2020-08-04 18:23                 ` H.J. Lu
  2020-08-05  7:06                   ` Richard Biener
  2020-08-05 21:35                 ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-08-04 18:23 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 3 Aug 2020, Qing Zhao wrote:
>
> > Hi, Uros,
> >
> > Thanks a lot for your review on X86 parts.
> >
> > Hi, Richard,
> >
> > Could you please take a look at the middle-end part to see whether the
> > rewrite addresses your previous concerns?
>
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.
>
> > Thanks a lot.
> >
> > Qing
> >
> >
> > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > >
> > > On Tue, Jul 28, 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > >
> > > >
> > > > Richard and Uros,
> > > >
> > > > Could you please review the change that H.J. and I rewrote based on your comments in the previous round of discussion?
> > > >
> > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > >
> > > > Thanks a lot for your time.
> > >
> > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > >
> > > That said, the x86 parts look OK.
> > >
> > >
> >
> > > Uros.
> > > > Qing
> > > >
> > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Hi, Gcc team,
> > > > >
> > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > >
> > > > > From the previous round of discussion, the major issues raised were:
> > > > >
> > > > > > A. The patch should be rewritten to use the regsets infrastructure.
> > > > > > B. The patch should go into the middle-end instead of the x86 backend.
> > > > >
> > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > >
> > > > > > 1. Change the names of the option and attribute from
> > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > to:
> > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > Add the new option and new attribute as general (target-independent) ones.
> > > > > > 2. The main code generation part is moved from the i386 backend to the middle-end;
> > > > > > 3. Add 4 target hooks;
> > > > > > 4. Implement these 4 target hooks in the i386 backend.
> > > > > > 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
> > > > > >
> > > > > > The patch is as follows:
> > > > >
> > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > command-line option and
> > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > >
> > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > >
> > > > >  Don't zero call-used registers upon function return.
>
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?

Abnormal returns, like EH unwinding and longjmp, aren't covered by this
patch. Only normal returns are covered.
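
As a rough illustration of that distinction (a sketch only, not part of the patch; `leaf`, `worker`, `run` and `normal_returns` are hypothetical names), the program below leaves `worker` once through its ordinary return path and once through `longjmp`. Epilogue-based zeroing can only run in the first case, because in the second case control never reaches the function's return sequence.

```c
#include <setjmp.h>

static jmp_buf env;
static int normal_returns;

/* Either returns normally or jumps straight back to the setjmp site.  */
__attribute__ ((noinline)) static void
leaf (int bail)
{
  if (bail)
    longjmp (env, 1);
}

/* Zeroing code inserted before this function's return instruction is
   reached only when leaf () comes back normally.  */
__attribute__ ((noinline)) static void
worker (int bail)
{
  leaf (bail);
  normal_returns++;   /* Stands in for "the return path was reached".  */
}

/* Returns how many times worker () left through its normal return path.  */
int
run (int bail)
{
  normal_returns = 0;
  if (setjmp (env) == 0)
    worker (bail);
  return normal_returns;
}
```

Calling `run (0)` exercises the normal return; `run (1)` skips it, which is the case this patch does not handle.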

> In general I am missing reasoning as to why to use -fzero-call-used-regs=
> in the documentation, that is, what is the threat model and what are
> the guarantees?  Is there any point zeroing registers when spill slots
> are left populated with stale register contents?  How do I (and why
> would I want to?) ensure that there's no information leak from the
> implementation of 'foo' to their callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to protect with
> zero_call_used_regs("...")?
>
> Again - what's the intended use (and how does it fulfill anything useful
> for that case)?
>
> > > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > >
> > > > >  Zero used call-used general purpose registers upon function return.
> > > > >
> > > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > >
> > > > >  Zero all call-used general purpose registers upon function return.
> > > > >
> > > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > >
> > > > >  Zero used call-used registers upon function return.
> > > > >
> > > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > >
> > > > >  Zero all call-used registers upon function return.
> > > > >
> > > > > The feature is implemented in middle-end. But currently is only valid on X86.
> > > > >
> > > > > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > > > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > > > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > > > by default on x86-64.
> > > > >
> > > > > > Please take a look and let me know if you have any more comments.
> > > > >
> > > > > thanks.
> > > > >
> > > > > Qing
> > > > >
> > > > >
> > > > > ====================================
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > >
> > > > >       * common.opt: Add new option -fzero-call-used-regs.
> > > > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > > > >       (ix86_zero_call_used_regno_mode): Likewise.
> > > > >       (ix86_zero_all_vector_registers): Likewise.
> > > > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > > > >       gen_pro_epilogue_use.
> > > > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > > > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > > > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > > > >       with UNSPECV_PRO_EPILOGUE_USE.
> > > > >       * coretypes.h (enum zero_call_used_regs): New type.
> > > > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > > > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > > > >       * doc/tm.texi: Regenerate.
> > > > > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > > > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > > > >       * function.c (is_live_reg_at_exit): New function.
> > > > >       (gen_call_used_regs_seq): Likewise.
> > > > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > > > >       * function.h (is_live_reg_at_exit): Declare.
> > > > >       * target.def (zero_call_used_regno_p): New hook.
> > > > >       (zero_call_used_regno_mode): Likewise.
> > > > >       (pro_epilogue_use): Likewise.
> > > > >       (zero_all_vector_registers): Likewise.
> > > > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > > > >       (default_zero_call_used_regno_mode): Likewise.
> > > > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > > > >       (default_zero_call_used_regno_mode): Declare.
> > > > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > > > >       is used on targets that do not support it.
> > > > >       * tree-core.h (struct tree_decl_with_vis): New field
> > > > >       zero_call_used_regs_type.
> > > > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > > >
> > > > > gcc/c-family/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > >
> > > > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > > > >       zero_call_used_regs.
> > > > >       (handle_zero_call_used_regs_attribute): New function.
> > > > >
> > > > > gcc/c/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > >
> > > > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > >
> > > > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > > > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > > > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > > >
> > > > > ---
> > > > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > > > gcc/c/c-decl.c                                     |   4 +
> > > > > gcc/common.opt                                     |  23 ++++
> > > > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > > > gcc/config/i386/i386.md                            |   6 +-
> > > > > gcc/coretypes.h                                    |  10 ++
> > > > > gcc/doc/extend.texi                                |  11 ++
> > > > > gcc/doc/invoke.texi                                |  13 +-
> > > > > gcc/doc/tm.texi                                    |  27 ++++
> > > > > > gcc/doc/tm.texi.in                                 |   8 ++
> > > > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > > > gcc/function.h                                     |   2 +
> > > > > gcc/target.def                                     |  33 +++++
> > > > > gcc/targhooks.c                                    |  17 +++
> > > > > gcc/targhooks.h                                    |   3 +
> > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > > > gcc/toplev.c                                       |   9 ++
> > > > > gcc/tree-core.h                                    |   6 +-
> > > > > gcc/tree.h                                         |   5 +
> > > > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > >
> > > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > > index 3721483..cc93d6f 100644
> > > > > --- a/gcc/c-family/c-attribs.c
> > > > > +++ b/gcc/c-family/c-attribs.c
> > > > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > > > +                                              bool *);
> > > > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > > > >                             ignore_attribute, NULL },
> > > > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > > > >                             handle_no_split_stack_attribute, NULL },
> > > > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > > > +
> > > > >  /* For internal use (marking of builtins and runtime functions) only.
> > > > >     The name contains space to prevent its usage in source code.  */
> > > > >  { "fn spec",               1, 1, false, true, true, false,
> > > > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > > > >  return NULL_TREE;
> > > > > }
> > > > >
> > > > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > > > +   struct attribute_spec.handler.  */
> > > > > +
> > > > > +static tree
> > > > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > > > +                                   int ARG_UNUSED (flags),
> > > > > +                                   bool *no_add_attris)
> > > > > +{
> > > > > +  tree decl = *node;
> > > > > +  tree id = TREE_VALUE (args);
> > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > +
> > > > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > > > +    {
> > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > +             "%qE attribute applies only to functions", name);
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +  else if (DECL_INITIAL (decl))
> > > > > +    {
> > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > +             "cannot set %qE attribute after definition", name);
>
> Why's that?
>
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  if (TREE_CODE (id) != STRING_CST)
> > > > > +    {
> > > > > +      error ("attribute %qE arguments not a string", name);
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > +    {
> > > > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > > > +  else
> > > > > +    {
> > > > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > > > +      *no_add_attris = true;
> > > > > +      return NULL_TREE;
> > > > > +    }
> > > > > +
> > > > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > > > +
> > > > > +  return NULL_TREE;
> > > > > +}
> > > > > +
> > > > > /* Handle a "returns_nonnull" attribute; arguments as in
> > > > >   struct attribute_spec.handler.  */
> > > > >
> > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > > index 81bd2ee..ded1880 100644
> > > > > --- a/gcc/c/c-decl.c
> > > > > +++ b/gcc/c/c-decl.c
> > > > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > > > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > > > >       }
> > > > >
> > > > > +      /* Merge the zero_call_used_regs_type information.  */
> > > > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > > > +
>
> If you need this (see below) then likely cp/* needs similar adjustment
> so do other places in the middle-end (function cloning, etc)
>
> > > > >      /* Merge the storage class information.  */
> > > > >      merge_weak (newdecl, olddecl);
> > > > >
> > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > index df8af36..19900f9 100644
> > > > > --- a/gcc/common.opt
> > > > > +++ b/gcc/common.opt
> > > > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > > > Put zero initialized data in the bss section.
> > > > >
> > > > > +fzero-call-used-regs=
> > > > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > > > +Clear call-used registers upon function return.
> > > > > +
> > > > > +Enum
> > > > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > > > +
> > > > > +EnumValue
> > > > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > > > +
> > > > > g
> > > > > Common Driver RejectNegative JoinedOrMissing
> > > > > Generate debug information in default format.
> > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > index 5c373c0..fd1aa9c 100644
> > > > > --- a/gcc/config/i386/i386.c
> > > > > +++ b/gcc/config/i386/i386.c
> > > > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > > > >  return false;
> > > > > }
> > > > >
> > > > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > +
> > > > > +static bool
> > > > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > > > +                          bool gpr_only)
> > > > > +{
> > > > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > > > +}
> > > > > +
> > > > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > +
> > > > > +static machine_mode
> > > > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > > > +{
> > > > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > > > +     and the lower 128 bits for vector registers since destination are
> > > > > +     zero-extended to the full register width.  */
> > > > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > > > +}
> > > > > +
> > > > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > > > +
> > > > > +static rtx
> > > > > +ix86_zero_all_vector_registers (bool used_only)
> > > > > +{
> > > > > +  if (!TARGET_AVX)
> > > > > +    return NULL;
> > > > > +
> > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > > > +      || (TARGET_64BIT
> > > > > +          && (REX_SSE_REGNO_P (regno)
> > > > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > > > +         || fixed_regs[regno]
> > > > > +         || is_live_reg_at_exit (regno)
> > > > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > > > +      return NULL;
> > > > > +
> > > > > +  return gen_avx_vzeroall ();
> > > > > +}
> > > > > +
> > > > > /* Define how to find the value returned by a function.
> > > > >   VALTYPE is the data type of the value (as a tree).
> > > > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > > > >      insn = emit_insn (gen_set_got (pic));
> > > > >      RTX_FRAME_RELATED_P (insn) = 1;
> > > > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > > > -      emit_insn (gen_prologue_use (pic));
> > > > > +      emit_insn (gen_pro_epilogue_use (pic));
> > > > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > > > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > > > >      ix86_elim_entry_set_got (pic);
> > > > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > > > >     Further, prevent alloca modifications to the stack pointer from being
> > > > >     combined with prologue modifications.  */
> > > > >  if (TARGET_SEH)
> > > > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > > > }
> > > > >
> > > > > /* Emit code to restore REG using a POP insn.  */
> > > > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > > >
> > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > > > +
> > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > > > +
> > > > > +#undef TARGET_PRO_EPILOGUE_USE
> > > > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > > > +
> > > > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > > > +
> > > > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > > index d0ecd9e..e7df59f 100644
> > > > > --- a/gcc/config/i386/i386.md
> > > > > +++ b/gcc/config/i386/i386.md
> > > > > @@ -194,7 +194,7 @@
> > > > >  UNSPECV_STACK_PROBE
> > > > >  UNSPECV_PROBE_STACK_RANGE
> > > > >  UNSPECV_ALIGN
> > > > > -  UNSPECV_PROLOGUE_USE
> > > > > +  UNSPECV_PRO_EPILOGUE_USE
> > > > >  UNSPECV_SPLIT_STACK_RETURN
> > > > >  UNSPECV_CLD
> > > > >  UNSPECV_NOPS
> > > > > @@ -13525,8 +13525,8 @@
> > > > >
> > > > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > > > ;; to prevent deleting instructions setting registers for PIC code
> > > > > -(define_insn "prologue_use"
> > > > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > > > +(define_insn "pro_epilogue_use"
> > > > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > > > >  ""
> > > > >  ""
> > > > >  [(set_attr "length" "0")])
> > > > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > > > index 6b6cfcd..e56d6ec 100644
> > > > > --- a/gcc/coretypes.h
> > > > > +++ b/gcc/coretypes.h
> > > > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > > > >  VISIBILITY_INTERNAL
> > > > > };
> > > > >
> > > > > +/* Zero call-used registers type.  */
> > > > > +enum zero_call_used_regs {
> > > > > +  zero_call_used_regs_unset = 0,
> > > > > +  zero_call_used_regs_skip,
> > > > > +  zero_call_used_regs_used_gpr,
> > > > > +  zero_call_used_regs_all_gpr,
> > > > > +  zero_call_used_regs_used,
> > > > > +  zero_call_used_regs_all
> > > > > +};
> > > > > +
> > > > > /* enums used by the targetm.excess_precision hook.  */
> > > > >
> > > > > enum flt_eval_method
> > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > > index c800b74..b32c55f 100644
> > > > > --- a/gcc/doc/extend.texi
> > > > > +++ b/gcc/doc/extend.texi
> > > > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > > > A declaration to which @code{weakref} is attached and that is associated
> > > > > with a named @code{target} must be @code{static}.
> > > > >
> > > > > +@item zero_call_used_regs ("@var{choice}")
> > > > > +@cindex @code{zero_call_used_regs} function attribute
> > > > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > > > +call-used registers at function return according to @var{choice}.
> > > > > +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> > > > > > +call-used general purpose registers which are used in function.
> > > > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > > > +@samp{used} zeros call-used registers which are used in function.
> > > > > +@samp{all} zeros all call-used registers.  The default for the
> > > > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > > > +
> > > > > @end table
> > > > >
> > > > > @c This is the end of the target-independent attribute table
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > index 09bcc5b..da02686 100644
> > > > > --- a/gcc/doc/invoke.texi
> > > > > +++ b/gcc/doc/invoke.texi
> > > > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > > > --param @var{name}=@var{value}
> > > > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > > >
> > > > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > > >
> > > > > Not all targets support this option.
> > > > >
> > > > > +@item -fzero-call-used-regs=@var{choice}
> > > > > +@opindex fzero-call-used-regs
> > > > > +Zero call-used registers at function return according to
> > > > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > > > > +registers which are used in function.  @samp{all-gpr} zeros all call-used
> > > > > > +general purpose registers.  @samp{used} zeros call-used registers which
> > > > > +are used in function.  @samp{all} zeros all call-used registers.  You
> > > > > +can control this behavior for a specific function by using the function
> > > > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > > > +
> > > > > @item --param @var{name}=@var{value}
> > > > > @opindex param
> > > > > In some places, GCC uses various constants to control the amount of
> > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > > index 6e7d9dc..43dddd3 100644
> > > > > --- a/gcc/doc/tm.texi
> > > > > +++ b/gcc/doc/tm.texi
> > > > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > > > @end deftypefn
> > > > >
> > > > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > > > +@var{regno} must be the number of a hard general register.
> > > > > +
> > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > > > +@end deftypefn
> > > > > +
> > > > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > > > > +A target hook that returns a mode suitable to zero the register for the
> > > > > +call used register @var{regno} in @var{mode}.
> > > > > +
> > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > > > +used.
> > > > > +@end deftypefn
> > > > > +
> > > > > @defmac APPLY_RESULT_SIZE
> > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > > > is needed.
> > > > > @end deftypefn
> > > > >
> > > > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > > > +This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
> > > > > > +prevent deleting register setting instructions in prologue and epilogue.
> > > > > +@end deftypefn
> > > > > +
> > > > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > > > +This hook should return an rtx to zero all vector registers at function
> > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > > > > +be zeroed.  Return @code{NULL} if that is not possible.
> > > > > +@end deftypefn
> > > > > +
> > > > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > > > When optimization is disabled, this hook indicates whether or not
> > > > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > > > index 3be984b..bee917a 100644
> > > > > > --- a/gcc/doc/tm.texi.in
> > > > > > +++ b/gcc/doc/tm.texi.in
> > > > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > > >
> > > > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > > >
> > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > > > +
> > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > +
> > > > > @defmac APPLY_RESULT_SIZE
> > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > > >
> > > > > @hook TARGET_GET_DRAP_RTX
> > > > >
> > > > > +@hook TARGET_PRO_EPILOGUE_USE
> > > > > +
> > > > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > +
> > > > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > > >
> > > > > @hook TARGET_CONST_ANCHOR
> > > > > diff --git a/gcc/function.c b/gcc/function.c
> > > > > index 9eee9b5..9908530 100644
> > > > > --- a/gcc/function.c
> > > > > +++ b/gcc/function.c
> > > > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > > #include "emit-rtl.h"
> > > > > #include "recog.h"
> > > > > #include "rtl-error.h"
> > > > > +#include "hard-reg-set.h"
> > > > > #include "alias.h"
> > > > > #include "fold-const.h"
> > > > > #include "stor-layout.h"
> > > > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > > > >  return seq;
> > > > > }
> > > > >
> > > > > +/* Check whether the hard register REGNO is live at the exit block
> > > > > + * of the current routine.  */
> > > > > +bool
> > > > > +is_live_reg_at_exit (unsigned int regno)
> > > > > +{
> > > > > +  edge e;
> > > > > +  edge_iterator ei;
> > > > > +
> > > > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > > > +    {
> > > > > +      bitmap live_out = df_get_live_out (e->src);
> > > > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > > > +     return true;
> > > > > +    }
> > > > > +
> > > > > +  return false;
> > > > > +}
> > > > > +
> > > > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > > > + * function.  */
>
> No '*' on the continuation line
>
> > > > > +
> > > > > +static void
> > > > > +gen_call_used_regs_seq (void)
> > > > > +{
> > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > +    return;
> > > > > +
> > > > > +  bool gpr_only = true;
> > > > > +  bool used_only = true;
> > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > +
> > > > > +  if (flag_zero_call_used_regs)
> > > > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > > > +     == zero_call_used_regs_unset)
> > > > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > > > +    else
> > > > > +      zero_call_used_regs_type
> > > > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > +  else
> > > > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > +
> > > > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > > > +    return;
> > > > > +
> > > > > +  /* No need to zero call-used-regs in main ().  */
> > > > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > > > +    return;
> > > > > +
> > > > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > > > +     since it isn't a normal function return.  */
> > > > > +  if (crtl->calls_eh_return)
> > > > > +    return;
> > > > > +
> > > > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > > > +     general-purpose registers; if used_only is true, only zero
> > > > > +     call-used-registers that are used in the current function.  */
> > > > > +  switch (zero_call_used_regs_type)
> > > > > +    {
> > > > > +      case zero_call_used_regs_all_gpr:
> > > > > +     used_only = false;
> > > > > +     break;
> > > > > +      case zero_call_used_regs_used:
> > > > > +     gpr_only = false;
> > > > > +     break;
> > > > > +      case zero_call_used_regs_all:
> > > > > +     gpr_only = false;
> > > > > +     used_only = false;
> > > > > +     break;
> > > > > +      default:
> > > > > +     break;
> > > > > +    }
> > > > > +
> > > > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > > > +     the target that provides such insn.  */
> > > > > +  if (!gpr_only
> > > > > +      && targetm.calls.zero_all_vector_registers)
> > > > > +    {
> > > > > +      rtx zero_all_vec_insn
> > > > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > > > +      if (zero_all_vec_insn)
> > > > > +     {
> > > > > +       emit_insn (zero_all_vec_insn);
> > > > > +       gpr_only = true;
> > > > > +     }
> > > > > +    }
> > > > > +
> > > > > > +  /* For each of the hard registers, zero it only if:
> > > > > > +     1. it is a call-used register;
> > > > > > + and 2. it is not a fixed register;
> > > > > > + and 3. it is not live at the end of the routine;
> > > > > > + and 4. it is a general-purpose register if gpr_only is true;
> > > > > > + and 5. it is used in the routine if used_only is true.
> > > > > +   */
> > > > > +
> > > > > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > > > +    zero_rtx[i] = NULL_RTX;
> > > > > +
> > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > +    {
> > > > > +      if (!this_target_hard_regs->x_call_used_regs[regno])
>
> Use if (!call_used_regs[regno])
>
> > > > > +     continue;
> > > > > +      if (fixed_regs[regno])
> > > > > +     continue;
> > > > > +      if (is_live_reg_at_exit (regno))
> > > > > +     continue;
>
> How can a call-used reg be live at exit?
>
> > > > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > > > +     continue;
>
> Why does the target need some extra say here?
>
> > > > > +      if (used_only && !df_regs_ever_live_p (regno))
>
> So I suppose this does not include uses by callees of this function?
>
> > > > > +     continue;
> > > > > +
> > > > > +      /* Now we can emit insn to zero this register.  */
> > > > > +      rtx reg, tmp;
> > > > > +
> > > > > +      machine_mode mode
> > > > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > > > +                                                reg_raw_mode[regno]);
>
> In what case does the target ever need to adjust this, given that we're
> dealing with hard regs only?
>
> > > > > +      if (mode == VOIDmode)
> > > > > +     continue;
> > > > > +      if (!have_regs_of_mode[mode])
> > > > > +     continue;
>
> When does this happen?
>
> > > > > +
> > > > > +      reg = gen_rtx_REG (mode, regno);
> > > > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > > > +     {
> > > > > +       zero_rtx[(int)mode] = reg;
> > > > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > > > +       emit_insn (tmp);
> > > > > +     }
> > > > > +      else
> > > > > +     emit_move_insn (reg, zero_rtx[(int)mode]);
>
> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> but I may be wrong.  I'd rather have the target be able to specify
> some special instruction for zeroing here.  Some may have
> multi-reg set instructions for example.  That said, can't we
> defer the actual zeroing to the target in full and only compute
> a hard-reg-set of to-be-zeroed registers here and pass that
> to a target hook?
>
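
The suggestion above (compute only the set of registers to zero in generic code, and leave the actual zeroing to the target) can be sketched as follows. This is a minimal standalone illustration, not GCC code: reg_set, struct reg_info, compute_zero_set and toy_target_zero_regs are hypothetical stand-ins for HARD_REG_SET, the dataflow queries and a target hook, and the 16-register file is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HARD_REG_SET: one bit per hard register.  */
enum { NUM_REGS = 16 };
typedef uint32_t reg_set;

/* Hypothetical per-register facts that the real patch obtains from
   call_used_regs, fixed_regs, the DF live-out sets and
   df_regs_ever_live_p.  */
struct reg_info
{
  bool call_used;
  bool fixed;
  bool live_at_exit;
  bool is_gpr;
  bool ever_live;
};

/* Generic part: apply the five filtering rules from the patch and
   return the set of registers that should be zeroed.  Nothing is
   emitted here.  */
static reg_set
compute_zero_set (const struct reg_info *regs, bool gpr_only, bool used_only)
{
  reg_set need_zeroed = 0;

  for (unsigned int regno = 0; regno < NUM_REGS; regno++)
    {
      const struct reg_info *r = &regs[regno];

      if (!r->call_used || r->fixed || r->live_at_exit)
        continue;
      if (gpr_only && !r->is_gpr)
        continue;
      if (used_only && !r->ever_live)
        continue;

      assert (regno < 32);
      need_zeroed |= (reg_set) 1 << regno;
    }

  return need_zeroed;
}

/* Target part (hypothetical hook): receives the whole set, so it is
   free to use a multi-register instruction.  This toy target can only
   zero registers 0..7 and reports the subset it actually handled.  */
static reg_set
toy_target_zero_regs (reg_set need_zeroed)
{
  return need_zeroed & 0xff;
}
```

With this split the generic code never builds a SET rtx itself. The target sees the full request in one call and decides how to materialize the zeros.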
> > > > > +
> > > > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > > > +    }
> > > > > +
> > > > > +  return;
> > > > > +}
> > > > > +
> > > > > +
> > > > > /* Return a sequence to be used as the epilogue for the current function,
> > > > >   or NULL.  */
> > > > >
> > > > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > > >
> > > > >  start_sequence ();
> > > > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > > > +
> > > > > +  gen_call_used_regs_seq ();
> > > > > +
>
> The caller eventually performs shrink-wrapping - are you sure that
> doesn't mess up things?
>
> > > > >  rtx_insn *seq = targetm.gen_epilogue ();
> > > > >  if (seq)
> > > > >    emit_jump_insn (seq);
> > > > > diff --git a/gcc/function.h b/gcc/function.h
> > > > > index d55cbdd..fc36c3e 100644
> > > > > --- a/gcc/function.h
> > > > > +++ b/gcc/function.h
> > > > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > > >
> > > > > extern void used_types_insert (tree);
> > > > >
> > > > > +extern bool is_live_reg_at_exit (unsigned int);
> > > > > +
> > > > > #endif  /* GCC_FUNCTION_H */
> > > > > diff --git a/gcc/target.def b/gcc/target.def
> > > > > index 07059a8..8aab63e 100644
> > > > > --- a/gcc/target.def
> > > > > +++ b/gcc/target.def
> > > > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > > > default_function_value_regno_p)
> > > > >
> > > > > DEFHOOK
> > > > > +(zero_call_used_regno_p,
> > > > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > > > +@var{regno} must be the number of a hard general register.\n\
> > > > > +\n\
> > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > > > + default_zero_call_used_regno_p)
> > > > > +
> > > > > +DEFHOOK
> > > > > +(zero_call_used_regno_mode,
> > > > > > + "A target hook that returns a mode suitable for zeroing the call-used\n\
> > > > > > +register @var{regno} in @var{mode}.\n\
> > > > > +\n\
> > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > > > +used.",
> > > > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > > > + default_zero_call_used_regno_mode)
> > > > > +
> > > > > +DEFHOOK
> > > > > (fntype_abi,
> > > > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > > > is needed.",
> > > > > rtx, (void), NULL)
> > > > >
> > > > > +DEFHOOK
> > > > > +(pro_epilogue_use,
> > > > > > + "This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > > > > +prevent deleting register setting instructions in prologue and epilogue.",
> > > > > + rtx, (rtx reg), NULL)
> > > > > +
> > > > > +DEFHOOK
> > > > > +(zero_all_vector_registers,
> > > > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > > > > +be zeroed.  Return @code{NULL} if that is not possible.",
> > > > > + rtx, (bool used_only), NULL)
> > > > > +
> > > > > /* Return true if all function parameters should be spilled to the
> > > > >   stack.  */
> > > > > DEFHOOK
> > > > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > > > index 0113c7b..ed02173 100644
> > > > > --- a/gcc/targhooks.c
> > > > > +++ b/gcc/targhooks.c
> > > > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > > > #endif
> > > > > }
> > > > >
> > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > +
> > > > > +bool
> > > > > +default_zero_call_used_regno_p (const unsigned int,
> > > > > +                             bool)
> > > > > +{
> > > > > +  return false;
> > > > > +}
> > > > > +
> > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > +
> > > > > +machine_mode
> > > > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > > > +{
> > > > > +  return mode;
> > > > > +}
> > > > > +
> > > > > rtx
> > > > > default_internal_arg_pointer (void)
> > > > > {
> > > > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > > > index b572a36..370df19 100644
> > > > > --- a/gcc/targhooks.h
> > > > > +++ b/gcc/targhooks.h
> > > > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > > > extern bool default_function_value_regno_p (const unsigned int);
> > > > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > > > +                                                    machine_mode);
> > > > > extern rtx default_internal_arg_pointer (void);
> > > > > extern rtx default_static_chain (const_tree, bool);
> > > > > extern void default_trampoline_init (rtx, tree, rtx);
> > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > new file mode 100644
> > > > > index 0000000..3c2ac72
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > @@ -0,0 +1,3 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > new file mode 100644
> > > > > index 0000000..acf48c4
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > @@ -0,0 +1,4 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +
> > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > new file mode 100644
> > > > > index 0000000..9f61dc4
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > new file mode 100644
> > > > > index 0000000..09048e5
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > @@ -0,0 +1,21 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > new file mode 100644
> > > > > index 0000000..4862688
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > @@ -0,0 +1,39 @@
> > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > +
> > > > > +struct S { int i; };
> > > > > +__attribute__((const, noinline, noclone))
> > > > > +struct S foo (int x)
> > > > > +{
> > > > > +  struct S s;
> > > > > +  s.i = x;
> > > > > +  return s;
> > > > > +}
> > > > > +
> > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > +struct S e[2048];
> > > > > +
> > > > > +__attribute__((noinline, noclone)) void
> > > > > +bar (void)
> > > > > +{
> > > > > +  int i;
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    {
> > > > > +      e[i] = foo (i);
> > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > +      b[10] = b[10] + i;
> > > > > +      c[i] = c[2047 - i];
> > > > > +      d[i] = d[i + 1];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +main ()
> > > > > +{
> > > > > +  int i;
> > > > > +  bar ();
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    if (e[i].i != i)
> > > > > +      __builtin_abort ();
> > > > > +  return 0;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > new file mode 100644
> > > > > index 0000000..500251b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > @@ -0,0 +1,39 @@
> > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +struct S { int i; };
> > > > > +__attribute__((const, noinline, noclone))
> > > > > +struct S foo (int x)
> > > > > +{
> > > > > +  struct S s;
> > > > > +  s.i = x;
> > > > > +  return s;
> > > > > +}
> > > > > +
> > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > +struct S e[2048];
> > > > > +
> > > > > +__attribute__((noinline, noclone)) void
> > > > > +bar (void)
> > > > > +{
> > > > > +  int i;
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    {
> > > > > +      e[i] = foo (i);
> > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > +      b[10] = b[10] + i;
> > > > > +      c[i] = c[2047 - i];
> > > > > +      d[i] = d[i + 1];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +main ()
> > > > > +{
> > > > > +  int i;
> > > > > +  bar ();
> > > > > +  for (i = 0; i < 1024; i++)
> > > > > +    if (e[i].i != i)
> > > > > +      __builtin_abort ();
> > > > > +  return 0;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > new file mode 100644
> > > > > index 0000000..8b058e3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > @@ -0,0 +1,21 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > new file mode 100644
> > > > > index 0000000..d4eaaf7
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > new file mode 100644
> > > > > index 0000000..dd3bb90
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > new file mode 100644
> > > > > index 0000000..e2274f6
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > new file mode 100644
> > > > > index 0000000..7f5d153
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > @@ -0,0 +1,13 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > new file mode 100644
> > > > > index 0000000..fe13d2b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > @@ -0,0 +1,13 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > +
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x + y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > new file mode 100644
> > > > > index 0000000..205a532
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > +
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > new file mode 100644
> > > > > index 0000000..e046684
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > new file mode 100644
> > > > > index 0000000..4be8ff6
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > @@ -0,0 +1,23 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > +
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x + y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > new file mode 100644
> > > > > index 0000000..0eb34e0
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > > > +
> > > > > +__attribute__ ((zero_call_used_regs("used")))
> > > > > +float
> > > > > +foo (float z, float y, float x)
> > > > > +{
> > > > > +  return x + y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > new file mode 100644
> > > > > index 0000000..cbb63a4
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > new file mode 100644
> > > > > index 0000000..7573197
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > new file mode 100644
> > > > > index 0000000..de71223
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > new file mode 100644
> > > > > index 0000000..ccfa441
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > new file mode 100644
> > > > > index 0000000..6b46ca3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > @@ -0,0 +1,20 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > new file mode 100644
> > > > > index 0000000..0680f38
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > +
> > > > > +void
> > > > > +foo (void)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > new file mode 100644
> > > > > index 0000000..534defa
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > @@ -0,0 +1,13 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > new file mode 100644
> > > > > index 0000000..477bb19
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > new file mode 100644
> > > > > index 0000000..a305a60
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > @@ -0,0 +1,15 @@
> > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > +
> > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > +
> > > > > +int
> > > > > +foo (int x)
> > > > > +{
> > > > > +  return x;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > > > index 95eea63..01a1f24 100644
> > > > > --- a/gcc/toplev.c
> > > > > +++ b/gcc/toplev.c
> > > > > @@ -1464,6 +1464,15 @@ process_options (void)
> > > > >       }
> > > > >    }
> > > > >
> > > > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > > > +      && !targetm.calls.pro_epilogue_use)
> > > > > +    {
> > > > > +      error_at (UNKNOWN_LOCATION,
> > > > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > > > +             "target");
> > > > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > > > +    }
> > > > > +
> > > > >  /* One region RA really helps to decrease the code size.  */
> > > > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > > > >    flag_ira_region
> > > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > > index 8c5a2e3..71badbd 100644
> > > > > --- a/gcc/tree-core.h
> > > > > +++ b/gcc/tree-core.h
> > > > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > > > unsigned final : 1;
> > > > > /* Belong to FUNCTION_DECL exclusively.  */
> > > > > unsigned regdecl_flag : 1;
> > > > > - /* 14 unused bits. */
> > > > > +
> > > > > + /* How to clear call-used registers upon function return.  */
> > > > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > > > +
> > > > > + /* 11 unused bits.  */
>
> So instead of wasting "precious" bits please use lookup_attribute
> in the single place you query this value (which is once per function).
> There's no need to complicate matters by trying to maintain the above.
>
> > > > > };
> > > > >
> > > > > struct GTY(()) tree_var_decl {
> > > > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > > > index cf546ed..d378a88 100644
> > > > > --- a/gcc/tree.h
> > > > > +++ b/gcc/tree.h
> > > > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > > > #define DECL_VISIBILITY(NODE) \
> > > > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > > >
> > > > > +/* Value of the function decl's type of zeroing the call used
> > > > > +   registers upon return from function.  */
> > > > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > > > +
> > > > > /* Nonzero means that the decl (or an enclosing scope) had its
> > > > >   visibility specified rather than being inferred.  */
> > > > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > > > --
> > > > > 1.9.1
> > > >
> > >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-04 18:23                 ` H.J. Lu
@ 2020-08-05  7:06                   ` Richard Biener
  2020-08-05 12:26                     ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Biener @ 2020-08-05  7:06 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Tue, 4 Aug 2020, H.J. Lu wrote:

> On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 3 Aug 2020, Qing Zhao wrote:
> >
> > > Hi, Uros,
> > >
> > > Thanks a lot for your review on X86 parts.
> > >
> > > Hi, Richard,
> > >
> > > Could you please take a look at the middle-end part to see whether the
> > > rewrite addresses your previous concerns?
> >
> > I have a few comments below - I'm not sure I'm qualified to fully
> > review the rest though.
> >
> > > Thanks a lot.
> > >
> > > Qing
> > >
> > >
> > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > >
> > > > > On Tue, 28 Jul 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > > >
> > > > >
> > > > > Richard and Uros,
> > > > >
> > > > > > Could you please review the change that H.J. and I rewrote based on your comments in the previous round of discussion?
> > > > >
> > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > >
> > > > > Thanks a lot for your time.
> > > >
> > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > >
> > > > > That said, the x86 parts look OK.
> > > >
> > > >
> > >
> > > > Uros.
> > > > > Qing
> > > > >
> > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi, Gcc team,
> > > > > >
> > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > >
> > > > > > From the previous round of discussion, the major issues raised were:
> > > > > >
> > > > > > > A. The patch should be rewritten to use the regsets infrastructure.
> > > > > > > B. The patch should go into the middle-end instead of the x86 backend.
> > > > > >
> > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > >
> > > > > > > 1. Change the names of the option and attribute from
> > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > to:
> > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all"),
> > > > > > > so that the new option and the new attribute are target-independent.
> > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > 3. Add 4 target-hooks;
> > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > 5. On a target that does not implement the target hook, issue an error for the new option and a warning for the new attribute.
> > > > > >
> > > > > > > The patch is as follows:
> > > > > >
> > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > command-line option and
> > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:
> > > > > >
> > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > >
> > > > > >  Don't zero call-used registers upon function return.
> >
> > Does a return via EH unwinding also constitute a function return?  I
> > think you may want to have a finally handler or support in the unwinder
> > for this?  Then there's abnormal return via longjmp & friends, I guess
> > there's nothing that can be done there besides patching glibc?
> 
> Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> patch. Only normal returns are covered.

What's the point then?  Also specifically thinking about spill slots.

Richard.
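
To make the gap concrete, here is a minimal sketch (illustrative only,
not part of the patch) of a function that leaves through longjmp rather
than a normal return.  The names wipe_then_jump and exit_via_longjmp are
made up for this example, and the attribute is applied only when the
compiler reports support for it through __has_attribute.  Since the
zeroing sequence belongs to the normal return path, the expectation is
that nothing is cleared on this path:

```c
#include <assert.h>
#include <setjmp.h>

/* Apply the attribute only where the compiler knows it; otherwise the
   macro expands to nothing and the example still builds.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

static jmp_buf env;
static volatile int delivered;

/* Hypothetical function that touches a "secret" in registers and then
   leaves via longjmp.  Control never reaches its return sequence, so
   any register zeroing placed there is not executed.  */
ZERO_USED_GPR void
wipe_then_jump (int secret)
{
  delivered = secret * 3;
  longjmp (env, 1);
}

/* Caller: setjmp returns 0 on the direct call and 1 when re-entered
   through the longjmp in wipe_then_jump.  */
int
exit_via_longjmp (int secret)
{
  if (setjmp (env) == 0)
    {
      wipe_then_jump (secret);
      assert (!"unreachable: wipe_then_jump never returns");
    }
  return delivered;
}
```

Calling exit_via_longjmp (14) yields 42 with or without the attribute;
the point is only that wipe_then_jump is left without passing through
its own return sequence.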

> > In general I am missing reasoning as why to use -fzero-call-used-regs=
> > in the documentation, that is, what is the threat model and what are
> > the guarantees?  Is there any point zeroing registers when spill slots
> > are left populated with stale register contents?  How do I (and why
> > would I want to?) ensure that there's no information leak from the
> > implementation of 'foo' to their callers?  Do I need to compile all
> > of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> > or is it enough to annotate API boundaries I want to protect with
> > zero_call_used_regs("...")?
> >
> > Again - what's the intended use (and how does it fulfil anything useful
> > for that case)?
> >
> > > > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > > >
> > > > > >  Zero used call-used general purpose registers upon function return.
> > > > > >
> > > > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > > >
> > > > > >  Zero all call-used general purpose registers upon function return.
> > > > > >
> > > > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > > >
> > > > > >  Zero used call-used registers upon function return.
> > > > > >
> > > > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > > >
> > > > > >  Zero all call-used registers upon function return.
> > > > > >
> > > > > > > The feature is implemented in the middle-end but is currently supported only on x86.
> > > > > >
> > > > > > Tested on x86-64 and aarch64 with bootstrapping GCC trunk, making
> > > > > > -fzero-call-used-regs=used-gpr, -fzero-call-used-regs=all-gpr
> > > > > > -fzero-call-used-regs=used, and -fzero-call-used-regs=all enabled
> > > > > > by default on x86-64.
> > > > > >
> > > > > > > Please take a look and let me know if you have any further comments.
> > > > > >
> > > > > > thanks.
> > > > > >
> > > > > > Qing
> > > > > >
> > > > > >
> > > > > > ====================================
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * common.opt: Add new option -fzero-call-used-regs.
> > > > > >       * config/i386/i386.c (ix86_zero_call_used_regno_p): New function.
> > > > > >       (ix86_zero_call_used_regno_mode): Likewise.
> > > > > >       (ix86_zero_all_vector_registers): Likewise.
> > > > > >       (ix86_expand_prologue): Replace gen_prologue_use with
> > > > > >       gen_pro_epilogue_use.
> > > > > >       (TARGET_ZERO_CALL_USED_REGNO_P): Define.
> > > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Define.
> > > > > >       (TARGET_PRO_EPILOGUE_USE): Define.
> > > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Define.
> > > > > >       * config/i386/i386.md: Replace UNSPECV_PROLOGUE_USE
> > > > > >       with UNSPECV_PRO_EPILOGUE_USE.
> > > > > >       * coretypes.h (enum zero_call_used_regs): New type.
> > > > > >       * doc/extend.texi: Document the new zero_call_used_regs attribute.
> > > > > >       * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> > > > > >       * doc/tm.texi: Regenerate.
> > > > > > >       * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGNO_P): New hook.
> > > > > >       (TARGET_ZERO_CALL_USED_REGNO_MODE): Likewise.
> > > > > >       (TARGET_PRO_EPILOGUE_USE): Likewise.
> > > > > >       (TARGET_ZERO_ALL_VECTOR_REGISTERS): Likewise.
> > > > > >       * function.c (is_live_reg_at_exit): New function.
> > > > > >       (gen_call_used_regs_seq): Likewise.
> > > > > >       (make_epilogue_seq): Call gen_call_used_regs_seq.
> > > > > >       * function.h (is_live_reg_at_exit): Declare.
> > > > > >       * target.def (zero_call_used_regno_p): New hook.
> > > > > >       (zero_call_used_regno_mode): Likewise.
> > > > > >       (pro_epilogue_use): Likewise.
> > > > > >       (zero_all_vector_registers): Likewise.
> > > > > >       * targhooks.c (default_zero_call_used_regno_p): New function.
> > > > > >       (default_zero_call_used_regno_mode): Likewise.
> > > > > >       * targhooks.h (default_zero_call_used_regno_p): Declare.
> > > > > >       (default_zero_call_used_regno_mode): Declare.
> > > > > >       * toplev.c (process_options): Issue errors when -fzero-call-used-regs
> > > > > >       is used on targets that do not support it.
> > > > > >       * tree-core.h (struct tree_decl_with_vis): New field
> > > > > >       zero_call_used_regs_type.
> > > > > >       * tree.h (DECL_ZERO_CALL_USED_REGS): New macro.
> > > > > >
> > > > > > gcc/c-family/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * c-attribs.c (c_common_attribute_table): Add new attribute
> > > > > >       zero_call_used_regs.
> > > > > >       (handle_zero_call_used_regs_attribute): New function.
> > > > > >
> > > > > > gcc/c/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * c-decl.c (merge_decls): Merge zero_call_used_regs_type.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > > 2020-07-13  qing zhao  <qing.zhao@oracle.com>
> > > > > > > 2020-07-13  H.J. Lu  <hjl.tools@gmail.com>
> > > > > >
> > > > > >       * c-c++-common/zero-scratch-regs-1.c: New test.
> > > > > >       * c-c++-common/zero-scratch-regs-2.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-1.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-10.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-11.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-12.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-13.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-14.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-15.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-16.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-17.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-18.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-19.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-2.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-20.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-21.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-22.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-23.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-3.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-4.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-5.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-6.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-7.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-8.c: Likewise.
> > > > > >       * gcc.target/i386/zero-scratch-regs-9.c: Likewise.
> > > > > >
> > > > > > ---
> > > > > > gcc/c-family/c-attribs.c                           |  68 ++++++++++
> > > > > > gcc/c/c-decl.c                                     |   4 +
> > > > > > gcc/common.opt                                     |  23 ++++
> > > > > > gcc/config/i386/i386.c                             |  58 ++++++++-
> > > > > > gcc/config/i386/i386.md                            |   6 +-
> > > > > > gcc/coretypes.h                                    |  10 ++
> > > > > > gcc/doc/extend.texi                                |  11 ++
> > > > > > gcc/doc/invoke.texi                                |  13 +-
> > > > > > gcc/doc/tm.texi                                    |  27 ++++
> > > > > > > gcc/doc/tm.texi.in                                 |   8 ++
> > > > > > gcc/function.c                                     | 145 +++++++++++++++++++++
> > > > > > gcc/function.h                                     |   2 +
> > > > > > gcc/target.def                                     |  33 +++++
> > > > > > gcc/targhooks.c                                    |  17 +++
> > > > > > gcc/targhooks.h                                    |   3 +
> > > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |   3 +
> > > > > > gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |   4 +
> > > > > > .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-10.c         |  21 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++++
> > > > > > .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++++
> > > > > > .../gcc.target/i386/zero-scratch-regs-13.c         |  21 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-14.c         |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-2.c          |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++++
> > > > > > .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-22.c         |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-23.c         |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-5.c          |  20 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> > > > > > .../gcc.target/i386/zero-scratch-regs-8.c          |  19 +++
> > > > > > .../gcc.target/i386/zero-scratch-regs-9.c          |  15 +++
> > > > > > gcc/toplev.c                                       |   9 ++
> > > > > > gcc/tree-core.h                                    |   6 +-
> > > > > > gcc/tree.h                                         |   5 +
> > > > > > 43 files changed, 866 insertions(+), 7 deletions(-)
> > > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > > create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > >
> > > > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > > > > > index 3721483..cc93d6f 100644
> > > > > > --- a/gcc/c-family/c-attribs.c
> > > > > > +++ b/gcc/c-family/c-attribs.c
> > > > > > @@ -136,6 +136,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> > > > > > +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> > > > > > +                                              bool *);
> > > > > > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> > > > > > static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
> > > > > > @@ -434,6 +436,9 @@ const struct attribute_spec c_common_attribute_table[] =
> > > > > >                             ignore_attribute, NULL },
> > > > > >  { "no_split_stack",        0, 0, true,  false, false, false,
> > > > > >                             handle_no_split_stack_attribute, NULL },
> > > > > > +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> > > > > > +                           handle_zero_call_used_regs_attribute, NULL },
> > > > > > +
> > > > > >  /* For internal use (marking of builtins and runtime functions) only.
> > > > > >     The name contains space to prevent its usage in source code.  */
> > > > > >  { "fn spec",               1, 1, false, true, true, false,
> > > > > > @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> > > > > >  return NULL_TREE;
> > > > > > }
> > > > > >
> > > > > > +/* Handle a "zero_call_used_regs" attribute; arguments as in
> > > > > > +   struct attribute_spec.handler.  */
> > > > > > +
> > > > > > +static tree
> > > > > > +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> > > > > > +                                   int ARG_UNUSED (flags),
> > > > > > +                                   bool *no_add_attris)
> > > > > > +{
> > > > > > +  tree decl = *node;
> > > > > > +  tree id = TREE_VALUE (args);
> > > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > > +
> > > > > > +  if (TREE_CODE (decl) != FUNCTION_DECL)
> > > > > > +    {
> > > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > > +             "%qE attribute applies only to functions", name);
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +  else if (DECL_INITIAL (decl))
> > > > > > +    {
> > > > > > +      error_at (DECL_SOURCE_LOCATION (decl),
> > > > > > +             "cannot set %qE attribute after definition", name);
> >
> > Why's that?
> >
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  if (TREE_CODE (id) != STRING_CST)
> > > > > > +    {
> > > > > > +      error ("attribute %qE arguments not a string", name);
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > > +    {
> > > > > > +      warning (OPT_Wattributes, "%qE attribute directive ignored", name);
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  if (strcmp (TREE_STRING_POINTER (id), "skip") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_skip;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used-gpr") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_used_gpr;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all-gpr") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_all_gpr;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "used") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_used;
> > > > > > +  else if (strcmp (TREE_STRING_POINTER (id), "all") == 0)
> > > > > > +    zero_call_used_regs_type = zero_call_used_regs_all;
> > > > > > +  else
> > > > > > +    {
> > > > > > +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, or %qs",
> > > > > > +          name, "skip", "used-gpr", "all-gpr", "used", "all");
> > > > > > +      *no_add_attris = true;
> > > > > > +      return NULL_TREE;
> > > > > > +    }
> > > > > > +
> > > > > > +  DECL_ZERO_CALL_USED_REGS (decl) = zero_call_used_regs_type;
> > > > > > +
> > > > > > +  return NULL_TREE;
> > > > > > +}
> > > > > > +
> > > > > > /* Handle a "returns_nonnull" attribute; arguments as in
> > > > > >   struct attribute_spec.handler.  */
> > > > > >
> > > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > > > index 81bd2ee..ded1880 100644
> > > > > > --- a/gcc/c/c-decl.c
> > > > > > +++ b/gcc/c/c-decl.c
> > > > > > @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> > > > > >         DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> > > > > >       }
> > > > > >
> > > > > > +      /* Merge the zero_call_used_regs_type information.  */
> > > > > > +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> > > > > > +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> > > > > > +
> >
> > If you need this (see below), then cp/* likely needs a similar adjustment,
> > and so do other places in the middle-end (function cloning, etc.).
> >
> > > > > >      /* Merge the storage class information.  */
> > > > > >      merge_weak (newdecl, olddecl);
> > > > > >
> > > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > > index df8af36..19900f9 100644
> > > > > > --- a/gcc/common.opt
> > > > > > +++ b/gcc/common.opt
> > > > > > @@ -3083,6 +3083,29 @@ fzero-initialized-in-bss
> > > > > > Common Report Var(flag_zero_initialized_in_bss) Init(1)
> > > > > > Put zero initialized data in the bss section.
> > > > > >
> > > > > > +fzero-call-used-regs=
> > > > > > +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_skip)
> > > > > > +Clear call-used registers upon function return.
> > > > > > +
> > > > > > +Enum
> > > > > > +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> > > > > > +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> > > > > > +
> > > > > > +EnumValue
> > > > > > +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> > > > > > +
> > > > > > g
> > > > > > Common Driver RejectNegative JoinedOrMissing
> > > > > > Generate debug information in default format.
> > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > index 5c373c0..fd1aa9c 100644
> > > > > > --- a/gcc/config/i386/i386.c
> > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > @@ -3551,6 +3551,48 @@ ix86_function_value_regno_p (const unsigned int regno)
> > > > > >  return false;
> > > > > > }
> > > > > >
> > > > > > +/* TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > > +
> > > > > > +static bool
> > > > > > +ix86_zero_call_used_regno_p (const unsigned int regno,
> > > > > > +                          bool gpr_only)
> > > > > > +{
> > > > > > +  return GENERAL_REGNO_P (regno) || (!gpr_only && SSE_REGNO_P (regno));
> > > > > > +}
> > > > > > +
> > > > > > +/* TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > > +
> > > > > > +static machine_mode
> > > > > > +ix86_zero_call_used_regno_mode (const unsigned int regno, machine_mode)
> > > > > > +{
> > > > > > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > > > > > > +     and the lower 128 bits for vector registers since destinations are
> > > > > > +     zero-extended to the full register width.  */
> > > > > > +  return GENERAL_REGNO_P (regno) ? SImode : V4SFmode;
> > > > > > +}
> > > > > > +
> > > > > > +/* TARGET_ZERO_ALL_VECTOR_REGISTERS.  */
> > > > > > +
> > > > > > +static rtx
> > > > > > +ix86_zero_all_vector_registers (bool used_only)
> > > > > > +{
> > > > > > +  if (!TARGET_AVX)
> > > > > > +    return NULL;
> > > > > > +
> > > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > > > > > +      || (TARGET_64BIT
> > > > > > +          && (REX_SSE_REGNO_P (regno)
> > > > > > +              || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > > > > > +     && (!this_target_hard_regs->x_call_used_regs[regno]
> > > > > > +         || fixed_regs[regno]
> > > > > > +         || is_live_reg_at_exit (regno)
> > > > > > +         || (used_only && !df_regs_ever_live_p (regno))))
> > > > > > +      return NULL;
> > > > > > +
> > > > > > +  return gen_avx_vzeroall ();
> > > > > > +}
> > > > > > +
> > > > > > /* Define how to find the value returned by a function.
> > > > > >   VALTYPE is the data type of the value (as a tree).
> > > > > >   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> > > > > > @@ -8513,7 +8555,7 @@ ix86_expand_prologue (void)
> > > > > >      insn = emit_insn (gen_set_got (pic));
> > > > > >      RTX_FRAME_RELATED_P (insn) = 1;
> > > > > >      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
> > > > > > -      emit_insn (gen_prologue_use (pic));
> > > > > > +      emit_insn (gen_pro_epilogue_use (pic));
> > > > > >      /* Deleting already emmitted SET_GOT if exist and allocated to
> > > > > >        REAL_PIC_OFFSET_TABLE_REGNUM.  */
> > > > > >      ix86_elim_entry_set_got (pic);
> > > > > > @@ -8542,7 +8584,7 @@ ix86_expand_prologue (void)
> > > > > >     Further, prevent alloca modifications to the stack pointer from being
> > > > > >     combined with prologue modifications.  */
> > > > > >  if (TARGET_SEH)
> > > > > > -    emit_insn (gen_prologue_use (stack_pointer_rtx));
> > > > > > +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
> > > > > > }
> > > > > >
> > > > > > /* Emit code to restore REG using a POP insn.  */
> > > > > > @@ -23319,6 +23361,18 @@ ix86_run_selftests (void)
> > > > > > #undef TARGET_FUNCTION_VALUE_REGNO_P
> > > > > > #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> > > > > >
> > > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_P
> > > > > > +#define TARGET_ZERO_CALL_USED_REGNO_P ix86_zero_call_used_regno_p
> > > > > > +
> > > > > > +#undef TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > > +#define TARGET_ZERO_CALL_USED_REGNO_MODE ix86_zero_call_used_regno_mode
> > > > > > +
> > > > > > +#undef TARGET_PRO_EPILOGUE_USE
> > > > > > +#define TARGET_PRO_EPILOGUE_USE gen_pro_epilogue_use
> > > > > > +
> > > > > > +#undef TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > > +#define TARGET_ZERO_ALL_VECTOR_REGISTERS ix86_zero_all_vector_registers
> > > > > > +
> > > > > > #undef TARGET_PROMOTE_FUNCTION_MODE
> > > > > > #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > > > index d0ecd9e..e7df59f 100644
> > > > > > --- a/gcc/config/i386/i386.md
> > > > > > +++ b/gcc/config/i386/i386.md
> > > > > > @@ -194,7 +194,7 @@
> > > > > >  UNSPECV_STACK_PROBE
> > > > > >  UNSPECV_PROBE_STACK_RANGE
> > > > > >  UNSPECV_ALIGN
> > > > > > -  UNSPECV_PROLOGUE_USE
> > > > > > +  UNSPECV_PRO_EPILOGUE_USE
> > > > > >  UNSPECV_SPLIT_STACK_RETURN
> > > > > >  UNSPECV_CLD
> > > > > >  UNSPECV_NOPS
> > > > > > @@ -13525,8 +13525,8 @@
> > > > > >
> > > > > > ;; As USE insns aren't meaningful after reload, this is used instead
> > > > > > ;; to prevent deleting instructions setting registers for PIC code
> > > > > > -(define_insn "prologue_use"
> > > > > > -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
> > > > > > +(define_insn "pro_epilogue_use"
> > > > > > +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
> > > > > >  ""
> > > > > >  ""
> > > > > >  [(set_attr "length" "0")])
> > > > > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> > > > > > index 6b6cfcd..e56d6ec 100644
> > > > > > --- a/gcc/coretypes.h
> > > > > > +++ b/gcc/coretypes.h
> > > > > > @@ -418,6 +418,16 @@ enum symbol_visibility
> > > > > >  VISIBILITY_INTERNAL
> > > > > > };
> > > > > >
> > > > > > +/* Zero call-used registers type.  */
> > > > > > +enum zero_call_used_regs {
> > > > > > +  zero_call_used_regs_unset = 0,
> > > > > > +  zero_call_used_regs_skip,
> > > > > > +  zero_call_used_regs_used_gpr,
> > > > > > +  zero_call_used_regs_all_gpr,
> > > > > > +  zero_call_used_regs_used,
> > > > > > +  zero_call_used_regs_all
> > > > > > +};
> > > > > > +
> > > > > > /* enums used by the targetm.excess_precision hook.  */
> > > > > >
> > > > > > enum flt_eval_method
> > > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > > > > index c800b74..b32c55f 100644
> > > > > > --- a/gcc/doc/extend.texi
> > > > > > +++ b/gcc/doc/extend.texi
> > > > > > @@ -3984,6 +3984,17 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> > > > > > A declaration to which @code{weakref} is attached and that is associated
> > > > > > with a named @code{target} must be @code{static}.
> > > > > >
> > > > > > +@item zero_call_used_regs ("@var{choice}")
> > > > > > +@cindex @code{zero_call_used_regs} function attribute
> > > > > > +The @code{zero_call_used_regs} attribute causes the compiler to zero
> > > > > > +call-used registers at function return according to @var{choice}.
> > > > > > +@samp{skip} doesn't zero call-used registers. @samp{used-gpr} zeros
> > > > > > > +call-used general purpose registers which are used in function.
> > > > > > +@samp{all-gpr} zeros all call-used general purpose registers.
> > > > > > +@samp{used} zeros call-used registers which are used in function.
> > > > > > +@samp{all} zeros all call-used registers.  The default for the
> > > > > > +attribute is controlled by @option{-fzero-call-used-regs}.
> > > > > > +
> > > > > > @end table
> > > > > >
> > > > > > @c This is the end of the target-independent attribute table
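
For readers following the documentation hunk above, here is a minimal sketch
of how the attribute would be applied in user code.  It assumes a compiler
that implements the attribute with this spelling and these choice strings;
a compiler that does not know the attribute is expected to warn and ignore
it.  The function name add_secret is hypothetical and not part of the patch.

```c
/* Hypothetical example function, not taken from the patch.
   The attribute asks the compiler to zero the call-used general
   purpose registers that this function actually used, just before
   it returns.  The return value register is live at exit and is
   therefore left alone.  */
__attribute__ ((zero_call_used_regs ("used-gpr")))
int
add_secret (int key, int value)
{
  return key + value;
}
```

The observable result of the call is unchanged; only the register contents
left behind at return differ.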
> > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > > index 09bcc5b..da02686 100644
> > > > > > --- a/gcc/doc/invoke.texi
> > > > > > +++ b/gcc/doc/invoke.texi
> > > > > > @@ -542,7 +542,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > > -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> > > > > > -funsafe-math-optimizations  -funswitch-loops @gol
> > > > > > -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> > > > > > --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> > > > > > +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> > > > > > --param @var{name}=@var{value}
> > > > > > -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> > > > > >
> > > > > > @@ -12273,6 +12273,17 @@ int foo (void)
> > > > > >
> > > > > > Not all targets support this option.
> > > > > >
> > > > > > +@item -fzero-call-used-regs=@var{choice}
> > > > > > +@opindex fzero-call-used-regs
> > > > > > +Zero call-used registers at function return according to
> > > > > > +@var{choice}.  @samp{skip}, which is the default, doesn't zero
> > > > > > +call-used registers.  @samp{used-gpr} zeros call-used general purpose
> > > > > > > +registers which are used in function.  @samp{all-gpr} zeros all
> > > > > > > +call-used general purpose registers.  @samp{used} zeros call-used registers which
> > > > > > +are used in function.  @samp{all} zeros all call-used registers.  You
> > > > > > +can control this behavior for a specific function by using the function
> > > > > > +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> > > > > > +
> > > > > > @item --param @var{name}=@var{value}
> > > > > > @opindex param
> > > > > > In some places, GCC uses various constants to control the amount of
> > > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > > > index 6e7d9dc..43dddd3 100644
> > > > > > --- a/gcc/doc/tm.texi
> > > > > > +++ b/gcc/doc/tm.texi
> > > > > > @@ -4571,6 +4571,22 @@ should recognize only the caller's register numbers.
> > > > > > If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.
> > > > > > @end deftypefn
> > > > > >
> > > > > > +@deftypefn {Target Hook} bool TARGET_ZERO_CALL_USED_REGNO_P (const unsigned int @var{regno}, bool @var{general_reg_only_p})
> > > > > > +A target hook that returns @code{true} if @var{regno} is the number of a
> > > > > > +call used register.  If @var{general_reg_only_p} is @code{true},
> > > > > > +@var{regno} must be the number of a hard general register.
> > > > > > +
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > +@deftypefn {Target Hook} machine_mode TARGET_ZERO_CALL_USED_REGNO_MODE (const unsigned int @var{regno}, machine_mode @var{mode})
> > > > > > > +A target hook that returns a mode suitable for zeroing the call-used
> > > > > > > +register @var{regno} in @var{mode}.
> > > > > > +
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be
> > > > > > +used.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > @defmac APPLY_RESULT_SIZE
> > > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > > @@ -12043,6 +12059,17 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> > > > > > is needed.
> > > > > > @end deftypefn
> > > > > >
> > > > > > +@deftypefn {Target Hook} rtx TARGET_PRO_EPILOGUE_USE (rtx @var{reg})
> > > > > > > +This hook should return an UNSPEC_VOLATILE rtx to mark a register in use to
> > > > > > > +prevent deleting register setting instructions in prologue and epilogue.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > +@deftypefn {Target Hook} rtx TARGET_ZERO_ALL_VECTOR_REGISTERS (bool @var{used_only})
> > > > > > +This hook should return an rtx to zero all vector registers at function
> > > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should
> > > > > > > +be zeroed.  Return @code{NULL} if that is not possible.
> > > > > > +@end deftypefn
> > > > > > +
> > > > > > @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> > > > > > When optimization is disabled, this hook indicates whether or not
> > > > > > arguments should be allocated to stack slots.  Normally, GCC allocates
> > > > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > > > > index 3be984b..bee917a 100644
> > > > > > > --- a/gcc/doc/tm.texi.in
> > > > > > > +++ b/gcc/doc/tm.texi.in
> > > > > > @@ -3430,6 +3430,10 @@ for a new target instead.
> > > > > >
> > > > > > @hook TARGET_FUNCTION_VALUE_REGNO_P
> > > > > >
> > > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_P
> > > > > > +
> > > > > > +@hook TARGET_ZERO_CALL_USED_REGNO_MODE
> > > > > > +
> > > > > > @defmac APPLY_RESULT_SIZE
> > > > > > Define this macro if @samp{untyped_call} and @samp{untyped_return}
> > > > > > need more space than is implied by @code{FUNCTION_VALUE_REGNO_P} for
> > > > > > @@ -8109,6 +8113,10 @@ and the associated definitions of those functions.
> > > > > >
> > > > > > @hook TARGET_GET_DRAP_RTX
> > > > > >
> > > > > > +@hook TARGET_PRO_EPILOGUE_USE
> > > > > > +
> > > > > > +@hook TARGET_ZERO_ALL_VECTOR_REGISTERS
> > > > > > +
> > > > > > @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> > > > > >
> > > > > > @hook TARGET_CONST_ANCHOR
> > > > > > diff --git a/gcc/function.c b/gcc/function.c
> > > > > > index 9eee9b5..9908530 100644
> > > > > > --- a/gcc/function.c
> > > > > > +++ b/gcc/function.c
> > > > > > @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > #include "emit-rtl.h"
> > > > > > #include "recog.h"
> > > > > > #include "rtl-error.h"
> > > > > > +#include "hard-reg-set.h"
> > > > > > #include "alias.h"
> > > > > > #include "fold-const.h"
> > > > > > #include "stor-layout.h"
> > > > > > @@ -5808,6 +5809,147 @@ make_prologue_seq (void)
> > > > > >  return seq;
> > > > > > }
> > > > > >
> > > > > > +/* Check whether the hard register REGNO is live at the exit block
> > > > > > + * of the current routine.  */
> > > > > > +bool
> > > > > > +is_live_reg_at_exit (unsigned int regno)
> > > > > > +{
> > > > > > +  edge e;
> > > > > > +  edge_iterator ei;
> > > > > > +
> > > > > > +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> > > > > > +    {
> > > > > > +      bitmap live_out = df_get_live_out (e->src);
> > > > > > +      if (REGNO_REG_SET_P (live_out, regno))
> > > > > > +     return true;
> > > > > > +    }
> > > > > > +
> > > > > > +  return false;
> > > > > > +}
> > > > > > +
> > > > > > +/* Emit a sequence of insns to zero the call-used-registers for the current
> > > > > > + * function.  */
> >
> > No '*' on the continuation line
> >
> > > > > > +
> > > > > > +static void
> > > > > > +gen_call_used_regs_seq (void)
> > > > > > +{
> > > > > > +  if (!targetm.calls.pro_epilogue_use)
> > > > > > +    return;
> > > > > > +
> > > > > > +  bool gpr_only = true;
> > > > > > +  bool used_only = true;
> > > > > > +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> > > > > > +
> > > > > > +  if (flag_zero_call_used_regs)
> > > > > > +    if (DECL_ZERO_CALL_USED_REGS (current_function_decl)
> > > > > > +     == zero_call_used_regs_unset)
> > > > > > +      zero_call_used_regs_type = flag_zero_call_used_regs;
> > > > > > +    else
> > > > > > +      zero_call_used_regs_type
> > > > > > +     = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > > +  else
> > > > > > +    zero_call_used_regs_type = DECL_ZERO_CALL_USED_REGS (current_function_decl);
> > > > > > +
> > > > > > +  /* No need to zero call-used-regs when no user request is present.  */
> > > > > > +  if (zero_call_used_regs_type <= zero_call_used_regs_skip)
> > > > > > +    return;
> > > > > > +
> > > > > > +  /* No need to zero call-used-regs in main ().  */
> > > > > > +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> > > > > > +    return;
> > > > > > +
> > > > > > +  /* No need to zero call-used-regs if __builtin_eh_return is called
> > > > > > +     since it isn't a normal function return.  */
> > > > > > +  if (crtl->calls_eh_return)
> > > > > > +    return;
> > > > > > +
> > > > > > +  /* If gpr_only is true, only zero call-used-registers that are
> > > > > > +     general-purpose registers; if used_only is true, only zero
> > > > > > +     call-used-registers that are used in the current function.  */
> > > > > > +  switch (zero_call_used_regs_type)
> > > > > > +    {
> > > > > > +      case zero_call_used_regs_all_gpr:
> > > > > > +     used_only = false;
> > > > > > +     break;
> > > > > > +      case zero_call_used_regs_used:
> > > > > > +     gpr_only = false;
> > > > > > +     break;
> > > > > > +      case zero_call_used_regs_all:
> > > > > > +     gpr_only = false;
> > > > > > +     used_only = false;
> > > > > > +     break;
> > > > > > +      default:
> > > > > > +     break;
> > > > > > +    }
> > > > > > +
> > > > > > +  /* An optimization to use a single hard insn to zero all vector registers on
> > > > > > +     the target that provides such insn.  */
> > > > > > +  if (!gpr_only
> > > > > > +      && targetm.calls.zero_all_vector_registers)
> > > > > > +    {
> > > > > > +      rtx zero_all_vec_insn
> > > > > > +     = targetm.calls.zero_all_vector_registers (used_only);
> > > > > > +      if (zero_all_vec_insn)
> > > > > > +     {
> > > > > > +       emit_insn (zero_all_vec_insn);
> > > > > > +       gpr_only = true;
> > > > > > +     }
> > > > > > +    }
> > > > > > +
> > > > > > +  /* For each of the hard registers, check to see whether we should zero it if:
> > > > > > +     1. it is a call-used-registers;
> > > > > > + and 2. it is not a fixed-registers;
> > > > > > + and 3. it is not live at the end of the routine;
> > > > > > + and 4. it is general purpose register if gpr_only is true;
> > > > > > + and 5. it is used in the routine if used_only is true;
> > > > > > +   */
> > > > > > +
> > > > > > > +  /* This array holds the zero rtx with the corresponding machine mode.  */
> > > > > > +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> > > > > > +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> > > > > > +    zero_rtx[i] = NULL_RTX;
> > > > > > +
> > > > > > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > > > > > +    {
> > > > > > +      if (!this_target_hard_regs->x_call_used_regs[regno])
> >
> > Use if (!call_used_regs[regno])
> >
> > > > > > +     continue;
> > > > > > +      if (fixed_regs[regno])
> > > > > > +     continue;
> > > > > > +      if (is_live_reg_at_exit (regno))
> > > > > > +     continue;
> >
> > How can a call-used reg be live at exit?
> >
> > > > > > +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > > > > > +     continue;
> >
> > Why does the target need some extra say here?
> >
> > > > > > +      if (used_only && !df_regs_ever_live_p (regno))
> >
> > So I suppose this does not include uses by callees of this function?
> >
> > > > > > +     continue;
> > > > > > +
> > > > > > +      /* Now we can emit insn to zero this register.  */
> > > > > > +      rtx reg, tmp;
> > > > > > +
> > > > > > +      machine_mode mode
> > > > > > +     = targetm.calls.zero_call_used_regno_mode (regno,
> > > > > > +                                                reg_raw_mode[regno]);
> >
> > In what case does the target ever need to adjust this (we're dealing
> > with hard-regs only?)?
> >
> > > > > > +      if (mode == VOIDmode)
> > > > > > +     continue;
> > > > > > +      if (!have_regs_of_mode[mode])
> > > > > > +     continue;
> >
> > When does this happen?
> >
> > > > > > +
> > > > > > +      reg = gen_rtx_REG (mode, regno);
> > > > > > +      if (zero_rtx[(int)mode] == NULL_RTX)
> > > > > > +     {
> > > > > > +       zero_rtx[(int)mode] = reg;
> > > > > > +       tmp = gen_rtx_SET (reg, const0_rtx);
> > > > > > +       emit_insn (tmp);
> > > > > > +     }
> > > > > > +      else
> > > > > > +     emit_move_insn (reg, zero_rtx[(int)mode]);
> >
> > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > but I may be wrong.  I'd rather have the target be able to specify
> > some special instruction for zeroing here.  Some may have
> > multi-reg set instructions for example.  That said, can't we
> > defer the actual zeroing to the target in full and only compute
> > a hard-reg-set of to-be-zeroed registers here and pass that
> > to a target hook?
> >
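
To make the suggested split concrete, here is a rough standalone model of
the "compute the set here, let the target emit the zeroing" idea.  It is a
sketch under stated assumptions, not GCC code: struct reg_info and
compute_zero_set are hypothetical, a 32-bit mask stands in for a hard
register set, and the five conditions are the ones listed in the comment
in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register facts; in the patch these come from
   call_used_regs, fixed_regs, the live-out sets, the target hook
   and df_regs_ever_live_p respectively.  */
struct reg_info {
  bool call_used;
  bool fixed;
  bool live_at_exit;
  bool general;
  bool ever_live;
};

/* Return a bitmask with bit R set when register R should be zeroed.
   The caller would hand this mask to a target hook that decides how
   to emit the actual zeroing instructions.  */
static uint32_t
compute_zero_set (const struct reg_info *regs, unsigned int n,
                  bool gpr_only, bool used_only)
{
  uint32_t set = 0;

  assert (n <= 32);  /* the mask in this sketch holds 32 registers */
  for (unsigned int r = 0; r < n; r++)
    {
      if (!regs[r].call_used || regs[r].fixed || regs[r].live_at_exit)
        continue;
      if (gpr_only && !regs[r].general)
        continue;
      if (used_only && !regs[r].ever_live)
        continue;
      set |= UINT32_C (1) << r;
    }
  return set;
}
```

With this shape the generic code only decides which registers qualify, and
a target with a multi-register clear instruction can cover the whole mask
in one insn.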
> > > > > > +
> > > > > > +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> > > > > > +    }
> > > > > > +
> > > > > > +  return;
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > > > /* Return a sequence to be used as the epilogue for the current function,
> > > > > >   or NULL.  */
> > > > > >
> > > > > > @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> > > > > >
> > > > > >  start_sequence ();
> > > > > >  emit_note (NOTE_INSN_EPILOGUE_BEG);
> > > > > > +
> > > > > > +  gen_call_used_regs_seq ();
> > > > > > +
> >
> > The caller eventually performs shrink-wrapping - are you sure that
> > doesn't mess up things?
> >
> > > > > >  rtx_insn *seq = targetm.gen_epilogue ();
> > > > > >  if (seq)
> > > > > >    emit_jump_insn (seq);
> > > > > > diff --git a/gcc/function.h b/gcc/function.h
> > > > > > index d55cbdd..fc36c3e 100644
> > > > > > --- a/gcc/function.h
> > > > > > +++ b/gcc/function.h
> > > > > > @@ -705,4 +705,6 @@ extern const char *current_function_name (void);
> > > > > >
> > > > > > extern void used_types_insert (tree);
> > > > > >
> > > > > > +extern bool is_live_reg_at_exit (unsigned int);
> > > > > > +
> > > > > > #endif  /* GCC_FUNCTION_H */
> > > > > > diff --git a/gcc/target.def b/gcc/target.def
> > > > > > index 07059a8..8aab63e 100644
> > > > > > --- a/gcc/target.def
> > > > > > +++ b/gcc/target.def
> > > > > > @@ -5022,6 +5022,26 @@ If this hook is not defined, then FUNCTION_VALUE_REGNO_P will be used.",
> > > > > > default_function_value_regno_p)
> > > > > >
> > > > > > DEFHOOK
> > > > > > +(zero_call_used_regno_p,
> > > > > > + "A target hook that returns @code{true} if @var{regno} is the number of a\n\
> > > > > > +call used register.  If @var{general_reg_only_p} is @code{true},\n\
> > > > > > +@var{regno} must be the number of a hard general register.\n\
> > > > > > +\n\
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_p will be used.",
> > > > > > + bool, (const unsigned int regno, bool general_reg_only_p),
> > > > > > + default_zero_call_used_regno_p)
> > > > > > +
> > > > > > +DEFHOOK
> > > > > > +(zero_call_used_regno_mode,
> > > > > > + "A target hook that returns a mode of suitable to zero the register for the\n\
> > > > > > +call used register @var{regno} in @var{mode}.\n\
> > > > > > +\n\
> > > > > > +If this hook is not defined, then default_zero_call_used_regno_mode will be\n\
> > > > > > +used.",
> > > > > > + machine_mode, (const unsigned int regno, machine_mode mode),
> > > > > > + default_zero_call_used_regno_mode)
> > > > > > +
> > > > > > +DEFHOOK
> > > > > > (fntype_abi,
> > > > > > "Return the ABI used by a function with type @var{type}; see the\n\
> > > > > > definition of @code{predefined_function_abi} for details of the ABI\n\
> > > > > > @@ -5068,6 +5088,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> > > > > > is needed.",
> > > > > > rtx, (void), NULL)
> > > > > >
> > > > > > +DEFHOOK
> > > > > > +(pro_epilogue_use,
> > > > > > + "This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to\n\
> > > > > > +prevent deleting register setting instructions in proprologue and epilogue.",
> > > > > > + rtx, (rtx reg), NULL)
> > > > > > +
> > > > > > +DEFHOOK
> > > > > > +(zero_all_vector_registers,
> > > > > > + "This hook should return an rtx to zero all vector registers at function\n\
> > > > > > +exit.  If @var{used_only} is @code{true}, only used vector registers should\n\
> > > > > > > +be zeroed.  Return @code{NULL} if that is not possible.",
> > > > > > + rtx, (bool used_only), NULL)
> > > > > > +
> > > > > > /* Return true if all function parameters should be spilled to the
> > > > > >   stack.  */
> > > > > > DEFHOOK
> > > > > > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > > > > > index 0113c7b..ed02173 100644
> > > > > > --- a/gcc/targhooks.c
> > > > > > +++ b/gcc/targhooks.c
> > > > > > @@ -987,6 +987,23 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> > > > > > #endif
> > > > > > }
> > > > > >
> > > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_P.  */
> > > > > > +
> > > > > > +bool
> > > > > > +default_zero_call_used_regno_p (const unsigned int,
> > > > > > +                             bool)
> > > > > > +{
> > > > > > +  return false;
> > > > > > +}
> > > > > > +
> > > > > > +/* The default hook for TARGET_ZERO_CALL_USED_REGNO_MODE.  */
> > > > > > +
> > > > > > +machine_mode
> > > > > > +default_zero_call_used_regno_mode (const unsigned int, machine_mode mode)
> > > > > > +{
> > > > > > +  return mode;
> > > > > > +}
> > > > > > +
> > > > > > rtx
> > > > > > default_internal_arg_pointer (void)
> > > > > > {
> > > > > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > > > > > index b572a36..370df19 100644
> > > > > > --- a/gcc/targhooks.h
> > > > > > +++ b/gcc/targhooks.h
> > > > > > @@ -162,6 +162,9 @@ extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> > > > > > extern rtx default_function_value (const_tree, const_tree, bool);
> > > > > > extern rtx default_libcall_value (machine_mode, const_rtx);
> > > > > > extern bool default_function_value_regno_p (const unsigned int);
> > > > > > +extern bool default_zero_call_used_regno_p (const unsigned int, bool);
> > > > > > +extern machine_mode default_zero_call_used_regno_mode (const unsigned int,
> > > > > > +                                                    machine_mode);
> > > > > > extern rtx default_internal_arg_pointer (void);
> > > > > > extern rtx default_static_chain (const_tree, bool);
> > > > > > extern void default_trampoline_init (rtx, tree, rtx);
> > > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > > new file mode 100644
> > > > > > index 0000000..3c2ac72
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> > > > > > @@ -0,0 +1,3 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > > +/* { dg-error "'-fzero-call-used-regs=' is not supported for this target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > > diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > > new file mode 100644
> > > > > > index 0000000..acf48c4
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> > > > > > @@ -0,0 +1,4 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2" } */
> > > > > > +
> > > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr"))); /* { dg-warning " attribute directive ignored" "" {target { ! "i?86-*-* x86_64-*-*" } } 0 } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > > new file mode 100644
> > > > > > index 0000000..9f61dc4
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> > > > > > @@ -0,0 +1,12 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > > new file mode 100644
> > > > > > index 0000000..09048e5
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> > > > > > @@ -0,0 +1,21 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > > new file mode 100644
> > > > > > index 0000000..4862688
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> > > > > > @@ -0,0 +1,39 @@
> > > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > > +
> > > > > > +struct S { int i; };
> > > > > > +__attribute__((const, noinline, noclone))
> > > > > > +struct S foo (int x)
> > > > > > +{
> > > > > > +  struct S s;
> > > > > > +  s.i = x;
> > > > > > +  return s;
> > > > > > +}
> > > > > > +
> > > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > > +struct S e[2048];
> > > > > > +
> > > > > > +__attribute__((noinline, noclone)) void
> > > > > > +bar (void)
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    {
> > > > > > +      e[i] = foo (i);
> > > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > > +      b[10] = b[10] + i;
> > > > > > +      c[i] = c[2047 - i];
> > > > > > +      d[i] = d[i + 1];
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +int
> > > > > > +main ()
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  bar ();
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    if (e[i].i != i)
> > > > > > +      __builtin_abort ();
> > > > > > +  return 0;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > > new file mode 100644
> > > > > > index 0000000..500251b
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> > > > > > @@ -0,0 +1,39 @@
> > > > > > +/* { dg-do run { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +struct S { int i; };
> > > > > > +__attribute__((const, noinline, noclone))
> > > > > > +struct S foo (int x)
> > > > > > +{
> > > > > > +  struct S s;
> > > > > > +  s.i = x;
> > > > > > +  return s;
> > > > > > +}
> > > > > > +
> > > > > > +int a[2048], b[2048], c[2048], d[2048];
> > > > > > +struct S e[2048];
> > > > > > +
> > > > > > +__attribute__((noinline, noclone)) void
> > > > > > +bar (void)
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    {
> > > > > > +      e[i] = foo (i);
> > > > > > +      a[i+2] = a[i] + a[i+1];
> > > > > > +      b[10] = b[10] + i;
> > > > > > +      c[i] = c[2047 - i];
> > > > > > +      d[i] = d[i + 1];
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +int
> > > > > > +main ()
> > > > > > +{
> > > > > > +  int i;
> > > > > > +  bar ();
> > > > > > +  for (i = 0; i < 1024; i++)
> > > > > > +    if (e[i].i != i)
> > > > > > +      __builtin_abort ();
> > > > > > +  return 0;
> > > > > > +}
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > > new file mode 100644
> > > > > > index 0000000..8b058e3
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> > > > > > @@ -0,0 +1,21 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > > new file mode 100644
> > > > > > index 0000000..d4eaaf7
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > > new file mode 100644
> > > > > > index 0000000..dd3bb90
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > > new file mode 100644
> > > > > > index 0000000..e2274f6
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > > new file mode 100644
> > > > > > index 0000000..7f5d153
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> > > > > > @@ -0,0 +1,13 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > > new file mode 100644
> > > > > > index 0000000..fe13d2b
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> > > > > > @@ -0,0 +1,13 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > > +
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x + y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > > new file mode 100644
> > > > > > index 0000000..205a532
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> > > > > > @@ -0,0 +1,12 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> > > > > > +
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > > new file mode 100644
> > > > > > index 0000000..e046684
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > > new file mode 100644
> > > > > > index 0000000..4be8ff6
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> > > > > > @@ -0,0 +1,23 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> > > > > > +
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x + y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > > new file mode 100644
> > > > > > index 0000000..0eb34e0
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> > > > > > +
> > > > > > +__attribute__ ((zero_call_used_regs("used")))
> > > > > > +float
> > > > > > +foo (float z, float y, float x)
> > > > > > +{
> > > > > > +  return x + y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > > new file mode 100644
> > > > > > index 0000000..cbb63a4
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > > new file mode 100644
> > > > > > index 0000000..7573197
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > > new file mode 100644
> > > > > > index 0000000..de71223
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> > > > > > @@ -0,0 +1,12 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > > new file mode 100644
> > > > > > index 0000000..ccfa441
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > > new file mode 100644
> > > > > > index 0000000..6b46ca3
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +__attribute__ ((zero_call_used_regs("all-gpr")))
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > > new file mode 100644
> > > > > > index 0000000..0680f38
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> > > > > > @@ -0,0 +1,14 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> > > > > > +
> > > > > > +void
> > > > > > +foo (void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> > > > > > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > > new file mode 100644
> > > > > > index 0000000..534defa
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> > > > > > @@ -0,0 +1,13 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > > new file mode 100644
> > > > > > index 0000000..477bb19
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> > > > > > @@ -0,0 +1,19 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> > > > > > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > > new file mode 100644
> > > > > > index 0000000..a305a60
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> > > > > > @@ -0,0 +1,15 @@
> > > > > > +/* { dg-do compile { target *-*-linux* } } */
> > > > > > +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> > > > > > +
> > > > > > +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> > > > > > +
> > > > > > +int
> > > > > > +foo (int x)
> > > > > > +{
> > > > > > +  return x;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "vzeroall" } } */
> > > > > > +/* { dg-final { scan-assembler-not "%xmm" } } */
> > > > > > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> > > > > > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> > > > > > diff --git a/gcc/toplev.c b/gcc/toplev.c
> > > > > > index 95eea63..01a1f24 100644
> > > > > > --- a/gcc/toplev.c
> > > > > > +++ b/gcc/toplev.c
> > > > > > @@ -1464,6 +1464,15 @@ process_options (void)
> > > > > >       }
> > > > > >    }
> > > > > >
> > > > > > +  if (flag_zero_call_used_regs != zero_call_used_regs_skip
> > > > > > +      && !targetm.calls.pro_epilogue_use)
> > > > > > +    {
> > > > > > +      error_at (UNKNOWN_LOCATION,
> > > > > > +             "%<-fzero-call-used-regs=%> is not supported for this "
> > > > > > +             "target");
> > > > > > +      flag_zero_call_used_regs = zero_call_used_regs_skip;
> > > > > > +    }
> > > > > > +
> > > > > >  /* One region RA really helps to decrease the code size.  */
> > > > > >  if (flag_ira_region == IRA_REGION_AUTODETECT)
> > > > > >    flag_ira_region
> > > > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > > > index 8c5a2e3..71badbd 100644
> > > > > > --- a/gcc/tree-core.h
> > > > > > +++ b/gcc/tree-core.h
> > > > > > @@ -1825,7 +1825,11 @@ struct GTY(()) tree_decl_with_vis {
> > > > > > unsigned final : 1;
> > > > > > /* Belong to FUNCTION_DECL exclusively.  */
> > > > > > unsigned regdecl_flag : 1;
> > > > > > - /* 14 unused bits. */
> > > > > > +
> > > > > > + /* How to clear call-used registers upon function return.  */
> > > > > > + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> > > > > > +
> > > > > > + /* 11 unused bits.  */
> >
> > So instead of wasting "precious" bits please use lookup_attribute
> > in the single place you query this value (which is once per function).
> > There's no need to complicate matters by trying to maintain the above.
> >
> > > > > > };
> > > > > >
> > > > > > struct GTY(()) tree_var_decl {
> > > > > > diff --git a/gcc/tree.h b/gcc/tree.h
> > > > > > index cf546ed..d378a88 100644
> > > > > > --- a/gcc/tree.h
> > > > > > +++ b/gcc/tree.h
> > > > > > @@ -2925,6 +2925,11 @@ extern void decl_value_expr_insert (tree, tree);
> > > > > > #define DECL_VISIBILITY(NODE) \
> > > > > >  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
> > > > > >
> > > > > > +/* Value of the function decl's type of zeroing the call used
> > > > > > +   registers upon return from function.  */
> > > > > > +#define DECL_ZERO_CALL_USED_REGS(NODE) \
> > > > > > +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.zero_call_used_regs_type)
> > > > > > +
> > > > > > /* Nonzero means that the decl (or an enclosing scope) had its
> > > > > >   visibility specified rather than being inferred.  */
> > > > > > #define DECL_VISIBILITY_SPECIFIED(NODE) \
> > > > > > --
> > > > > > 1.9.1
> > > > >
> > > >
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05  7:06                   ` Richard Biener
@ 2020-08-05 12:26                     ` H.J. Lu
  2020-08-05 12:30                       ` Richard Biener
  0 siblings, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-08-05 12:26 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 4 Aug 2020, H.J. Lu wrote:
>
> > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > >
> > > > Hi, Uros,
> > > >
> > > > Thanks a lot for your review on X86 parts.
> > > >
> > > > Hi, Richard,
> > > >
> > > > Could you please take a look at the middle-end part to see whether the
> > > > rewrite addresses your previous concerns?
> > >
> > > I have a few comments below - I'm not sure I'm qualified to fully
> > > review the rest though.
> > >
> > > > Thanks a lot.
> > > >
> > > > Qing
> > > >
> > > >
> > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > >
> > > > > On Tue, 28 Jul 2020 at 22:05, Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > > > > >
> > > > > >
> > > > > > Richard and Uros,
> > > > > >
> > > > > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > > > > >
> > > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > > >
> > > > > > Thanks a lot for your time.
> > > > >
> > > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > > >
> > > > > That said, the x86 parts look OK.
> > > > >
> > > > >
> > > >
> > > > > Uros.
> > > > > > Qing
> > > > > >
> > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > Hi, Gcc team,
> > > > > > >
> > > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > > >
> > > > > > > From the previous round of discussion, the major issues raised were:
> > > > > > >
> > > > > > > A. The patch should be rewritten to use the regsets infrastructure.
> > > > > > > B. The patch should go into the middle-end instead of the x86 backend.
> > > > > > >
> > > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > > >
> > > > > > > 1. Change the names of the option and attribute from
> > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] and zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > to:
> > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] and zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > > Add the new option and new attribute in general.
> > > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > > 3. Add 4 target-hooks;
> > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > > > >
> > > > > > > The patch is as following:
> > > > > > >
> > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > command-line option and
> > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> > > > > > >
> > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > >
> > > > > > >  Don't zero call-used registers upon function return.
> > >
> > > Does a return via EH unwinding also constitute a function return?  I
> > > think you may want to have a finally handler or support in the unwinder
> > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > there's nothing that can be done there besides patching glibc?
> >
> > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > patch. Only normal returns are covered.
>
> What's the point then?  Also specifically thinking about spill slots.
>

The goal of this patch is to zero caller-saved registers upon normal
function return.  Abnormal returns and spill slots are outside of the
scope of this patch.
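
As a rough illustration of that scope (a minimal sketch, not code from
the patch), the attribute proposed in this thread could be applied to a
single function as below.  It assumes a compiler that carries the patch;
the macro name ZERO_USED_GPR and the function pin_matches are
hypothetical, and the macro falls back to a no-op elsewhere.

```c
#include <assert.h>
#include <string.h>

/* Use the attribute only when the compiler reports support for it;
   otherwise ZERO_USED_GPR expands to nothing (assumption: the final
   attribute spelling matches the one proposed in this thread).  */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_USED_GPR
#  define ZERO_USED_GPR
#endif

/* Hypothetical function that touches a secret.  With the attribute
   active, the call-used general-purpose registers it used are zeroed
   on the normal return path only; a longjmp or an exception unwinding
   out of it would bypass that zeroing, and spill slots on the stack
   are not touched at all.  */
ZERO_USED_GPR
int pin_matches(const char *candidate, const char *secret)
{
  assert(candidate != NULL && secret != NULL);
  return strcmp(candidate, secret) == 0;
}
```

The return value itself is unaffected, since the register carrying it
cannot be cleared; only the other registers are.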

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 12:26                     ` H.J. Lu
@ 2020-08-05 12:30                       ` Richard Biener
  2020-08-05 12:34                         ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Biener @ 2020-08-05 12:30 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Wed, 5 Aug 2020, H.J. Lu wrote:

> On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 4 Aug 2020, H.J. Lu wrote:
> >
> > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > >
> > > > > Hi, Uros,
> > > > >
> > > > > Thanks a lot for your review on X86 parts.
> > > > >
> > > > > Hi, Richard,
> > > > >
> > > > > Could you please take a look at the middle-end part to see whether the
> > > > > rewritten addressed your previous concern?
> > > >
> > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > review the rest though.
> > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Qing
> > > > >
> > > > >
> > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> napisala:
> > > > > > >
> > > > > > >
> > > > > > > Richard and Uros,
> > > > > > >
> > > > > > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > > > > > >
> > > > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > > > >
> > > > > > > Thanks a lot for your time.
> > > > > >
> > > > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > > > >
> > > > > > That said, x86 parts looks OK.
> > > > > >
> > > > > >
> > > > >
> > > > > > Uros.
> > > > > > > Qing
> > > > > > >
> > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>> wrote:
> > > > > > > >
> > > > > > > > Hi, Gcc team,
> > > > > > > >
> > > > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html> <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>>
> > > > > > > >
> > > > > > > > From the previous round of discussion, the major issues raised were:
> > > > > > > >
> > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > >
> > > > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > > > >
> > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > to:
> > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > > > > >
> > > > > > > > The patch is as following:
> > > > > > > >
> > > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > command-line option and
> > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> > > > > > > >
> > > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > > >
> > > > > > > >  Don't zero call-used registers upon function return.
> > > >
> > > > Does a return via EH unwinding also constitute a function return?  I
> > > > think you may want to have a finally handler or support in the unwinder
> > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > there's nothing that can be done there besides patching glibc?
> > >
> > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > patch. Only normal returns are covered.
> >
> > What's the point then?  Also specifically thinking about spill slots.
> >
> 
> The goal of this patch is to zero caller-saved registers upon normal
> function return.  Abnormal returns and spill slots are outside of the
> scope of this patch.

Sure, I can write a patch that spills some regs, writes zeros to them
and then restores them.  And the patch will fulfil what it was designed
to do.

Still, I need to come up with a reason why this is a useful feature
on its own for it to be accepted.

I am asking for that reason.  What's the reason for the "goal of this
patch"?  Why's that a useful goal on its own?

Richard.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 12:30                       ` Richard Biener
@ 2020-08-05 12:34                         ` H.J. Lu
  2020-08-05 14:45                           ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-08-05 12:34 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 5 Aug 2020, H.J. Lu wrote:
>
> > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> > >
> > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > > >
> > > > > > Hi, Uros,
> > > > > >
> > > > > > Thanks a lot for your review on X86 parts.
> > > > > >
> > > > > > Hi, Richard,
> > > > > >
> > > > > > Could you please take a look at the middle-end part to see whether the
> > > > > > rewritten addressed your previous concern?
> > > > >
> > > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > > review the rest though.
> > > > >
> > > > > > Thanks a lot.
> > > > > >
> > > > > > Qing
> > > > > >
> > > > > >
> > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> napisala:
> > > > > > > >
> > > > > > > >
> > > > > > > > Richard and Uros,
> > > > > > > >
> > > > > > > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > > > > > > >
> > > > > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > > > > >
> > > > > > > > Thanks a lot for your time.
> > > > > > >
> > > > > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > > > > >
> > > > > > > That said, x86 parts looks OK.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > > Uros.
> > > > > > > > Qing
> > > > > > > >
> > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>> wrote:
> > > > > > > > >
> > > > > > > > > Hi, Gcc team,
> > > > > > > > >
> > > > > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html> <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>>
> > > > > > > > >
> > > > > > > > > From the previous round of discussion, the major issues raised were:
> > > > > > > > >
> > > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > > >
> > > > > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > > > > >
> > > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > to:
> > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > > > > > >
> > > > > > > > > The patch is as following:
> > > > > > > > >
> > > > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > > command-line option and
> > > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> > > > > > > > >
> > > > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > > > >
> > > > > > > > >  Don't zero call-used registers upon function return.
> > > > >
> > > > > Does a return via EH unwinding also constitute a function return?  I
> > > > > think you may want to have a finally handler or support in the unwinder
> > > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > > there's nothing that can be done there besides patching glibc?
> > > >
> > > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > > patch. Only normal returns are covered.
> > >
> > > What's the point then?  Also specifically thinking about spill slots.
> > >
> >
> > The goal of this patch is to zero caller-saved registers upon normal
> > function return.  Abnormal returns and spill slots are outside of the
> > scope of this patch.
>
> Sure, I can write a patch that spills some regs, writes zeros to them
> and then restores them.  And the patch will fulfil what it was designed
> to do.
>
> Still I need to come up with a reason that this is a useful feature
> by its own for it to be accepted.
>
> I am asking for that reason.  What's the reason for the "goal of this
> patch"?  Why's that a useful goal on its own?
>

Hi Victor,

Can you provide some background information about how/why this feature
is used?

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 12:34                         ` H.J. Lu
@ 2020-08-05 14:45                           ` H.J. Lu
  2020-08-05 15:00                             ` Qing Zhao
  2020-08-05 18:53                             ` Richard Biener
  0 siblings, 2 replies; 188+ messages in thread
From: H.J. Lu @ 2020-08-05 14:45 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Wed, 5 Aug 2020, H.J. Lu wrote:
> >
> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> > > >
> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > > > >
> > > > > > > Hi, Uros,
> > > > > > >
> > > > > > > Thanks a lot for your review on X86 parts.
> > > > > > >
> > > > > > > Hi, Richard,
> > > > > > >
> > > > > > > Could you please take a look at the middle-end part to see whether the
> > > > > > > rewritten addressed your previous concern?
> > > > > >
> > > > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > > > review the rest though.
> > > > > >
> > > > > > > Thanks a lot.
> > > > > > >
> > > > > > > Qing
> > > > > > >
> > > > > > >
> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> napisala:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Richard and Uros,
> > > > > > > > >
> > > > > > > > > Could you please review the change that H.J and I rewrote based on your comments in the previous round of discussion?
> > > > > > > > >
> > > > > > > > > This patch is a nice security enhancement for GCC that has been requested by security people for quite some time.
> > > > > > > > >
> > > > > > > > > Thanks a lot for your time.
> > > > > > > >
> > > > > > > > I'll be away from the keyboard for the next week, but the patch needs a middle end approval first.
> > > > > > > >
> > > > > > > > That said, x86 parts looks OK.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > Uros.
> > > > > > > > > Qing
> > > > > > > > >
> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Gcc team,
> > > > > > > > > >
> > > > > > > > > > This patch is a follow-up on the previous patch and corresponding discussion:
> > > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html> <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html <https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>>
> > > > > > > > > >
> > > > > > > > > > From the previous round of discussion, the major issues raised were:
> > > > > > > > > >
> > > > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > > > >
> > > > > > > > > > This new patch is rewritten based on the above 2 comments.  The major changes compared to the previous patch are:
> > > > > > > > > >
> > > > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > > to:
> > > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > > > 2. The main code generation part is moved from i386 backend to middle-end;
> > > > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > > > 5. On a target that does not implement the target hook, issue error for the new option, issue warning for the new attribute.
> > > > > > > > > >
> > > > > > > > > > The patch is as following:
> > > > > > > > > >
> > > > > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > > > command-line option and
> > > > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> > > > > > > > > >
> > > > > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > > > > >
> > > > > > > > > >  Don't zero call-used registers upon function return.
> > > > > >
> > > > > > Does a return via EH unwinding also constitute a function return?  I
> > > > > > think you may want to have a finally handler or support in the unwinder
> > > > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > > > there's nothing that can be done there besides patching glibc?
> > > > >
> > > > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > > > patch. Only normal returns are covered.
> > > >
> > > > What's the point then?  Also specifically thinking about spill slots.
> > > >
> > >
> > > The goal of this patch is to zero caller-saved registers upon normal
> > > function return.  Abnormal returns and spill slots are outside of the
> > > scope of this patch.
> >
> > Sure, I can write a patch that spills some regs, writes zeros to them
> > and then restores them.  And the patch will fulfil what it was designed
> > to do.
> >
> > Still I need to come up with a reason that this is a useful feature
> > by its own for it to be accepted.
> >
> > I am asking for that reason.  What's the reason for the "goal of this
> > patch"?  Why's that a useful goal on its own?
> >
>
> Hi Victor,
>
> Can you provide some background information about how/why this feature
> is used?
>

From The SECURE project and GCC in GCC Cauldron 2018:

Speaker: Graham Markall

The SECURE project is a 15 month program funded by Innovate UK, to
take well known security techniques from academia and make them
generally available in standard compilers, specifically GCC and LLVM.
An explicit objective is for those techniques to be incorporated in
the upstream versions of compilers. The Cauldron takes place in the
final month of the project and this talk will present the technical
details of some of the techniques implemented, and review those that
are yet to be implemented. A particular focus of this talk will be on
verifying that the implementation is correct, which can be a bigger
challenge than the implementation.

Techniques to be covered in the project include the following:

Stack and register erasure. Ensuring that on return from a function,
no data is left lying on the stack or in registers. Particular
challenges are in dealing with inlining, shrink wrapping and caching.

This patch implements register erasure.


-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 14:45                           ` H.J. Lu
@ 2020-08-05 15:00                             ` Qing Zhao
  2020-08-05 18:53                             ` Richard Biener
  1 sibling, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-05 15:00 UTC (permalink / raw)
  To: H.J. Lu, Richard Biener
  Cc: Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor



> On Aug 5, 2020, at 9:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com <mailto:hjl.tools@gmail.com>> wrote:
>> 
>> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de> wrote:
>>> 
>>>>>>>>>>> 
>>>>>>>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>>>>>>>> command-line option and
>>>>>>>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
>>>>>>>>>>> 
>>>>>>>>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>>>>>>>>>>> 
>>>>>>>>>>> Don't zero call-used registers upon function return.
>>>>>>> 
>>>>>>> Does a return via EH unwinding also constitute a function return?  I
>>>>>>> think you may want to have a finally handler or support in the unwinder
>>>>>>> for this?  Then there's abnormal return via longjmp & friends, I guess
>>>>>>> there's nothing that can be done there besides patching glibc?
>>>>>> 
>>>>>> Abnormal returns, like EH unwinding and longjmp, aren't covered by this
>>>>>> patch. Only normal returns are covered.
>>>>> 
>>>>> What's the point then?  Also specifically thinking about spill slots.
>>>>> 
>>>> 
>>>> The goal of this patch is to zero caller-saved registers upon normal
>>>> function return.  Abnormal returns and spill slots are outside of the
>>>> scope of this patch.
>>> 
>>> Sure, I can write a patch that spills some regs, writes zeros to them
>>> and then restores them.  And the patch will fulfil what it was designed
>>> to do.
>>> 
>>> Still I need to come up with a reason that this is a useful feature
>>> by its own for it to be accepted.
>>> 
>>> I am asking for that reason.  What's the reason for the "goal of this
>>> patch"?  Why's that a useful goal on its own?
>>> 
>> 
>> Hi Victor,
>> 
>> Can you provide some background information about how/why this feature
>> is used?
>> 
> 
> From The SECURE project and GCC in GCC Cauldron 2018:
> 
> Speaker: Graham Markall
> 
> The SECURE project is a 15 month program funded by Innovate UK, to
> take well known security techniques from academia and make them
> generally available in standard compilers, specifically GCC and LLVM.
> An explicit objective is for those techniques to be incorporated in
> the upstream versions of compilers. The Cauldron takes place in the
> final month of the project and this talk will present the technical
> details of some of the techniques implemented, and review those that
> are yet to be implemented. A particular focus of this talk will be on
> verifying that the implementation is correct, which can be a bigger
> challenge than the implementation.
> 
> Techniques to be covered in the project include the following:
> 
> Stack and register erasure. Ensuring that on return from a function,
> no data is left lying on the stack or in registers. Particular
> challenges are in dealing with inlining, shrink wrapping and caching.
> 
> This patch implements register erasure.

In addition to the above, Victor mentioned a paper that provides good background information
for this patch:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132 <https://ieeexplore.ieee.org/document/8445132>

The abstract of this paper is:

"With the implementation of W ⊕ X security model on computer system, 
Return-Oriented Programming (ROP) has become the primary exploitation
technique for adversaries. Although many solutions that defend against ROP
exploits have been proposed, they still suffer from various shortcomings.
In this paper, we propose a new way to mitigate ROP attacks that are based
on return instructions. We clean the scratch registers which are also the
parameter registers based on the features of ROP malicious code and calling
convention. A prototype is implemented on x64-based Linux platform based on Pin.
Preliminary experimental results show that our method can efficiently mitigate
conventional ROP attacks."

Qing
 
> 
> 
> -- 
> H.J.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 14:45                           ` H.J. Lu
  2020-08-05 15:00                             ` Qing Zhao
@ 2020-08-05 18:53                             ` Richard Biener
  2020-08-05 19:08                               ` H.J. Lu
  2020-08-05 20:22                               ` Qing Zhao
  1 sibling, 2 replies; 188+ messages in thread
From: Richard Biener @ 2020-08-05 18:53 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On August 5, 2020 4:45:00 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de>
>wrote:
>> >
>> > On Wed, 5 Aug 2020, H.J. Lu wrote:
>> >
>> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener
><rguenther@suse.de> wrote:
>> > > >
>> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
>> > > >
>> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener
><rguenther@suse.de> wrote:
>> > > > > >
>> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
>> > > > > >
>> > > > > > > Hi, Uros,
>> > > > > > >
>> > > > > > > Thanks a lot for your review on X86 parts.
>> > > > > > >
>> > > > > > > Hi, Richard,
>> > > > > > >
>> > > > > > > Could you please take a look at the middle-end part to
>see whether the
>> > > > > > > rewritten addressed your previous concern?
>> > > > > >
>> > > > > > I have a few comments below - I'm not sure I'm qualified to
>fully
>> > > > > > review the rest though.
>> > > > > >
>> > > > > > > Thanks a lot.
>> > > > > > >
>> > > > > > > Qing
>> > > > > > >
>> > > > > > >
>> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak
><ubizjak@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao
><QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> napisala:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Richard and Uros,
>> > > > > > > > >
>> > > > > > > > > Could you please review the change that H.J and I
>rewrote based on your comments in the previous round of discussion?
>> > > > > > > > >
>> > > > > > > > > This patch is a nice security enhancement for GCC
>that has been requested by security people for quite some time.
>> > > > > > > > >
>> > > > > > > > > Thanks a lot for your time.
>> > > > > > > >
>> > > > > > > > I'll be away from the keyboard for the next week, but
>the patch needs a middle end approval first.
>> > > > > > > >
>> > > > > > > > That said, x86 parts looks OK.
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > Uros.
>> > > > > > > > > Qing
>> > > > > > > > >
>> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via
>Gcc-patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>>
>wrote:
>> > > > > > > > > >
>> > > > > > > > > > Hi, Gcc team,
>> > > > > > > > > >
>> > > > > > > > > > This patch is a follow-up on the previous patch and
>corresponding discussion:
>> > > > > > > > > >
>https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>
><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>>
>> > > > > > > > > >
>> > > > > > > > > > From the previous round of discussion, the major
>issues raised were:
>> > > > > > > > > >
>> > > > > > > > > > A. should be rewritten by using regsets
>infrastructure.
>> > > > > > > > > > B. Put the patch into middle-end instead of x86
>backend.
>> > > > > > > > > >
>> > > > > > > > > > This new patch is rewritten based on the above 2
>comments.  The major changes compared to the previous patch are:
>> > > > > > > > > >
>> > > > > > > > > > 1. Change the names of the option and attribute
>from
>> > > > > > > > > >
>-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and
>zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
>> > > > > > > > > > to:
>> > > > > > > > > >
>-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and 
>zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
>> > > > > > > > > > Add the new option and  new attribute in general.
>> > > > > > > > > > 2. The main code generation part is moved from i386
>backend to middle-end;
>> > > > > > > > > > 3. Add 4 target-hooks;
>> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
>> > > > > > > > > > 5. On a target that does not implement the target
>hook, issue error for the new option, issue warning for the new
>attribute.
>> > > > > > > > > >
>> > > > > > > > > > The patch is as following:
>> > > > > > > > > >
>> > > > > > > > > > [PATCH] Add
>-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>> > > > > > > > > > command-line option and
>> > > > > > > > > >
>zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function
>attribue:
>> > > > > > > > > >
>> > > > > > > > > >  1. -fzero-call-used-regs=skip and
>zero_call_used_regs("skip")
>> > > > > > > > > >
>> > > > > > > > > >  Don't zero call-used registers upon function
>return.
>> > > > > >
>> > > > > > Does a return via EH unwinding also constitute a function
>return?  I
>> > > > > > think you may want to have a finally handler or support in
>the unwinder
>> > > > > > for this?  Then there's abnormal return via longjmp &
>friends, I guess
>> > > > > > there's nothing that can be done there besides patching
>glibc?
>> > > > >
>> > > > > Abnormal returns, like EH unwinding and longjmp, aren't
>covered by this
>> > > > > patch. Only normal returns are covered.
>> > > >
>> > > > What's the point then?  Also specifically thinking about spill
>slots.
>> > > >
>> > >
>> > > The goal of this patch is to zero caller-saved registers upon
>normal
>> > > function return.  Abnormal returns and spill slots are outside of
>the
>> > > scope of this patch.
>> >
>> > Sure, I can write a patch that spills some regs, writes zeros to
>them
>> > and then restores them.  And the patch will fulfil what it was
>designed
>> > to do.
>> >
>> > Still I need to come up with a reason that this is a useful feature
>> > by its own for it to be accepted.
>> >
>> > I am asking for that reason.  What's the reason for the "goal of
>this
>> > patch"?  Why's that a useful goal on its own?
>> >
>>
>> Hi Victor,
>>
>> Can you provide some background information about how/why this
>feature
>> is used?
>>
>
>From The SECURE project and GCC in GCC Cauldron 2018:
>
>Speaker: Graham Markall
>
>The SECURE project is a 15 month program funded by Innovate UK, to
>take well known security techniques from academia and make them
>generally available in standard compilers, specifically GCC and LLVM.
>An explicit objective is for those techniques to be incorporated in
>the upstream versions of compilers. The Cauldron takes place in the
>final month of the project and this talk will present the technical
>details of some of the techniques implemented, and review those that
>are yet to be implemented. A particular focus of this talk will be on
>verifying that the implementation is correct, which can be a bigger
>challenge than the implementation.
>
>Techniques to be covered in the project include the following:
>
>Stack and register erasure. Ensuring that on return from a function,
>no data is left lying on the stack or in registers. Particular
>challenges are in dealing with inlining, shrink wrapping and caching.
>
>This patch implements register erasure.

Part of it, yes. While I can see that abnormal transfer of control is difficult, exception handling is too widespread to be ignored. What's the plan there?

So can we also see the other parts? In particular, I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful, rather than thinking of a better interface for the whole thing?

Richard. 

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 18:53                             ` Richard Biener
@ 2020-08-05 19:08                               ` H.J. Lu
  2020-08-05 20:22                               ` Qing Zhao
  1 sibling, 0 replies; 188+ messages in thread
From: H.J. Lu @ 2020-08-05 19:08 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

On Wed, Aug 5, 2020 at 11:53 AM Richard Biener <rguenther@suse.de> wrote:
>
> On August 5, 2020 4:45:00 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> >On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener <rguenther@suse.de>
> >wrote:
> >> >
> >> > On Wed, 5 Aug 2020, H.J. Lu wrote:
> >> >
> >> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener
> ><rguenther@suse.de> wrote:
> >> > > >
> >> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> >> > > >
> >> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener
> ><rguenther@suse.de> wrote:
> >> > > > > >
> >> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> >> > > > > >
> >> > > > > > > Hi, Uros,
> >> > > > > > >
> >> > > > > > > Thanks a lot for your review on X86 parts.
> >> > > > > > >
> >> > > > > > > Hi, Richard,
> >> > > > > > >
> >> > > > > > > Could you please take a look at the middle-end part to
> >see whether the
> >> > > > > > > rewritten addressed your previous concern?
> >> > > > > >
> >> > > > > > I have a few comments below - I'm not sure I'm qualified to
> >fully
> >> > > > > > review the rest though.
> >> > > > > >
> >> > > > > > > Thanks a lot.
> >> > > > > > >
> >> > > > > > > Qing
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak
> ><ubizjak@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao
> ><QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> napisala:
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Richard and Uros,
> >> > > > > > > > >
> >> > > > > > > > > Could you please review the change that H.J and I
> >rewrote based on your comments in the previous round of discussion?
> >> > > > > > > > >
> >> > > > > > > > > This patch is a nice security enhancement for GCC
> >that has been requested by security people for quite some time.
> >> > > > > > > > >
> >> > > > > > > > > Thanks a lot for your time.
> >> > > > > > > >
> >> > > > > > > > I'll be away from the keyboard for the next week, but
> >the patch needs a middle end approval first.
> >> > > > > > > >
> >> > > > > > > > That said, x86 parts looks OK.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > > Uros.
> >> > > > > > > > > Qing
> >> > > > > > > > >
> >> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via
> >Gcc-patches <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>>
> >wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > Hi, Gcc team,
> >> > > > > > > > > >
> >> > > > > > > > > > This patch is a follow-up on the previous patch and
> >corresponding discussion:
> >> > > > > > > > > >
> >https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> ><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>
> ><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> ><https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html>>
> >> > > > > > > > > >
> >> > > > > > > > > > From the previous round of discussion, the major
> >issues raised were:
> >> > > > > > > > > >
> >> > > > > > > > > > A. should be rewritten by using regsets
> >infrastructure.
> >> > > > > > > > > > B. Put the patch into middle-end instead of x86
> >backend.
> >> > > > > > > > > >
> >> > > > > > > > > > This new patch is rewritten based on the above 2
> >comments.  The major changes compared to the previous patch are:
> >> > > > > > > > > >
> >> > > > > > > > > > 1. Change the names of the option and attribute
> >from
> >> > > > > > > > > >
> >-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and
> >zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> >> > > > > > > > > > to:
> >> > > > > > > > > >
> >-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and
> >zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> >> > > > > > > > > > Add the new option and  new attribute in general.
> >> > > > > > > > > > 2. The main code generation part is moved from i386
> >backend to middle-end;
> >> > > > > > > > > > 3. Add 4 target-hooks;
> >> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> >> > > > > > > > > > 5. On a target that does not implement the target
> >hook, issue error for the new option, issue warning for the new
> >attribute.
> >> > > > > > > > > >
> >> > > > > > > > > > The patch is as following:
> >> > > > > > > > > >
> >> > > > > > > > > > [PATCH] Add
> >-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> >> > > > > > > > > > command-line option and
> >> > > > > > > > > >
> >zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function
> >attribute:
> >> > > > > > > > > >
> >> > > > > > > > > >  1. -fzero-call-used-regs=skip and
> >zero_call_used_regs("skip")
> >> > > > > > > > > >
> >> > > > > > > > > >  Don't zero call-used registers upon function
> >return.
> >> > > > > >
> >> > > > > > Does a return via EH unwinding also constitute a function
> >return?  I
> >> > > > > > think you may want to have a finally handler or support in
> >the unwinder
> >> > > > > > for this?  Then there's abnormal return via longjmp &
> >friends, I guess
> >> > > > > > there's nothing that can be done there besides patching
> >glibc?
> >> > > > >
> >> > > > > Abnormal returns, like EH unwinding and longjmp, aren't
> >covered by this
> >> > > > > patch. Only normal returns are covered.
> >> > > >
> >> > > > What's the point then?  Also specifically thinking about spill
> >slots.
> >> > > >
> >> > >
> >> > > The goal of this patch is to zero caller-saved registers upon
> >normal
> >> > > function return.  Abnormal returns and spill slots are outside of
> >the
> >> > > scope of this patch.
> >> >
> >> > Sure, I can write a patch that spills some regs, writes zeros to
> >them
> >> > and then restores them.  And the patch will fulfil what it was
> >designed
> >> > to do.
> >> >
> >> > Still I need to come up with a reason that this is a useful feature
> >> > by its own for it to be accepted.
> >> >
> >> > I am asking for that reason.  What's the reason for the "goal of
> >this
> >> > patch"?  Why's that a useful goal on its own?
> >> >
> >>
> >> Hi Victor,
> >>
> >> Can you provide some background information about how/why this
> >feature
> >> is used?
> >>
> >
> >From The SECURE project and GCC in GCC Cauldron 2018:
> >
> >Speaker: Graham Markall
> >
> >The SECURE project is a 15 month program funded by Innovate UK, to
> >take well known security techniques from academia and make them
> >generally available in standard compilers, specifically GCC and LLVM.
> >An explicit objective is for those techniques to be incorporated in
> >the upstream versions of compilers. The Cauldron takes place in the
> >final month of the project and this talk will present the technical
> >details of some of the techniques implemented, and review those that
> >are yet to be implemented. A particular focus of this talk will be on
> >verifying that the implementation is correct, which can be a bigger
> >challenge than the implementation.
> >
> >Techniques to be covered in the project include the following:
> >
> >Stack and register erasure. Ensuring that on return from a function,
> >no data is left lying on the stack or in registers. Particular
> >challenges are in dealing with inlining, shrink wrapping and caching.
> >
> >This patch implements register erasure.
>
> Part of it, yes. While I can see abnormal transfer of control is difficult exception handling is used too wide spread to be ignored. What's the plan there?

The initial usage is in the Linux kernel, where user-space EH isn't an issue.
Further improvements can be investigated later.

> So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?
>
> Richard.

This patch is for caller-saved registers only.  Stack temporaries aren't
covered by this.  We can simply clear the stack first before releasing it
for function return.
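
As an illustration only (not part of the patch), a use site for the
proposed attribute might look like the sketch below.  The attribute name
and the "used-gpr" value come from the patch description; the
ZERO_USED_GPR macro, the __has_attribute guard, and check_pin are
assumptions made up for this example, so that the file still builds with
a compiler that lacks the attribute.

```c
#include <stddef.h>

/* Fall back to a no-op when the compiler does not know the attribute.
   ZERO_USED_GPR is a hypothetical convenience macro.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Hypothetical function that handles a secret.  With the attribute in
   effect, the call-used general-purpose registers it used are zeroed on
   normal return; the return value itself is of course preserved.  */
ZERO_USED_GPR
int check_pin(const unsigned char *pin, const unsigned char *expected,
              size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(pin[i] ^ expected[i]);
    return diff == 0;
}
```

Note that this covers registers only; any copy of the secret left in a
stack temporary is untouched, as stated above.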

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 18:53                             ` Richard Biener
  2020-08-05 19:08                               ` H.J. Lu
@ 2020-08-05 20:22                               ` Qing Zhao
  2020-08-06  8:37                                 ` Richard Biener
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-05 20:22 UTC (permalink / raw)
  To: Richard Biener, Kees Cook
  Cc: H.J. Lu, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Rodriguez Bahena, Victor

>> 
>> From The SECURE project and GCC in GCC Cauldron 2018:
>> 
>> Speaker: Graham Markall
>> 
>> The SECURE project is a 15 month program funded by Innovate UK, to
>> take well known security techniques from academia and make them
>> generally available in standard compilers, specifically GCC and LLVM.
>> An explicit objective is for those techniques to be incorporated in
>> the upstream versions of compilers. The Cauldron takes place in the
>> final month of the project and this talk will present the technical
>> details of some of the techniques implemented, and review those that
>> are yet to be implemented. A particular focus of this talk will be on
>> verifying that the implementation is correct, which can be a bigger
>> challenge than the implementation.
>> 
>> Techniques to be covered in the project include the following:
>> 
>> Stack and register erasure. Ensuring that on return from a function,
>> no data is left lying on the stack or in registers. Particular
>> challenges are in dealing with inlining, shrink wrapping and caching.
>> 
>> This patch implements register erasure.
> 
> Part of it, yes. While I can see abnormal transfer of control is difficult exception handling is used too wide spread to be ignored. What's the plan there? 
> 
> So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?

You mean to provide an integrated interface for both stack and register erasure for security purposes?

However, is stack erasure at function return really a better idea than zero-initializing auto-variables at the beginning of the function?

We had some discussion with Kees Cook several weeks ago on the idea of stack erasure at function return; Kees provided the following comments:

"But back to why I don't think it's the right approach:

Based on the performance measurements of pattern-init and zero-init
in Clang, MSVC, and the kernel plugin, it's clear that adding these
initializations has measurable performance cost. Doing it at function
exit means performing large unconditional wipes. Doing it at function
entry means initializations can be dead-store eliminated and highly
optimized. Given the current debates on the measurable performance
difference between pattern and zero initialization (even in the face of
existing dead-store elimination), I would expect wipe-on-function-exit to
be outside the acceptable tolerance for performance impact. (Additionally,
we've seen negative cache effects on wiping memory when the CPU is done
using it, though this is more pronounced in heap wiping. Zeroing at
free is about twice as expensive as zeroing at free time due to cache
temporality. This is true for the stack as well, but it's not as high.)"

From my understanding, the major issue with stack erasure at function return is the large performance overhead,
and this overhead cannot be reduced by compiler optimizations, since the
additional wiping insns are inserted at the end of the routine.
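
To make the cost argument concrete, here is a rough sketch (not taken from the patch or from Kees's measurements) contrasting the two placements. The names sum_entry_init, sum_exit_wipe and wipe are hypothetical, and the volatile store loop merely stands in for whatever wiping sequence a compiler would insert.

```c
#include <stddef.h>

enum { N = 64 };

/* Zero-init at entry: the initializer is an ordinary store, so the
   optimizer is free to delete it, because every element is overwritten
   before it is read.  */
int sum_entry_init(int seed)
{
    int buf[N] = {0};
    int total = 0;
    for (int i = 0; i < N; i++)
        buf[i] = seed + i;
    for (int i = 0; i < N; i++)
        total += buf[i];
    return total;
}

/* Wipe through a volatile pointer so the stores cannot be removed.  */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Wipe at exit: the buffer is dead afterwards, yet the wipe has to stay,
   so its cost is paid on every call.  */
int sum_exit_wipe(int seed)
{
    int buf[N];
    int total = 0;
    for (int i = 0; i < N; i++)
        buf[i] = seed + i;
    for (int i = 0; i < N; i++)
        total += buf[i];
    wipe(buf, sizeof buf);
    return total;
}
```

Both functions compute the same result; the difference is only in which stores the optimizer is allowed to drop.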

Based on the previous discussion with Kees, I don’t think that stack erasure at function return is a good idea.
Instead, we might provide an alternative approach: zero/pattern initialization of auto-variables. (This functionality is
already available in LLVM.)
This will be another patch we want to add to GCC for security purposes in general.

So, I think for the current patch, -fzero-call-used-regs should be good enough. 

Any comments?

Qing





> 
> Richard. 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-04  7:35               ` Richard Biener
  2020-08-04 18:23                 ` H.J. Lu
@ 2020-08-05 21:35                 ` Qing Zhao
  2020-08-06  8:31                   ` Richard Biener
  2020-08-06 22:32                   ` Qing Zhao
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-05 21:35 UTC (permalink / raw)
  To: Richard Biener
  Cc: Uros Bizjak, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

Hi, Richard,

Thanks a lot for your careful review and detailed comments.  


> On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.

Could you let me know who would be a more qualified person to fully review the rest of the middle-end change?

>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>> command-line option and
>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribute:
>>>>> 
>>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>>>>> 
>>>>> Don't zero call-used registers upon function return.
> 
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?
> 
> In general I am missing reasoning as to why to use -fzero-call-used-regs=
> in the documentation, that is, what is the threat model and what are
> the guarantees?  Is there any point zeroing registers when spill slots
> are left populated with stale register contents?  How do I (and why
> would I want to?) ensure that there's no information leak from the
> implementation of 'foo' to its callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to protect with
> zero_call_used_regs("...")?
> 
> Again - what's the intended use (and how does it fulfil anything useful
> for that case)?

The major question above is: what’s the motivation for the whole patch?
H.J. Lu and I have replied to this question in separate emails; let’s continue
this high-level discussion in that thread.


>>>>> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>>>>> return NULL_TREE;
>>>>> }
>>>>> 
>>>>> +/* Handle a "zero_call_used_regs" attribute; arguments as in
>>>>> +   struct attribute_spec.handler.  */
>>>>> +
>>>>> +static tree
>>>>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
>>>>> +                                   int ARG_UNUSED (flags),
>>>>> +                                   bool *no_add_attris)
>>>>> +{
>>>>> +  tree decl = *node;
>>>>> +  tree id = TREE_VALUE (args);
>>>>> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
>>>>> +
>>>>> +  if (TREE_CODE (decl) != FUNCTION_DECL)
>>>>> +    {
>>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>>>> +             "%qE attribute applies only to functions", name);
>>>>> +      *no_add_attris = true;
>>>>> +      return NULL_TREE;
>>>>> +    }
>>>>> +  else if (DECL_INITIAL (decl))
>>>>> +    {
>>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>>>> +             "cannot set %qE attribute after definition", name);
> 
> Why's that?
This might not be needed; I will fix this in the next update.

>>>>> 
>>>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>>>>> index 81bd2ee..ded1880 100644
>>>>> --- a/gcc/c/c-decl.c
>>>>> +++ b/gcc/c/c-decl.c
>>>>> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
>>>>>        DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
>>>>>      }
>>>>> 
>>>>> +      /* Merge the zero_call_used_regs_type information.  */
>>>>> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>>>> +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
>>>>> +
> 
> If you need this (see below) then likely cp/* needs similar adjustment
> so do other places in the middle-end (function cloning, etc)

I will check cp/* and function cloning, etc., to see whether the copying and merging are needed in other
places.

Another thought: if I use “lookup_attribute” on the function decl instead of checking these bits, as you suggested
later, all this copying and merging might not be necessary anymore. I will check on that.
> 
>>>>> 
>>>>> +
>>>>> +/* Emit a sequence of insns to zero the call-used-registers for the current
>>>>> + * function.  */
> 
> No '*' on the continuation line

Okay, will fix this.

>>>>> +
>>>>> +  /* This array holds the zero rtx with the corresponding machine mode.  */
>>>>> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
>>>>> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>>>>> +    zero_rtx[i] = NULL_RTX;
>>>>> +
>>>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>>> +    {
>>>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> 
> Use if (!call_used_regs[regno])
Okay.

> 
>>>>> +     continue;
>>>>> +      if (fixed_regs[regno])
>>>>> +     continue;
>>>>> +      if (is_live_reg_at_exit (regno))
>>>>> +     continue;
> 
> How can a call-used reg be live at exit?

Yes, this might not be needed, I will double check on this.

> 
>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>> +     continue;
> 
> Why does the target need some extra say here?

Only the target can decide which hard regs should be zeroed, and which hard regs are general-purpose registers.

> 
>>>>> +      if (used_only && !df_regs_ever_live_p (regno))
> 
> So I suppose this does not include uses by callees of this function?

Yes, I think so. 
> 
>>>>> +     continue;
>>>>> +
>>>>> +      /* Now we can emit insn to zero this register.  */
>>>>> +      rtx reg, tmp;
>>>>> +
>>>>> +      machine_mode mode
>>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
>>>>> +                                                reg_raw_mode[regno]);
> 
> In what case does the target ever need to adjust this (we're dealing
> with hard-regs only?)?

For x86, for example, even though the GPRs are 64-bit, we only need to zero the lower 32 bits (a 32-bit write clears the upper half as well), etc.

> 
>>>>> +      if (mode == VOIDmode)
>>>>> +     continue;
>>>>> +      if (!have_regs_of_mode[mode])
>>>>> +     continue;
> 
> When does this happen?

This might be removed. I will check. 
> 
>>>>> +
>>>>> +      reg = gen_rtx_REG (mode, regno);
>>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
>>>>> +     {
>>>>> +       zero_rtx[(int)mode] = reg;
>>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
>>>>> +       emit_insn (tmp);
>>>>> +     }
>>>>> +      else
>>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> 
> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> but I may be wrong.  

You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
I will check on this.

> I'd rather have the target be able to specify
> some special instruction for zeroing here.  Some may have
> multi-reg set instructions for example.  That said, can't we
> defer the actual zeroing to the target in full and only compute
> a hard-reg-set of to-be zeroed registers here and pass that
> to a target hook?

For vector regs, we have already provided this interface with 

targetm.calls.zero_all_vector_registers (used_only)

For integer registers, do we need such a target hook too?
If so, yes, it might be better to let the target decide how to zero the registers.

If not, the current design might be good enough, right?

> 
>>>>> +
>>>>> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
>>>>> +    }
>>>>> +
>>>>> +  return;
>>>>> +}
>>>>> +
>>>>> +
>>>>> /* Return a sequence to be used as the epilogue for the current function,
>>>>>  or NULL.  */
>>>>> 
>>>>> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
>>>>> 
>>>>> start_sequence ();
>>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
>>>>> +
>>>>> +  gen_call_used_regs_seq ();
>>>>> +
> 
> The caller eventually performs shrink-wrapping - are you sure that
> doesn't mess up things?

My understanding is that the standard epilogue does not handle “call-used” registers at all, so shrink-wrapping will not affect
“call-used” registers either.
Our patch only handles call-used registers, so there should not be any interaction between this patch and shrink-wrapping.

> 
>>>>> 
>>>>> +
>>>>> + /* How to clear call-used registers upon function return.  */
>>>>> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
>>>>> +
>>>>> + /* 11 unused bits.  */
> 
> So instead of wasting "precious" bits please use lookup_attribute
> in the single place you query this value (which is once per function).
> There's no need to complicate matters by trying to maintain the above.

Thanks for the suggestion.
Yes, I will try to use lookup_attribute in function.c instead of adding these bits. That will save us this
precious space.
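
For illustration, the lookup-at-the-point-of-use idea can be sketched outside of GCC as follows. This is a standalone model, not GCC code: struct attr, find_attr, zero_regs_kind_for and the ZK_* enum are hypothetical stand-ins for the tree attribute list, lookup_attribute, and the enum in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the setting the patch stores in 3 bits.  */
enum zero_regs_kind { ZK_UNSET, ZK_SKIP, ZK_USED_GPR, ZK_ALL_GPR,
                      ZK_USED, ZK_ALL };

/* Hypothetical, simplified attribute list attached to a function decl.  */
struct attr {
    const char *name;          /* e.g. "zero_call_used_regs" */
    const char *value;         /* e.g. "used-gpr" */
    const struct attr *next;
};

/* Simplified analogue of looking an attribute up by name.  */
static const struct attr *find_attr(const char *name,
                                    const struct attr *list)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            return list;
    return NULL;
}

/* Decode the setting once, at the single place that needs it, instead
   of caching it in per-decl bits that must be copied and merged.  */
enum zero_regs_kind zero_regs_kind_for(const struct attr *attrs)
{
    static const char *const names[] = { "skip", "used-gpr", "all-gpr",
                                         "used", "all" };
    const struct attr *a = find_attr("zero_call_used_regs", attrs);
    if (a == NULL)
        return ZK_UNSET;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(a->value, names[i]) == 0)
            return (enum zero_regs_kind)(ZK_SKIP + i);
    return ZK_UNSET;
}
```

Because the attribute list already travels with the decl, nothing extra needs to be merged in merge_decls or copied when a function is cloned in this model.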

Thanks again.

Qing

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 21:35                 ` Qing Zhao
@ 2020-08-06  8:31                   ` Richard Biener
  2020-08-06  8:41                     ` Jakub Jelinek
                                       ` (2 more replies)
  2020-08-06 22:32                   ` Qing Zhao
  1 sibling, 3 replies; 188+ messages in thread
From: Richard Biener @ 2020-08-06  8:31 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Uros Bizjak, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor, segher

On Wed, 5 Aug 2020, Qing Zhao wrote:

> Hi, Richard,
> 
> Thanks a lot for your careful review and detailed comments.  
> 
> 
> > On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de> wrote:
> > 
> > I have a few comments below - I'm not sure I'm qualified to fully
> > review the rest though.
> 
> Could you let me know who will be the more qualified person to fully review the rest of middle-end change?

Jeff might be, but given the intended purpose (ROP mitigation, AFAIU)
it would be nice for other target maintainers to chime in (maybe Segher
for Power) on the question below...

> >>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> >>>>> command-line option and
> >>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribute:
> >>>>> 
> >>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> >>>>> 
> >>>>> Don't zero call-used registers upon function return.
> > 
> > Does a return via EH unwinding also constitute a function return?  I
> > think you may want to have a finally handler or support in the unwinder
> > for this?  Then there's abnormal return via longjmp & friends, I guess
> > there's nothing that can be done there besides patching glibc?
> > 
> > In general I am missing reasoning as to why to use -fzero-call-used-regs=
> > in the documentation, that is, what is the threat model and what are
> > the guarantees?  Is there any point zeroing registers when spill slots
> > are left populated with stale register contents?  How do I (and why
> > would I want to?) ensure that there's no information leak from the
> > implementation of 'foo' to its callers?  Do I need to compile all
> > of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> > or is it enough to annotate API boundaries I want to protect with
> > zero_call_used_regs("...")?
> > 
> > Again - what's the intended use (and how does it fulfil anything useful
> > for that case)?
> 
> The major question of the above is:  what’s the motivation of the whole patch?
> H.J.Lu and I have replied this question in separated emails, let’s continue with
> this high-level discussion in that thread. 
> 
> 
> >>>>> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
> >>>>> return NULL_TREE;
> >>>>> }
> >>>>> 
> >>>>> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> >>>>> +   struct attribute_spec.handler.  */
> >>>>> +
> >>>>> +static tree
> >>>>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> >>>>> +                                   int ARG_UNUSED (flags),
> >>>>> +                                   bool *no_add_attris)
> >>>>> +{
> >>>>> +  tree decl = *node;
> >>>>> +  tree id = TREE_VALUE (args);
> >>>>> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
> >>>>> +
> >>>>> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> >>>>> +    {
> >>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
> >>>>> +             "%qE attribute applies only to functions", name);
> >>>>> +      *no_add_attris = true;
> >>>>> +      return NULL_TREE;
> >>>>> +    }
> >>>>> +  else if (DECL_INITIAL (decl))
> >>>>> +    {
> >>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
> >>>>> +             "cannot set %qE attribute after definition", name);
> > 
> > Why's that?
> This might not be needed, I will fix this in the next update.
> 
> >>>>> 
> >>>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> >>>>> index 81bd2ee..ded1880 100644
> >>>>> --- a/gcc/c/c-decl.c
> >>>>> +++ b/gcc/c/c-decl.c
> >>>>> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
> >>>>>        DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
> >>>>>      }
> >>>>> 
> >>>>> +      /* Merge the zero_call_used_regs_type information.  */
> >>>>> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
> >>>>> +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
> >>>>> +
> > 
> > If you need this (see below) then likely cp/* needs similar adjustment
> > so do other places in the middle-end (function cloning, etc)
> 
> Will check this in cp/* and function cloning etc to see whether the copying and merging are needed in other
> places.
> 
> Another thought, if I use “lookup_attribute” of the function decl instead of checking these bits as you suggested
> later,  all these copying and merging might not be necessary anymore. I will check on that. 
> > 
> >>>>> 
> >>>>> +
> >>>>> +/* Emit a sequence of insns to zero the call-used-registers for the current
> >>>>> + * function.  */
> > 
> > No '*' on the continuation line
> 
> Okay, will fix this.
> 
> >>>>> +
> >>>>> +  /* This array holds the zero rtx with the corresponding machine mode.  */
> >>>>> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> >>>>> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> >>>>> +    zero_rtx[i] = NULL_RTX;
> >>>>> +
> >>>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> >>>>> +    {
> >>>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> > 
> > Use if (!call_used_regs[regno])
> Okay.
> 
> > 
> >>>>> +     continue;
> >>>>> +      if (fixed_regs[regno])
> >>>>> +     continue;
> >>>>> +      if (is_live_reg_at_exit (regno))
> >>>>> +     continue;
> > 
> > How can a call-used reg be live at exit?
> 
> Yes, this might not be needed, I will double check on this.
> 
> > 
> >>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> >>>>> +     continue;
> > 
> > Why does the target need some extra say here?
> 
> Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 

I'm mostly questioning the plethora of target hooks added and whether
these details are at a granularity that applies to more than just x86.
Did I suggest computing a hard-reg set that the middle-end says was
used and is not live, and leaving the rest to the target?

> > 
> >>>>> +      if (used_only && !df_regs_ever_live_p (regno))
> > 
> > So I suppose this does not include uses by callees of this function?
> 
> Yes, I think so. 
> > 
> >>>>> +     continue;
> >>>>> +
> >>>>> +      /* Now we can emit insn to zero this register.  */
> >>>>> +      rtx reg, tmp;
> >>>>> +
> >>>>> +      machine_mode mode
> >>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
> >>>>> +                                                reg_raw_mode[regno]);
> > 
> > In what case does the target ever need to adjust this (we're dealing
> > with hard-regs only?)?
> 
> For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.

That's an optimization, yes.

> > 
> >>>>> +      if (mode == VOIDmode)
> >>>>> +     continue;
> >>>>> +      if (!have_regs_of_mode[mode])
> >>>>> +     continue;
> > 
> > When does this happen?
> 
> This might be removed. I will check. 
> > 
> >>>>> +
> >>>>> +      reg = gen_rtx_REG (mode, regno);
> >>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
> >>>>> +     {
> >>>>> +       zero_rtx[(int)mode] = reg;
> >>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
> >>>>> +       emit_insn (tmp);
> >>>>> +     }
> >>>>> +      else
> >>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > 
> > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > but I may be wrong.  
> 
> You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
> I will check on this.
> 
> > I'd rather have the target be able to specify
> > some special instruction for zeroing here.  Some may have
> > multi-reg set instructions for example.  That said, can't we
> > defer the actual zeroing to the target in full and only compute
> > a hard-reg-set of to-be zeroed registers here and pass that
> > to a target hook?

Ah, I did.

> For vector regs, we have already provided this interface with 
> 
> targetm.calls.zero_all_vector_registers (used_only)
> 
> For integer registers, do we need such target hook too? 
> If so, yes, it might be better to let the target decide how to zero the registers.
> 
> If Not, the current design might be good enough, right?

But why not simplify it all to a single hook

  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);

?
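
A standalone sketch of that split, purely to illustrate the shape of the interface (this is not GCC code): the middle end computes the set, and a single target-side function decides what to do with it. The types hard_reg_mask and struct reg_info, the tiny 8-register file, and the extra gpr_mask and regfile parameters are all assumptions made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

enum { NUM_HARD_REGS = 8 };       /* hypothetical tiny register file */
typedef uint32_t hard_reg_mask;   /* one bit per hard register */

/* Hypothetical per-register facts the middle end already knows.  */
struct reg_info {
    bool call_used;     /* clobbered across calls */
    bool fixed;         /* e.g. the stack pointer */
    bool ever_live;     /* used somewhere in the current function */
    bool live_at_exit;  /* e.g. holds the return value */
};

/* Middle-end side: collect the candidate registers and nothing else.  */
hard_reg_mask regs_to_zero(const struct reg_info regs[NUM_HARD_REGS],
                           bool used_only)
{
    hard_reg_mask mask = 0;
    for (unsigned regno = 0; regno < NUM_HARD_REGS; regno++) {
        const struct reg_info *r = &regs[regno];
        if (!r->call_used || r->fixed || r->live_at_exit)
            continue;
        if (used_only && !r->ever_live)
            continue;
        mask |= (hard_reg_mask)1 << regno;
    }
    return mask;
}

/* Target side: a stand-in for the single hook.  It filters by register
   class if asked and "zeroes" a simulated register file; a real target
   could instead pick modes or multi-register instructions here.
   Returns the number of registers zeroed.  */
unsigned target_zero_regs(hard_reg_mask mask, hard_reg_mask gpr_mask,
                          bool gpr_only, long regfile[NUM_HARD_REGS])
{
    unsigned zeroed = 0;
    if (gpr_only)
        mask &= gpr_mask;
    for (unsigned regno = 0; regno < NUM_HARD_REGS; regno++)
        if (mask & ((hard_reg_mask)1 << regno)) {
            regfile[regno] = 0;
            zeroed++;
        }
    return zeroed;
}
```

In this model the per-register predicates and the mode selection live entirely behind the one target function, so the middle-end loop needs no further hooks.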

> > 
> >>>>> +
> >>>>> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
> >>>>> +    }
> >>>>> +
> >>>>> +  return;
> >>>>> +}
> >>>>> +
> >>>>> +
> >>>>> /* Return a sequence to be used as the epilogue for the current function,
> >>>>>  or NULL.  */
> >>>>> 
> >>>>> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
> >>>>> 
> >>>>> start_sequence ();
> >>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
> >>>>> +
> >>>>> +  gen_call_used_regs_seq ();
> >>>>> +
> > 
> > The caller eventually performs shrink-wrapping - are you sure that
> > doesn't mess up things?
> 
> My understanding is that the standard epilogue does not handle “call-used” registers.  Therefore, shrink-wrapping will not affect
> “call-used” registers either.
> Our patch only handles call-used registers, so there should be no interaction between this patch and shrink-wrapping.

I don't know (CCed Segher, he should eventually).

> > 
> >>>>> 
> >>>>> +
> >>>>> + /* How to clear call-used registers upon function return.  */
> >>>>> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
> >>>>> +
> >>>>> + /* 11 unused bits.  */
> > 
> > So instead of wasting "precious" bits please use lookup_attribute
> > in the single place you query this value (which is once per function).
> > There's no need to complicate matters by trying to maintain the above.
> 
> Thanks for the suggestion.
> Yes, I will try to use lookup_attribute in function.c instead of adding these bits. That will save us this
> precious space.

Yes, I think this will simplify the code.

Richard.

> Thanks again.
> 
> Qing

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 20:22                               ` Qing Zhao
@ 2020-08-06  8:37                                 ` Richard Biener
  2020-08-06 15:45                                   ` Qing Zhao
  2020-08-06 20:45                                   ` Kees Cook
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Biener @ 2020-08-06  8:37 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Kees Cook, H.J. Lu, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Rodriguez Bahena, Victor

On Wed, 5 Aug 2020, Qing Zhao wrote:

> >> 
> >> From The SECURE project and GCC in GCC Cauldron 2018:
> >> 
> >> Speaker: Graham Markall
> >> 
> >> The SECURE project is a 15 month program funded by Innovate UK, to
> >> take well known security techniques from academia and make them
> >> generally available in standard compilers, specifically GCC and LLVM.
> >> An explicit objective is for those techniques to be incorporated in
> >> the upstream versions of compilers. The Cauldron takes place in the
> >> final month of the project and this talk will present the technical
> >> details of some of the techniques implemented, and review those that
> >> are yet to be implemented. A particular focus of this talk will be on
> >> verifying that the implementation is correct, which can be a bigger
> >> challenge than the implementation.
> >> 
> >> Techniques to be covered in the project include the following:
> >> 
> >> Stack and register erasure. Ensuring that on return from a function,
> >> no data is left lying on the stack or in registers. Particular
> >> challenges are in dealing with inlining, shrink wrapping and caching.
> >> 
> >> This patch implements register erasure.
> > 
> > Part of it, yes. While I can see that abnormal transfer of control is difficult, exception handling is too widespread to be ignored. What's the plan there?
> > 
> > So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?
> 
> You mean to provide an integrated interface for both stack and register 
> erasure for security purpose?
> 
> However, is stack erasure at function return really a better idea than 
> zero-init auto-variables in the beginning of the function?
> 
> We had some discussion with Kees Cook several weeks ago on the idea of 
> stack erasure at function return, Kees provided the following comments:
> 
> "But back to why I don't think it's the right approach:
> 
> Based on the performance measurements of pattern-init and zero-init
> in Clang, MSVC, and the kernel plugin, it's clear that adding these
> initializations has measurable performance cost. Doing it at function
> exit means performing large unconditional wipes. Doing it at function
> entry means initializations can be dead-store eliminated and highly
> optimized. Given the current debates on the measurable performance
> difference between pattern and zero initialization (even in the face of
> existing dead-store elimination), I would expect wipe-on-function-exit to
> be outside the acceptable tolerance for performance impact. (Additionally,
> we've seen negative cache effects on wiping memory when the CPU is done
> using it, though this is more pronounced in heap wiping. Zeroing at
> free is about twice as expensive as zeroing at allocation time due to cache
> temporality. This is true for the stack as well, but it's not as high.)”
> 
> From my understanding, the major issue with stack erasure at function
> return is the large performance overhead, and this overhead
> cannot be reduced by compiler optimizations since the additional
> wiping insns are inserted at the end of the routine.
> 
> Based on the previous discussion with Kees, I don’t think that stack
> erasure at function return is a good idea. Instead, we might provide an
> alternative approach: zero/pattern init of auto-variables. (This
> functionality is already available in LLVM.) This will be another
> patch we want to add to GCC for security purposes in general.
> 
> So, I think for the current patch, -fzero-call-used-regs should be good 
> enough.
> 
> Any comments?

OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
it sounded more like a mitigation against information leaks which
then would be highly incomplete w/o spill slot clearing.  Like
we had that discussion on secure erase of memory that should not
be DSEd.

This needs to be reflected in the documentation and eventually
the option naming?  Like -frop-protection=... similar in spirit
to how we have -fcf-protection=... (though that as well is supposed
to provide ROP mitigation).

I'm not very familiar with ROP [mitigation] techniques, so I'm no
longer questioning usefulness of this patch but leave that to others
(and thus final approval).  I'm continuing to question the plethora
of target hooks you add and will ask for better user-level documentation.

Richard.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06  8:31                   ` Richard Biener
@ 2020-08-06  8:41                     ` Jakub Jelinek
  2020-08-06  9:31                       ` Uros Bizjak
  2020-08-06 14:56                     ` Qing Zhao
  2020-08-06 23:37                     ` Segher Boessenkool
  2 siblings, 1 reply; 188+ messages in thread
From: Jakub Jelinek @ 2020-08-06  8:41 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, H. J. Lu, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor, segher

On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> > For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
> 
> That's an optimization, yes.

But, does the code need to care?
If one compiles:
void
foo ()
{
  register unsigned long long a __asm ("rax");
  register unsigned long long b __asm ("rsi");
  register unsigned long long c __asm ("r8");
  register unsigned long long d __asm ("r9");
  a = 0;
  b = 0;
  c = 0;
  d = 0;
  asm volatile ("" : : "r" (a), "r" (b), "r" (c), "r" (d));
}
then the backend uses *movdi_xor patterns which are emitted
as xorl instructions (i.e. just 32-bit).  If you need to emit them
at a spot where the flags register is or might be live, then
*movdi_internal is used instead, but that one will also emit
a 32-bit movl $0, %r8d etc. instruction (because (const_int 0) is
zero extended 32-bit integer).

	Jakub


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06  8:41                     ` Jakub Jelinek
@ 2020-08-06  9:31                       ` Uros Bizjak
  0 siblings, 0 replies; 188+ messages in thread
From: Uros Bizjak @ 2020-08-06  9:31 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Qing Zhao, H. J. Lu, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor, Segher Boessenkool

On Thu, Aug 6, 2020 at 10:42 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> > > For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
> >
> > That's an optimization, yes.
>
> But, does the code need to care?

No, because this is only an implementation detail. The RTL code should
still use DImode clears. These are emitted using 32bit insns,
implicitly zero-extended to 64bits, so in effect they implement DImode
clears.

Uros.

> If one compiles:
> void
> foo ()
> {
>   register unsigned long long a __asm ("rax");
>   register unsigned long long b __asm ("rsi");
>   register unsigned long long c __asm ("r8");
>   register unsigned long long d __asm ("r9");
>   a = 0;
>   b = 0;
>   c = 0;
>   d = 0;
>   asm volatile ("" : : "r" (a), "r" (b), "r" (c), "r" (d));
> }
> then the backend uses *movdi_xor patterns which are emitted
> as xorl instructions (i.e. just 32-bit).  If you need to emit them
> at a spot where the flags register is or might be live, then
> *movdi_internal is used instead, but that one will also emit
> a 32-bit movl $0, %r8d etc. instruction (because (const_int 0) is
> zero extended 32-bit integer).
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06  8:31                   ` Richard Biener
  2020-08-06  8:41                     ` Jakub Jelinek
@ 2020-08-06 14:56                     ` Qing Zhao
  2020-08-06 23:37                     ` Segher Boessenkool
  2 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-06 14:56 UTC (permalink / raw)
  To: Richard Biener
  Cc: Uros Bizjak, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor, segher



> On Aug 6, 2020, at 3:31 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Wed, 5 Aug 2020, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> Thanks a lot for your careful review and detailed comments.  
>> 
>> 
>>> On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de> wrote:
>>> 
>>> I have a few comments below - I'm not sure I'm qualified to fully
>>> review the rest though.
>> 
>> Could you let me know who would be a more qualified person to fully review the rest of the middle-end change?
> 
> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
> it would be nice for other target maintainers to chime in (Segher for
> power maybe) for the question below...
> 
>>>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>>>> +     continue;
>>> 
>>> Why does the target need some extra say here?
>> 
>> Only the target can decide which hard regs should be zeroed, and which hard regs are general-purpose registers.
> 
> I'm mostly questioning the plethora of target hooks added and whether
> these details are a good granularity applying to more than just x86.
> Did I suggest to compute a hardreg set that the middle-end says was
> used and is not live and leave the rest to the target?

Yes, I agree that there might be too many details exposed to the middle-end in the current design.

A single target hook as you suggested:
 targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);

might be a cleaner design.


Thanks.

Qing


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06  8:37                                 ` Richard Biener
@ 2020-08-06 15:45                                   ` Qing Zhao
  2020-08-06 20:45                                   ` Kees Cook
  1 sibling, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-06 15:45 UTC (permalink / raw)
  To: Richard Biener
  Cc: Kees Cook, H.J. Lu, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Rodriguez Bahena, Victor



> On Aug 6, 2020, at 3:37 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Wed, 5 Aug 2020, Qing Zhao wrote:
> 
>>>> 
>>>> From The SECURE project and GCC in GCC Cauldron 2018:
>>>> 
>>>> Speaker: Graham Markall
>>>> 
>>>> The SECURE project is a 15 month program funded by Innovate UK, to
>>>> take well known security techniques from academia and make them
>>>> generally available in standard compilers, specifically GCC and LLVM.
>>>> An explicit objective is for those techniques to be incorporated in
>>>> the upstream versions of compilers. The Cauldron takes place in the
>>>> final month of the project and this talk will present the technical
>>>> details of some of the techniques implemented, and review those that
>>>> are yet to be implemented. A particular focus of this talk will be on
>>>> verifying that the implementation is correct, which can be a bigger
>>>> challenge than the implementation.
>>>> 
>>>> Techniques to be covered in the project include the following:
>>>> 
>>>> Stack and register erasure. Ensuring that on return from a function,
>>>> no data is left lying on the stack or in registers. Particular
>>>> challenges are in dealing with inlining, shrink wrapping and caching.
>>>> 
>>>> This patch implements register erasure.
>>> 
>>> Part of it, yes. While I can see that abnormal transfer of control is difficult, exception handling is too widespread to be ignored. What's the plan there?
>>> 
>>> So can we also see the other parts? In particular I wonder whether exposing just register clearing (in this fine-grained manner) is required and useful rather than thinking of a better interface for the whole thing?
>> 
>> You mean to provide an integrated interface for both stack and register 
>> erasure for security purpose?
>> 
>> However, is stack erasure at function return really a better idea than 
>> zero-init auto-variables in the beginning of the function?
>> 
>> We had some discussion with Kees Cook several weeks ago on the idea of 
>> stack erasure at function return, Kees provided the following comments:
>> 
>> "But back to why I don't think it's the right approach:
>> 
>> Based on the performance measurements of pattern-init and zero-init
>> in Clang, MSVC, and the kernel plugin, it's clear that adding these
>> initializations has measurable performance cost. Doing it at function
>> exit means performing large unconditional wipes. Doing it at function
>> entry means initializations can be dead-store eliminated and highly
>> optimized. Given the current debates on the measurable performance
>> difference between pattern and zero initialization (even in the face of
>> existing dead-store elimination), I would expect wipe-on-function-exit to
>> be outside the acceptable tolerance for performance impact. (Additionally,
>> we've seen negative cache effects on wiping memory when the CPU is done
>> using it, though this is more pronounced in heap wiping. Zeroing at
>> free is about twice as expensive as zeroing at allocation time due to cache
>> temporality. This is true for the stack as well, but it's not as high.)”
>> 
>> From my understanding, the major issue with stack erasure at function
>> return is the large performance overhead, and this overhead
>> cannot be reduced by compiler optimizations since the additional
>> wiping insns are inserted at the end of the routine.
>> 
>> Based on the previous discussion with Kees, I don’t think that stack
>> erasure at function return is a good idea. Instead, we might provide an
>> alternative approach: zero/pattern init of auto-variables. (This
>> functionality is already available in LLVM.) This will be another
>> patch we want to add to GCC for security purposes in general.
>> 
>> So, I think for the current patch, -fzero-call-used-regs should be good 
>> enough.
>> 
>> Any comments?
> 
> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> it sounded more like a mitigation against information leaks which
> then would be highly incomplete w/o spill slot clearing.

By “spill slot clearing”, do you mean “stack erasure” or something else?

From the paper 

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

The call-used registers are used by the ROP hackers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
 would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
 instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that call-used registers might be used by the ROP hackers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, this kind of ROP attack no longer works.


>  Like
> we had that discussion on secure erase of memory that should not
> be DSEd.
> 
> This needs to be reflected in the documentation and eventually
> the option naming?  Like -frop-protection=... similar in spirit
> to how we have -fcf-protection=... (though that as well is supposed
> to provide ROP mitigation).

How about -frop-mitigation=[skip|used-gpr|all-gpr|used|all]?
> 
> I'm not very familiar with ROP [mitigation] techniques, so I'm no
> longer questioning usefulness of this patch but leave that to others
> (and thus final approval).  I'm continuing to question the plethora
> of target hooks you add and will ask for better user-level documentation.

I will think about this more and come up with better user-level documentation.

thanks.

Qing
> 
> Richard.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06  8:37                                 ` Richard Biener
  2020-08-06 15:45                                   ` Qing Zhao
@ 2020-08-06 20:45                                   ` Kees Cook
  2020-08-07  6:21                                     ` Richard Biener
  1 sibling, 1 reply; 188+ messages in thread
From: Kees Cook @ 2020-08-06 20:45 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, H.J. Lu, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Rodriguez Bahena, Victor

On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> it sounded more like a mitigation against information leaks which
> then would be highly incomplete w/o spill slot clearing.  Like
> we had that discussion on secure erase of memory that should not
> be DSEd.

I've viewed stack erasure as separate from register clearing. The
"when" of stack erasure tends to define which things are being defended
against. If the stack is being erased on function entry, you're defending
against all the various "uninitialized" variable attacks (which can be
info exposures, flow control redirection, etc). If it's on function exit,
this is more aimed at avoiding stale data and minimizing what's available
during an attack (and it also provides similar "uninit" defenses, just
in a different way). And FWIW, past benchmarks on this appear to indicate
erase-on-entry is more cache-friendly.
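
To make the two timings concrete, here is a minimal sketch (purely
illustrative, not taken from any patch in this thread; the function
names and the 64-byte buffer are made up).  The first function clears
its local buffer on entry, which the compiler is free to optimize
against later stores; the second wipes it on exit through a volatile
pointer so the wipe is not removed as a dead store.

```c
#include <stddef.h>
#include <string.h>

/* Erase-on-entry: the buffer starts zeroed, so no stale stack contents
   can be observed through it.  */
static unsigned
sum_init_on_entry (const unsigned char *src, size_t n)
{
  unsigned char buf[64] = { 0 };
  unsigned sum = 0;

  if (n > sizeof buf)
    n = sizeof buf;
  memcpy (buf, src, n);
  for (size_t i = 0; i < sizeof buf; i++)
    sum += buf[i];
  return sum;
}

/* Erase-on-exit: the buffer is wiped just before returning so that no
   data from this call is left behind on the stack.  */
static unsigned
sum_wipe_on_exit (const unsigned char *src, size_t n)
{
  unsigned char buf[64];
  unsigned sum = 0;

  if (n > sizeof buf)
    n = sizeof buf;
  memcpy (buf, src, n);
  for (size_t i = 0; i < n; i++)
    sum += buf[i];

  /* Volatile stores keep the wipe from being eliminated as dead.  */
  volatile unsigned char *p = buf;
  for (size_t i = 0; i < sizeof buf; i++)
    p[i] = 0;
  return sum;
}
```

Both functions compute the same result; they differ only in when the
stack memory is cleared and therefore in what they defend against.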

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-05 21:35                 ` Qing Zhao
  2020-08-06  8:31                   ` Richard Biener
@ 2020-08-06 22:32                   ` Qing Zhao
  1 sibling, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-06 22:32 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Kees Cook, Uros Bizjak, Rodriguez Bahena, Victor,
	GCC Patches

Hi, Richard,


> On Aug 5, 2020, at 4:35 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
>> 
>>>>>> +     continue;
>>>>>> +      if (fixed_regs[regno])
>>>>>> +     continue;
>>>>>> +      if (is_live_reg_at_exit (regno))
>>>>>> +     continue;
>> 
>> How can a call-used reg be live at exit?
> 
> Yes, this might not be needed, I will double check on this.

I just double-checked this, and it turns out that this condition cannot be deleted.

A call-used reg might be the register that holds the return value passed back to the caller (so it is live at exit).
For example, the EAX register on i386 is a call-used register and, at the same time, the register that holds the return value.
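
A small sketch of why the check matters, assuming a compiler that
accepts the function attribute in the form proposed in this thread
(zero_call_used_regs with a "used-gpr" argument).  The macro
ZERO_USED_GPR and the function scale_and_add are made up for the
example; on compilers without the attribute the macro expands to
nothing.

```c
/* Use the attribute only where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute (zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR /* attribute unavailable: plain function */
#endif

/* The scratch registers used for the intermediate values may be cleared
   on return, but the register carrying the return value must not be,
   otherwise the caller would always see zero.  */
ZERO_USED_GPR int
scale_and_add (int a, int b)
{
  int tmp = a * 3;
  return tmp + b;
}
```

If the return-value register were zeroed along with the other call-used
registers, scale_and_add (2, 5) would yield 0 instead of 11.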

Hope this is clear.

thanks.

Qing


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06  8:31                   ` Richard Biener
  2020-08-06  8:41                     ` Jakub Jelinek
  2020-08-06 14:56                     ` Qing Zhao
@ 2020-08-06 23:37                     ` Segher Boessenkool
  2020-08-07 16:06                       ` Qing Zhao
  2020-08-19 20:05                       ` Qing Zhao
  2 siblings, 2 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-06 23:37 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Uros Bizjak, H. J. Lu, Jakub Jelinek, GCC Patches,
	Kees Cook, Rodriguez Bahena, Victor

Hi!

On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
> it would be nice for other target maintainers to chime in (Segher for
> power maybe) for the question below...

It would be nice if this described anywhere what the benefit of this is,
including actual hard numbers.  I only see it is very costly, and I see
no benefit whatsoever.

> > >>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > >>>>> command-line option and
> > >>>>> zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function attribute:

"call-used" is such a bad name.  "call-clobbered" is better already, but
"volatile" (over calls) is most obvious I think.

There are at least four different kinds of volatile registers:

1) Argument registers are volatile, on most ABIs.
2) The *linker* (or dynamic linker!) may insert code that needs some
   registers for itself;
3) Registers only used for scratch space;
4) Registers used for returning the function value.

And these can overlap, and differ per function.

> > > Again - what's the intended use (and how does it fulfill anything useful
> > > for that case)?

Yes, exactly.

> > >>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
> > >>>>> +     continue;
> > > 
> > > Why does the target need some extra say here?
> > 
> > Only the target can decide which hard regs should be zeroed, and which hard regs are general-purpose registers.
> 
> I'm mostly questioning the plethora of target hooks added and whether
> these details are a good granularity applying to more than just x86.
> Did I suggest to compute a hardreg set that the middle-end says was
> used and is not live and leave the rest to the target?

It probably would be much easier to just have the target do *all* of
this, in one hook, or maybe even in the existing epilogue stuff.  The
resulting binary code will be very slow no matter what, so this should
not matter much at all.

> > >>>>> +      machine_mode mode
> > >>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
> > >>>>> +                                                reg_raw_mode[regno]);
> > > 
> > > In what case does the target ever need to adjust this (we're dealing
> > > with hard-regs only?)?
> > 
> > For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
> 
> That's an optimization, yes.

I guess what is meant here is that the usual x86-64 insns to clear the
low 32 bits of a register actually clear the whole register?  It is a
huge security leak otherwise.  And, the generic code has nothing to do
with this, define hooks that ask the target to clear stuff, instead?

> > >>>>> +      reg = gen_rtx_REG (mode, regno);
> > >>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
> > >>>>> +     {
> > >>>>> +       zero_rtx[(int)mode] = reg;
> > >>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
> > >>>>> +       emit_insn (tmp);
> > >>>>> +     }
> > >>>>> +      else
> > >>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> > > 
> > > Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> > > but I may be wrong.  
> > 
> > You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
> > I will check on this.

If it is a CONST_INT, you should use const0_rtx; otherwise,
CONST0_RTX (mode) .  I have no idea what zero_rtx is, but there is
const_tiny_rtx already, and you shouldn't use that directly either.

> But why not simplify it all to a single hook
> 
>   targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
> 
> ?

Yeah.  With a much better name though (it should say what it is for, or
describe at a *high level* what it does).

> > >>>>> start_sequence ();
> > >>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
> > >>>>> +
> > >>>>> +  gen_call_used_regs_seq ();
> > >>>>> +
> > > 
> > > The caller eventually performs shrink-wrapping - are you sure that
> > > doesn't mess up things?
> > 
> > My understanding is that the standard epilogue does not handle “call-used” registers.  Therefore, shrink-wrapping will not affect
> > “call-used” registers either.
> > Our patch only handles call-used registers, so there should be no interaction between this patch and shrink-wrapping.
> 
> I don't know (CCed Segher, he should eventually).

Shrink-wrapping often deals with the non-volatile registers, so that
doesn't matter much for this patch series.  But the epilogue can use
some volatile registers as well, including to hold sensitive info.  And
of course everything is different if you use separate shrink-wrapping,
but that work is done already when you get here (so it is too late?)


Anyway.  This all needs a good description in the user manual (is there?
I couldn't find any), explaining what exactly it does (user-visible),
and when you would want to use it, etc.  We need that before we can
review anything else in this patch sanely.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06 20:45                                   ` Kees Cook
@ 2020-08-07  6:21                                     ` Richard Biener
  2020-08-07 16:15                                       ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Biener @ 2020-08-07  6:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Qing Zhao, H.J. Lu, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Rodriguez Bahena, Victor

On Thu, 6 Aug 2020, Kees Cook wrote:

> On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
> > OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
> > it sounded more like a mitigation against information leaks which
> > then would be highly incomplete w/o spill slot clearing.  Like
> > we had that discussion on secure erase of memory that should not
> > be DSEd.
> 
> I've viewed stack erasure as separate from register clearing. The
> "when" of stack erasure tends to define which things are being defended
> against. If the stack is being erased on function entry, you're defending
> against all the various "uninitialized" variable attacks (which can be
> info exposures, flow control redirection, etc). If it's on function exit,
> this is more aimed at avoiding stale data and minimizing what's available
> during an attack (and it also provides similar "uninit" defenses, just
> in a different way). And FWIW, past benchmarks on this appear to indicate
> erase-on-entry is more cache-friendly.

So I originally thought this was about leaking security sensitive data
to callers and thus we want to define API entries to not leak any
data from callees other than via the ABI defined return values or
global memory the callee chooses to populate.  Clearing registers
not involved in returning data is one part but then contents of such
registers could also reside in spill slots which means you have to
clear those as well.  And yes, even local automatic variables of the
callee fall into the category and thus 'stack-erasure' would be
required.  To appropriately have such a "security boundary" at
function return you _do_ have to do the clearing at function return
though.

But it's a completely different topic and it seems the patch was
not intended to help the folks that also ask for "secure"_memset
the compiler isn't supposed to optimize away as dead.

Richard.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-07-28 20:05         ` PING " Qing Zhao
  2020-07-31 17:57           ` Uros Bizjak
@ 2020-08-07 13:20           ` Alexandre Oliva
  2020-08-07 17:04             ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Alexandre Oliva @ 2020-08-07 13:20 UTC (permalink / raw)
  To: Qing Zhao via Gcc-patches
  Cc: Richard Biener, Uros Bizjak, Qing Zhao, Jakub Jelinek, Kees Cook,
	Rodriguez Bahena, Victor

On Jul 28, 2020, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

>> 2. The main code generation part is moved from i386 backend to middle-end;
>> 3. Add 4 target-hooks;
>> 4. Implement these 4 target-hooks on i386 backend. 
>> 5. On a target that does not implement the target hook, issue error

I wonder...  How important is it that the registers be zeroed, rather
than just avoid leaking internal state from the function?

It occurred to me that we could implement this in an entirely
machine-independent way by just arranging for the option to change the
calling conventions for all registers that are not used by return to be
regarded as call-saved.  Then the prologue logic would save the incoming
value of the registers, and the epilogue would restore them, and we're
all set.  It might even cover propagation of exceptions out of the
function.


Even if zeroing registers is desirable, it might still be possible to
build upon the above to do that in a machine-independent fashion, using
the annotations used to output call frame info to identify the slots in
which the to-be-zeroed registers were saved, and store zeros there,
either by modifying the save insns, or by adding extra stores to the end
of the prologue, at least as a default implementation for a target hook,
that could be overridden with something that does the job in more
efficient but target-specific ways.


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06 23:37                     ` Segher Boessenkool
@ 2020-08-07 16:06                       ` Qing Zhao
  2020-08-07 22:59                         ` Segher Boessenkool
  2020-08-19 20:05                       ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-07 16:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook, Rodriguez Bahena, Victor

Hi, Segher,

Thanks for your comments.

> On Aug 6, 2020, at 6:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
>> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
>> it would be nice for other target maintainers to chime in (Segher for
>> power maybe) for the question below...
> 
> It would be nice if this described anywhere what the benefit of this is,
> including actual hard numbers.  I only see it is very costly, and I see
> no benefit whatsoever.

I will state the motivation for this patch clearly in the next patch update.
For your reference, as I mentioned in other emails that you might have missed,
my understanding (I am not a security expert, though) is that this patch should serve two purposes:

1. Erase the registers upon return to avoid information leaks;
2. ROP mitigation, for details on this, please refer to paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the paper, ROP attackers use the call-used registers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that a ROP attacker might use call-used registers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, the ROP attack will no longer work.

Yes, adding these register-wiping insns will cost some performance. However, that overhead is the price
of the security benefit.
On the other hand, we need to minimize the performance overhead in our implementation.
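
For illustration, a minimal sketch of how a user might request this clearing for a single function, using the zero_call_used_regs attribute spelled as in the patch subject. It assumes a compiler that implements the attribute; the __has_attribute guard makes the macro expand to nothing elsewhere. The function name and its body are hypothetical.

```c
/* Use the attribute only when the compiler knows it; otherwise expand
   to nothing so that the example still builds.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Hypothetical function that handles a secret value in registers.
   With "used-gpr", the call-used general-purpose registers that this
   function used are zeroed before it returns, except for the register
   that carries the return value.  */
ZERO_USED_GPR
unsigned int checksum(unsigned int secret, unsigned int salt)
{
  return (secret ^ salt) + 1u;
}
```

The attribute does not change the function's result; only the register contents left behind at return differ. The same mode could be applied to a whole translation unit with -fzero-call-used-regs=used-gpr.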


> 
>>>>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>>>>> command-line option and
>>>>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> 
> "call-used" is such a bad name.  "call-clobbered" is better already, but
> "volatile" (over calls) is most obvious I think.

In the GCC source code we use the name “call-used” a lot; of course, “call-clobbered” is
also used frequently.  Do these names refer to the same set of registers, i.e., the set of registers that
a function call may clobber?
If so, I am okay with the name “call-clobbered” if it sounds better.

> 
> There are at least four different kinds of volatile registers:
> 
> 1) Argument registers are volatile, on most ABIs.
These are the registers that need to be cleared upon function return for the ROP mitigation described in the paper
mentioned above.

> 2) The *linker* (or dynamic linker!) may insert code that needs some
>   registers for itself;
> 3) Registers only used for scratch space;
> 4) Registers used for returning the function value.

I think that 1, 3, and 4 above should all be covered by “call_used_regs”.

I am not sure about 2; could you explain it a little more (the linker may insert code that needs some registers for itself)?

> 
> And these can overlap, and differ per function.
> 
>>>> Again - what's the intended use (and how does it fulful anything useful
>>>> for that case)?
> 
> Yes, exactly.
Please see my response at the beginning.

> 
>>>>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>>>>> +     continue;
>>>> 
>>>> Why does the target need some extra say here?
>>> 
>>> Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 
>> 
>> I'm mostly questioning the plethora of target hooks added and whether
>> this details are a good granularity applying to more than just x86.
>> Did I suggest to compute a hardreg set that the middle-end says was
>> used and is not live and leave the rest to the target?
> 
> It probably would be much easier to just have the target do *all* of
> this, in one hook, or maybe even in the existing epilogue stuff.  The
> resulting binary code will be very slow no matter what, so this should
> not matter much at all.

I have agreed to move the register zeroing entirely to the target. The middle-end will only compute the set of hard registers that need to be
zeroed and pass it to the target.
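
A rough sketch of that division of labor, not the actual patch: generic code computes a register set and hands it to one target hook. The regset type, the function names, and the bit-per-register encoding are all hypothetical stand-ins for GCC's real hard register sets and hook machinery.

```c
#include <stdint.h>

typedef uint32_t regset;   /* hypothetical: bit N set means hard register N */

/* Hypothetical target hook type: the target receives the set chosen by
   generic code and decides how to zero those registers.  */
typedef void (*zero_regs_hook) (regset need_zeroing);

/* Generic side: start from the call-used registers, optionally restrict
   to those the function actually used, and drop anything still live at
   the return (for example the register carrying the return value).  */
regset compute_need_zeroing (regset call_used, regset used_in_function,
                             regset live_at_return, int only_used)
{
  regset candidates = only_used ? (call_used & used_in_function) : call_used;
  return candidates & ~live_at_return;
}

/* Generic side: compute the set, then leave everything else to the target.  */
regset zero_regs_at_return (zero_regs_hook hook, regset call_used,
                            regset used_in_function, regset live_at_return,
                            int only_used)
{
  regset need = compute_need_zeroing (call_used, used_in_function,
                                      live_at_return, only_used);
  hook (need);
  return need;
}
```

With this split, the generic code never needs to know which mode or instruction a target uses to clear a register; it only describes which registers are candidates.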

> 
>>>>>>>> +      machine_mode mode
>>>>>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
>>>>>>>> +                                                reg_raw_mode[regno]);
>>>> 
>>>> In what case does the target ever need to adjust this (we're dealing
>>>> with hard-regs only?)?
>>> 
>>> For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
>> 
>> That's an optimization, yes.
> 
> I gues what is meant here is that the usual x86-64 insns to clear the
> low 32 bits of a register actually clear the whole register?

This is my understanding. H.J. Lu might provide a better explanation if needed.
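
A small sketch of the x86-64 behavior in question: writing a 32-bit register zero-extends into the full 64-bit register, so a 32-bit self-XOR clears all 64 bits. The helper name is hypothetical, and the non-x86-64 branch only models the rule instead of executing it.

```c
#include <stdint.h>

/* Returns the 64-bit register contents after a 32-bit self-XOR.  */
uint64_t clear_low32 (uint64_t value)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  /* %k0 selects the 32-bit name of the register (e.g. eax for rax).  */
  __asm__ ("xorl %k0, %k0" : "+r" (value) : : "cc");
  return value;
#else
  /* Not x86-64: model the architectural rule.  The 32-bit result is
     zero, and a 32-bit write zero-extends to 64 bits.  */
  uint32_t low = (uint32_t) value;
  low ^= low;
  return (uint64_t) low;
#endif
}
```

On x86-64, calling clear_low32 with all 64 bits set returns 0, which is why zeroing in a 32-bit mode does not leave the upper half of the register behind.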

>  It is a
> huge security leak otherwise.  And, the generic code has nothing to do
> with this, define hooks that ask the target to clear stuff, instead?

Yes, I think that these kinds of details should not be exposed to the middle-end.

> 
>>>>>>>> +      reg = gen_rtx_REG (mode, regno);
>>>>>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
>>>>>>>> +     {
>>>>>>>> +       zero_rtx[(int)mode] = reg;
>>>>>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
>>>>>>>> +       emit_insn (tmp);
>>>>>>>> +     }
>>>>>>>> +      else
>>>>>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
>>>> 
>>>> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
>>>> but I may be wrong.  
>>> 
>>> You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
>>> I will check on this.
> 
> If it is a CONST_INT, you should use const0_rtx; otherwise,
> CONST0_RTX (mode) .  I have no idea what zero_rtx is, but there is
> const_tiny_rtx already, and you shouldn't use that directly either.

Okay.

> 
>> But why not simplify it all to a single hook
>> 
>>  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
>> 
>> ?
> 
> Yeah.  With a much better name though (it should say what it is for, or
> describe at a *high level* what it does).
Okay.

> 
>>>>>>>> start_sequence ();
>>>>>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
>>>>>>>> +
>>>>>>>> +  gen_call_used_regs_seq ();
>>>>>>>> +
>>>> 
>>>> The caller eventually performs shrink-wrapping - are you sure that
>>>> doesn't mess up things?
>>> 
>>> My understanding is, in the standard epilogue, there is no handling of “call-used” registers.  Therefore, shrink-wrapping will not impact
>>> “call-used” registers as well. 
>>> Our patch only handles call-used registers, so, there should be no any interaction between this patch and shrink-wrapping.
>> 
>> I don't know (CCed Segher, he should eventually).
> 
> Shrink-wrapping often deals with the non-volatile registers, so that
> doesn't matter much for this patch series.

Yes, that was my understanding as well. 

>  But the epilogue can use
> some volatile registers as well, including to hold sensitive info.  And
> of course everything is different if you use separate shrink-wrapping,
> but that work is done already when you get here (so it is too late?)

Could you please explain this part a little bit more?

> 
> 
> Anyway.  This all needs a good description in the user manual (is there?
> I couldn't find any), explaining what exactly it does (user-visible),
> and when you would want to use it, etc.  We need that before we can
> review anything else in this patch sanely.
Will do.

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-07  6:21                                     ` Richard Biener
@ 2020-08-07 16:15                                       ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-07 16:15 UTC (permalink / raw)
  To: Richard Biener
  Cc: Kees Cook, H.J. Lu, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Rodriguez Bahena, Victor



> On Aug 7, 2020, at 1:21 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Thu, 6 Aug 2020, Kees Cook wrote:
> 
>> On Thu, Aug 06, 2020 at 10:37:43AM +0200, Richard Biener wrote:
>>> OK, so -fzero-call-used-regs is a ROP mitigation technique.  To me
>>> it sounded more like a mitigation against information leaks which
>>> then would be highly incomplete w/o spill slot clearing.  Like
>>> we had that discussion on secure erase of memory that should not
>>> be DSEd.
>> 
>> I've viewed stack erasure as separate from register clearing. The
>> "when" of stack erasure tends to define which things are being defended
>> against. If the stack is being erased on function entry, you're defending
>> against all the various "uninitialized" variable attacks (which can be
>> info exposures, flow control redirection, etc). If it's on function exit,
>> this is more aimed at avoiding stale data and minimizing what's available
>> during an attack (and it also provides similar "uninit" defenses, just
>> in a different way). And FWIW, past benchmarks on this appear to indicate
>> erase-on-entry is more cache-friendly.
> 
> So I originally thought this was about leaking security sensitive data
> to callers and thus we want to define API entries to not leak any
> data from callees other than via the ABI defined return values or
> global memory the callee chooses to populate.  Clearing registers
> not involved in returning data is one part but then contents of such
> registers could also reside in spill slots which means you have to
> clear those as well.  And yes, even local automatic variables of the
> callee fall into the category and thus 'stack-erasure' would be
> required.  To appropriately have such a "security boundary" at
> function return you _do_ have to do the clearing at function return
> though.

The following slides on The Secure Project and GCC:

https://gmarkall.files.wordpress.com/2018/09/secure_and_gcc.pdf

mention that the stack erase patch for GCC would be submitted upstream soon (in 2018).
I am wondering whether that patch has been submitted and discussed already?

Qing

> 
> But it's a completely different topic and it seems the patch was
> not intended to help the folks that also ask for "secure"_memset
> the compiler isn't supposed to optimize away as dead.
> 
> Richard.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-07 13:20           ` Alexandre Oliva
@ 2020-08-07 17:04             ` Qing Zhao
  2020-08-11  2:39               ` Alexandre Oliva
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-07 17:04 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Qing Zhao via Gcc-patches, Richard Biener, Uros Bizjak,
	Jakub Jelinek, Kees Cook, Rodriguez Bahena, Victor

Hi, Alexandre,

Thank you for the comments and suggestions.

> On Aug 7, 2020, at 8:20 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Jul 28, 2020, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
>>> 2. The main code generation part is moved from i386 backend to middle-end;
>>> 3. Add 4 target-hooks;
>>> 4. Implement these 4 target-hooks on i386 backend. 
>>> 5. On a target that does not implement the target hook, issue error
> 
> I wonder...  How important is it that the registers be zeroed, rather
> than just avoid leaking internal state from the function?

As I explained in other emails about the motivation of this patch:
 
My understanding (I am not a security expert, though) is that this patch should serve two purposes:

1. Erase the registers upon return to avoid information leaks from the function;
2. ROP mitigation, for details on this, please refer to paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the paper, ROP attackers use the call-used registers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that a ROP attacker might use call-used registers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, the ROP attack will no longer work.

So, I believe that the call-used registers (especially those registers that pass parameters) need to be zeroed
in order to mitigate the ROP attack.

> 
> It occurred to me that we could implement this in an entirely
> machine-independent way by just arranging for the option to change the
> calling conventions for all registers that are not used by return to be
> regarded as call-saved.  Then the prologue logic would save the incoming
> value of the registers, and the epilogue would restore them, and we're
> all set.  It might even cover propagation of exceptions out of the
> function.
> 
This approach has the following two issues:
1. The performance overhead will double (because there will be “save” insns in the prologue as well as “restore” insns in the epilogue);
2. The ROP mitigation purpose cannot be addressed.

> 
> Even if zeroing registers is desirable, it might still be possible to
> build upon the above to do that in a machine-independent fashion, using
> the annotations used to output call frame info to identify the slots in
> which the to-be-zeroed registers were saved, and store zeros there,
> either by modifying the save insns, or by adding extra stores to the end
> of the prologue, at least as a default implementation for a target hook,
> that could be overridden with something that does the job in more
> efficient but target-specific ways.

One of the major things we have to consider in the implementation of this patch is
minimizing the performance overhead as much as possible.

I think that moving the register zeroing to each target is a better solution, since each target knows
best which insns do the work most efficiently.

Thanks.

Qing

> 
> 
> -- 
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-07 16:06                       ` Qing Zhao
@ 2020-08-07 22:59                         ` Segher Boessenkool
  2020-08-10 16:34                           ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-07 22:59 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook, Rodriguez Bahena, Victor

Hi!

On Fri, Aug 07, 2020 at 11:06:38AM -0500, Qing Zhao wrote:
> > It would be nice if this described anywhere what the benefit of this is,
> > including actual hard numbers.  I only see it is very costly, and I see
> > no benefit whatsoever.
> 
> I will add the motivation of this patch clearly in the next patch update. 
> Here, for your reference, As I mentioned in other emails you might miss,

Well, the GCC ML archive doesn't cross month boundaries, so things are
hard to look up if I have deleted my own copy already :-(

> From my understanding (I am not a security expert though), this patch should serve two purpose:
> 
> 1. Erase the registers upon return to avoid information leak;

But only some of the registers.

> 2. ROP mitigation, for details on this, please refer to paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132 <https://ieeexplore.ieee.org/document/8445132>

Do you have a link to this that people can actually read?

> From the above paper, The call-used registers are used by the ROP hackers as following:
> 
> "Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.
> 
> First, the destination of using gadget chains in usual is performing system call or system function to perform 
> malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
> would like to disable W ⊕ X.

That makes things easier, for sure, but is just a nicety really.

> Because once W ⊕ X has been disabled, shellcode can be executed directly
> instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
> example, the system call is number 59 which is “execve” system call.
> 
> Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
> architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
> using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
> pass parameters, as mentioned in subsection B and C.”
> 
> We can see that call-used registers might be used by the ROP hackers to pass parameters to the system call.
> If compiler can clean these registers before routine “return", then ROP attack will be invalid. 

So the idea is that clearing (or otherwise interfering with) the registers
used for parameter passing makes making useful ROP chains harder?

> Yes, there will be performance overhead from adding these register wiping insns. However, it’s necessary to
> add overhead for security purpose.

The point is the balance between how expensive it is, vs. how much it
makes it harder to exploit the code.

But of course any user can make that judgment themselves.  For us it
mostly matters what the cost is to targets that use it, to targets that
do not use it, and to the generic code, vs. what value we give to our
users :-)

> > "call-used" is such a bad name.  "call-clobbered" is better already, but
> > "volatile" (over calls) is most obvious I think.
> 
> In our GCC compiler source code, we used the name “call-used” a lot, of course, “call-clobbered” is
> also used frequently.  Do these names refer to the same set of registers, i.e, the register set that  
> will be corrupted by function call?

Anything that isn't "call-saved" or "fixed" is called "call-used",
essentially.  (And the relation with "fixed" isn't always clear).

> If so, I am okay with name “call-clobbered” if this name sounds better. 

It's more obvious, at least to me.

> > There are at least four different kinds of volatile registers:
> > 
> > 1) Argument registers are volatile, on most ABIs.
> These are the registers that  need to be cleaned up upon function return for the ROP mitigation described in the paper
> mentioned above.
> 
> > 2) The *linker* (or dynamic linker!) may insert code that needs some
> >   registers for itself;
> > 3) Registers only used for scratch space;
> > 4) Registers used for returning the function value.
> 
> I think that the above 1,3,4 should be all covered by “call_used_regs”. 

1 and 4 are the *same* (or overlap) on most ABIs.  3 can be as well, it
depends what the compiler is allowed to do; normally, if the compiler
wants a register, the parameter passing regs are among the cheapest it
can use.

2 you cannot touch usefully at all, for your purposes.

> Not sure about 2, could you explain a little bit more on 2 (The linker may insert code that needs some register for itself)? 

Sure.  The linker can decide it needs to insert some code to restore a
"global pointer" or similar in the function return path (or anything
else -- it just has to follow the ABI, which the generic compiler does
not know enough about at all).

> I have agreed that moving the zeroing regs part entirely to target. Middle-end will only compute a hard regs set that need to be
> zeroed and pass it to target.

The registers you *want* to interfere with are the parameter passing
registers, minus the ones used for the return value of the current
function; not *all* call-clobbered registers.

The generic compiler does not have enough information at all to do this
as far as I can see, and it would fit much better to what the backend
does anyway?
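
To make the set arithmetic concrete, here is a sketch that uses the x86-64 System V integer registers as an example: arguments go in rdi, rsi, rdx, rcx, r8 and r9, and integer return values use rax and, for two-register returns, rdx. The enum numbering, the macro names and the function name are hypothetical; only a backend knows the real sets for its ABI.

```c
/* Hypothetical register numbering, for illustration only.  */
enum reg { RAX, RCX, RDX, RSI, RDI, R8, R9 };

#define BIT(r) (1u << (r))

/* Integer argument registers in the x86-64 System V ABI.  */
#define ARG_REGS (BIT (RDI) | BIT (RSI) | BIT (RDX) \
                  | BIT (RCX) | BIT (R8) | BIT (R9))

/* Registers to clear at return: the parameter passing registers minus
   those that carry the current function's return value.  */
unsigned int regs_to_clear (unsigned int return_regs)
{
  return ARG_REGS & ~return_regs;
}
```

A function returning in rax alone would clear all six argument registers, while one returning a two-register value in rax:rdx must leave rdx untouched, so the set differs per function.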

> >  It is a
> > huge security leak otherwise.  And, the generic code has nothing to do
> > with this, define hooks that ask the target to clear stuff, instead?
> 
> Yes, I think that these kind of details are not good to be exposed to middle-end.

I think you should make a hook that just does the whole thing.  There is
nothing useful (or even correct) the generic code can do.  (The command
line flag to do this could be generic, and the hook to actually generate
the code for it as well of course, but other than that, there are so
many more differences between targets, subtargets, and OSes here, and
most of those not expressed anywhere else yet, that it doesn't seem
worth it to artificially make the generic code handle any of this.  For
comparison, pretty much all of the "normal" prologue/epilogue handling
is done in target code already).

> >> But why not simplify it all to a single hook
> >> 
> >>  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
> >> 
> >> ?
> > 
> > Yeah.  With a much better name though (it should say what it is for, or
> > describe at a *high level* what it does).
> Okay.

So everything else I write here is just a very long-winded way of
saying "Yes.  This." to this :-)

> >  But the epilogue can use
> > some volatile registers as well, including to hold sensitive info.  And
> > of course everything is different if you use separate shrink-wrapping,
> > but that work is done already when you get here (so it is too late?)
> 
> Could you please explain this part a little bit more?

For example, on PowerPC, to restore the return address we first have to
load it into a general purpose register (and then move it to LR).
Usually r0 is used, and r0 is call-clobbered (but not used for parameter
passing or return value passing).

The return address of course is very sensitive information (exposing any
return address makes ASLR useless immediately).  But this isn't in the
scope of this protection, I see.

Thanks for the explanations, much appreciated,


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-07 22:59                         ` Segher Boessenkool
@ 2020-08-10 16:34                           ` Qing Zhao
  2020-08-10 19:51                             ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-10 16:34 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook, Rodriguez Bahena, Victor

Hi, 

> On Aug 7, 2020, at 5:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
>> From my understanding (I am not a security expert though), this patch should serve two purpose:
>> 
>> 1. Erase the registers upon return to avoid information leak;
> 
> But only some of the registers.

All the call-used registers could be erased upon return with -fzero-call-used-regs=all.
> 
>> 2. ROP mitigation, for details on this, please refer to paper:
>> 
>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>> 
>> https://ieeexplore.ieee.org/document/8445132 <https://ieeexplore.ieee.org/document/8445132>
> 
> Do you have a link to this that people can actually read?

Sorry, I cannot find a free copy online. It looks like the whole paper can only be read through IEEE (I read the PDF file
through our company’s account).

> 
>> From the above paper, The call-used registers are used by the ROP hackers as following:
>> 
>> "Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.
>> 
>> First, the destination of using gadget chains in usual is performing system call or system function to perform 
>> malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
>> would like to disable W ⊕ X.
> 
> That makes things easier, for sure, but is just a nicety really.
> 
>> Because once W ⊕ X has been disabled, shellcode can be executed directly
>> instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
>> example, the system call is number 59 which is “execve” system call.
>> 
>> Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
>> architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
>> using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
>> pass parameters, as mentioned in subsection B and C.”
>> 
>> We can see that call-used registers might be used by the ROP hackers to pass parameters to the system call.
>> If compiler can clean these registers before routine “return", then ROP attack will be invalid. 
> 
> So the idea is that clearing (or otherwise interfering with) the registers
> used for parameter passing makes making useful ROP chains harder?

Yes, that’s my understanding.

> 
>> Yes, there will be performance overhead from adding these register wiping insns. However, it’s necessary to
>> add overhead for security purpose.
> 
> The point is the balance between how expensive it is, vs. how much it
> makes it harder to exploit the code.
> 
> But of course any user can make that judgment themselves.  For us it
> mostly matters what the cost is to targets that use it, to targets that
> do not use it, and to the generic code, vs. what value we give to our
> users :-)

We need to minimize the performance overhead in the implementation.
At the same time, we provide users with options to reduce the overhead (for example, the function-level
attribute and the different levels of zeroing).

> 
>>> "call-used" is such a bad name.  "call-clobbered" is better already, but
>>> "volatile" (over calls) is most obvious I think.
>> 
>> In our GCC compiler source code, we used the name “call-used” a lot, of course, “call-clobbered” is
>> also used frequently.  Do these names refer to the same set of registers, i.e, the register set that  
>> will be corrupted by function call?
> 
> Anything that isn't "call-saved" or "fixed" is called "call-used",
> essentially.  (And the relation with "fixed" isn't always clear).
> 
>> If so, I am okay with name “call-clobbered” if this name sounds better. 
> 
> It's more obvious, at least to me.

Okay. 

> 
>>> There are at least four different kinds of volatile registers:
>>> 
>>> 1) Argument registers are volatile, on most ABIs.
>> These are the registers that  need to be cleaned up upon function return for the ROP mitigation described in the paper
>> mentioned above.
>> 
>>> 2) The *linker* (or dynamic linker!) may insert code that needs some
>>>  registers for itself;
>>> 3) Registers only used for scratch space;
>>> 4) Registers used for returning the function value.
>> 
>> I think that the above 1,3,4 should be all covered by “call_used_regs”. 
> 
> 1 and 4 are the *same* (or overlap) on most ABIs.  3 can be as well, it
> depends what the compiler is allowed to do; normally, if the compiler
> wants a register, the parameter passing regs are among the cheapest it
> can use.
So, are they all covered by “call_used_regs” in GCC?

> 2 you cannot touch usefully at all, for your purposes.
Okay.
> 
>> Not sure about 2, could you explain a little bit more on 2 (The linker may insert code that needs some register for itself)? 
> 
> Sure.  The linker can decide it needs to insert some code to restore a
> "global pointer" or similar in the function return path (or anything
> else -- it just has to follow the ABI, which the generic compiler does
> not know enough about at all).
So, does the compiler know which registers will be needed by the linker?

> 
>> I have agreed that moving the zeroing regs part entirely to target. Middle-end will only compute a hard regs set that need to be
>> zeroed and pass it to target.
> 
> The registers you *want* to interfere with are the parameter passing
> registers, minus the ones used for the return value of the current
> function; not *all* call-clobbered registers.

For the paper I mentioned, yes, I agree with you: we only need to zero the registers that pass parameters.
In addition to this purpose, shall we also consider the purpose of avoiding information leaks through registers by erasing registers upon
function return?

> 
> The generic compiler does not have enough information at all to do this
> as far as I can see, and it would fit much better to what the backend
> does anyway?
You mean that the middle-end does not have enough information about which registers pass parameters and which registers return
values? Only the back ends have such information?

> 
>>> It is a
>>> huge security leak otherwise.  And, the generic code has nothing to do
>>> with this, define hooks that ask the target to clear stuff, instead?
>> 
>> Yes, I think that these kind of details are not good to be exposed to middle-end.
> 
> I think you should make a hook that just does the whole thing.  There is
> nothing useful (or even correct) the generic code can do.  (The command
> line flag to do this could be generic, and the hook to actually generate
> the code for it as well of course, but other than that, there are so
> many more differences between targets, subtargets, and OSes here, and
> most of those not expressed anywhere else yet, that it doesn't seem
> worth it to artificially make the generic code handle any of this.  For
> comparison, pretty much all of the "normal" prologue/epilogue handling
> is done in target code already).

Yes, agreed. 

> 
>>>> But why not simplify it all to a single hook
>>>> 
>>>> targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
>>>> 
>>>> ?
>>> 
>>> Yeah.  With a much better name though (it should say what it is for, or
>>> describe at a *high level* what it does).
>> Okay.
> 
> So everything else I write here is just a very long-winded way of
> saying "Yes.  This." to this :-)

Okay.

> 
>>> But the epilogue can use
>>> some volatile registers as well, including to hold sensitive info.  And
>>> of course everything is different if you use separate shrink-wrapping,
>>> but that work is done already when you get here (so it is too late?)
>> 
>> Could you please explain this part a little bit more?
> 
> For example, on PowerPC, to restore the return address we first have to
> load it into a general purpose register (and then move it to LR).
> Usually r0 is used, and r0 is call-clobbered (but not used for parameter
> passing or return value passing).
> 
> The return address of course is very sensitive information (exposing any
> return address makes ASLR useless immediately).  But this isn't in the
> scope of this protection, I see.

So, if we clear the contents of r0 before returning, is that correct? Is it safer from a security point of view?

Thanks a lot for your info.

Qing
> 
> Thanks for the explanations, much appreciated,
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-10 16:34                           ` Qing Zhao
@ 2020-08-10 19:51                             ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-10 19:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jakub Jelinek, Richard Biener, Kees Cook, Uros Bizjak,
	Rodriguez Bahena, Victor, GCC Patches


>> 
>>> If so, I am okay with name “call-clobbered” if this name sounds better. 
>> 
>> It's more obvious, at least to me.

In the current option list of GCC:  https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options

There is one available option whose name is: -fcall-used-reg


-fcall-used-reg

Treat the register named reg as an allocable register that is clobbered by function calls. It may be allocated for temporaries or variables that do not live across a call. Functions compiled this way do not save and restore the register reg.

It is an error to use this flag with the frame pointer or stack pointer. Use of this flag for other registers that have fixed pervasive roles in the machine’s execution model produces disastrous results.

This flag does not have a negative form, because it specifies a three-way choice.

So, the name of this option uses “call-used” rather than “call-clobbered”.  For consistency, I think it might still be better to use “-fzero-call-used-regs” instead of “-fzero-call-clobbered-regs”?

Qing





> Okay. 
> 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-07 17:04             ` Qing Zhao
@ 2020-08-11  2:39               ` Alexandre Oliva
  2020-08-11  5:57                 ` Kees Cook
  2020-08-11 17:30                 ` Qing Zhao
  0 siblings, 2 replies; 188+ messages in thread
From: Alexandre Oliva @ 2020-08-11  2:39 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Qing Zhao via Gcc-patches, Richard Biener, Uros Bizjak,
	Jakub Jelinek, Kees Cook, Rodriguez Bahena, Victor

On Aug  7, 2020, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:

> So, I believe that the call-used registers (especially those registers
> that pass parameters) need to be zeroed
> In order to mitigate the ROP attack. 

Erhm, I don't get why it's important that they be zeroed.  It seems to
me that restoring their original values, or setting them to random
values, would be just as good a defense against having them set within the
function to perform a ROP attack as zeroing them.  The point is to get
rid of whatever value the attacker chose within the function.  One could
even argue that restoring the caller value is better than setting to
zero, because the result is not predictable from within the function.

OTOH, there's the flip side, that the function *could* be changed so as
to modify the stack slot in which the register is saved, if there's
hostile code running.  (it wouldn't be modified by "normal" code)

Code that sets the register to zero in the epilogue would be much harder
for an attacker to change indeed.


> I think that moving the register-zeroing part to each target
> will be a better solution, since each target has
> a better idea of how to use the most efficient insns to do the work.

It's certainly good to allow machine-specific optimized code sequences,
but it would certainly be desirable to have a machine-independent
fallback.  It doesn't seem exceedingly hard to loop over the registers
and emit a (set (reg:M N) (const_int 0)) for each one that is to be
zeroed out.
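
A minimal sketch of such a machine-independent fallback, written as plain C outside of GCC so it can be compiled on its own.  The names regset_t and emit_zero_sets are hypothetical; real GCC code would walk a HARD_REG_SET and build RTL rather than print text, and it would pick a mode per register instead of the fixed DI used here for illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical stand-in for a hard register set: bit N set means that
   hard register N must be zeroed.  */
typedef uint32_t regset_t;

/* Write one RTL-like "set to zero" pattern per register in REGS to OUT
   and return the number of patterns written.  The mode is printed as a
   fixed "DI" purely for illustration.  */
int
emit_zero_sets (regset_t regs, FILE *out)
{
  int count = 0;

  for (int regno = 0; regno < 32; regno++)
    if (regs & ((regset_t) 1 << regno))
      {
        fprintf (out, "(set (reg:DI %d) (const_int 0))\n", regno);
        count++;
      }

  return count;
}
```

Calling emit_zero_sets (0x26, stdout) walks registers 1, 2 and 5 and prints one pattern for each.  A target with a cheaper idiom would override this loop rather than extend it.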


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-11  2:39               ` Alexandre Oliva
@ 2020-08-11  5:57                 ` Kees Cook
  2020-08-11 17:30                 ` Qing Zhao
  1 sibling, 0 replies; 188+ messages in thread
From: Kees Cook @ 2020-08-11  5:57 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Qing Zhao, Qing Zhao via Gcc-patches, Richard Biener,
	Uros Bizjak, Jakub Jelinek, Rodriguez Bahena, Victor

On Mon, Aug 10, 2020 at 11:39:26PM -0300, Alexandre Oliva wrote:
> Erhm, I don't get why it's important that they be zeroed.  It seems to
> me that restoring their original values, or setting them to random
> values, would be just as good defenses from having them set within the

In the performance analysis I looked at a while ago, doing the
register-self-xor is extremely fast to run (IIRC the cycle counts on x86
were absolutely tiny), and it's smaller for code size which minimizes
the overall image footprint.
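
To make the code-size point concrete, here is a small sketch comparing the x86 encodings of the two usual ways to zero %eax.  The byte sequences are the standard encodings; the helper names are hypothetical and exist only for this illustration.  In 64-bit mode a write to %eax also clears the upper half of %rax, so the short form is enough for the legacy registers.

```c
#include <stddef.h>

/* x86 machine-code bytes for two ways of zeroing %eax.  */
static const unsigned char xor_eax_eax[] = { 0x31, 0xc0 };  /* xor %eax, %eax */
static const unsigned char mov_0_eax[] =
  { 0xb8, 0x00, 0x00, 0x00, 0x00 };                         /* mov $0, %eax */

/* Size in bytes of the self-xor idiom.  */
size_t
xor_zero_size (void)
{
  return sizeof xor_eax_eax;
}

/* Size in bytes of the move-immediate form.  */
size_t
mov_zero_size (void)
{
  return sizeof mov_0_eax;
}

/* Bytes saved by using the self-xor idiom for NREGS registers.  */
size_t
bytes_saved (size_t nregs)
{
  return nregs * (mov_zero_size () - xor_zero_size ());
}
```

With these encodings the self-xor form is 2 bytes against 5, so each zeroed register saves 3 bytes per return site.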

> [...]
> Code that sets the register to zero in the epilogue would be much harder
> for an attacker to change indeed.

Yes, a fixed value is a significantly better defensive position to take
for ROP. And specifically zero _tends_ to be the safest choice as it's
less "useful" to be used as a size, index, or pointer. And, no, it is
not perfect, but nothing can be if we're dealing with trying to defend
against arbitrary ROP gadget finding (or uninitialized stack contents,
where the same argument for "zero is best" also holds[1]).

-Kees

[1] https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-11  2:39               ` Alexandre Oliva
  2020-08-11  5:57                 ` Kees Cook
@ 2020-08-11 17:30                 ` Qing Zhao
  2020-08-24 10:50                   ` Richard Biener
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-11 17:30 UTC (permalink / raw)
  To: Alexandre Oliva, Richard Biener
  Cc: Qing Zhao via Gcc-patches, Uros Bizjak, Jakub Jelinek, Kees Cook,
	Rodriguez Bahena, Victor

Hi, Alexandre,

CC’ing Richard for his comments on this.


> On Aug 10, 2020, at 9:39 PM, Alexandre Oliva <oliva@adacore.com> wrote:
>> I think that moving how to zeroing the registers part to each target
>> will be a better solution since each target has
>> Better idea on how to use the most efficient insns to do the work.
> 
> It's certainly good to allow machine-specific optimized code sequences,
> but it would certainly be desirable to have a machine-independent
> fallback.  It doesn't seem exceedingly hard to loop over the registers
> and emit a (set (reg:M N) (const_int 0)) for each one that is to be
> zeroed out.

The current implementation already includes such machine-independent code, so it should be very easy to add this.

Richard, what’s your opinion on this?
Do we need a machine-independent implementation for zeroing the registers as the default when the target does not provide an optimized
implementation?

Thanks.

Qing

> 
> 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-06 23:37                     ` Segher Boessenkool
  2020-08-07 16:06                       ` Qing Zhao
@ 2020-08-19 20:05                       ` Qing Zhao
  2020-08-19 22:57                         ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-19 20:05 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Biener, Jeff Law
  Cc: Uros Bizjak, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook,
	Rodriguez Bahena, Victor

Hi,

Based on all the previous discussion and a more extensive study of ROP and its mitigation techniques, I came up with the following
high-level proposal as requested. Please take a look and let me know what I should change in this high-level design:

> On Aug 6, 2020, at 6:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Anyway.  This all needs a good description in the user manual (is there?
> I couldn't find any), explaining what exactly it does (user-visible),
> and when you would want to use it, etc.  We need that before we can
> review anything else in this patch sanely.
> 
> 
> Segher

zeroing call-used registers for security purpose

8/19/2020
Qing Zhao
=========================================

**Motivation:
There are two purposes of this patch:

1. ROP mitigation:

ROP (Return-oriented programming, https://en.wikipedia.org/wiki/Return-oriented_programming) is 
one of the most popular code reuse attack techniques; it executes gadget chains to perform malicious tasks.
A gadget is a carefully chosen machine instruction sequence that is already present in the machine's memory. 
Each gadget typically ends in a return instruction and is located in a subroutine within the existing program 
and/or shared library code.

There are two variations that use gadgets ending in an indirect call (COP, Call-Oriented Programming)
or a jump instruction (JOP, Jump-Oriented Programming). However, performing ROP without return 
instructions is difficult in practice because COP and JOP gadgets that can form a complete chain 
are almost nonexistent. 

As a result, gadgets based on return instructions remain the most popular.

One important feature of ROP attacks (see "Clean the Scratch Registers: A Way to Mitigate Return-Oriented
Programming Attacks", https://ieeexplore.ieee.org/document/8445132) is that
gadget chains usually end by calling system functions to perform malicious behaviour;
on many modern architectures, registers are used to pass the parameters of those 
system functions.

So, cleaning the scratch registers that are used to pass parameters at return instructions should 
effectively mitigate ROP attack. 

2. Register Erasure:

In the SECURE project and GCC (https://gcc.gnu.org/wiki/cauldron2018#secure)

One of the well-known security techniques is stack and register erasure: 
ensuring that, on return from a function, no data is left lying on the stack or in registers.

As mentioned in the slides (https://gmarkall.files.wordpress.com/2018/09/secure_and_gcc.pdf), 
there is a separate project that tried to resolve the stack erasure problem, and the patch for
stack erasure was ready to submit. That patch does not handle the register erasure problem. 

So, we will also address the register erasure problem with this patch along with the ROP mitigation. 

** Questions and Answers:

Q1. Which registers should be set to zero at the return of the function?
A. The caller-saved (i.e., call-used or call-clobbered) registers.
   For ROP mitigation purposes, only the call-used registers that pass
parameters need to be zeroed. 
   For register erasure purposes, all the call-used registers might need to
be zeroed. We can provide multiple levels to the user for controlling the runtime
overhead. 

Q2. Why zero the registers rather than randomize them?
A. (From Kees Cook)
    In the performance analysis I looked at a while ago, doing the
register-self-xor is extremely fast to run (IIRC the cycle counts on x86
were absolutely tiny), and it's smaller for code size which minimized
the overall image footprint.
    a fixed value is a significantly better defensive position to take
for ROP. And specifically zero _tends_ to be the safest choice as it's
less "useful" to be used as a size, index, or pointer. And, no, it is
not perfect, but nothing can be if we're dealing with trying to defend
against arbitrary ROP gadget finding (or uninitialized stack contents,
where the same argument for "zero is best" also holds[1]).

-Kees
([1]https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html)

    So, from both run-time performance and code-size aspects, setting the
registers to zero is a better approach. 

** Proposal:

We will provide a new feature into GCC for the above security purposes. 

Add a -fzero-call-used-regs=[skip|used-arg-gpr|used-arg|arg|used-gpr|all-gpr|used|all] command-line option
and a
zero_call_used_regs("skip|used-arg-gpr|used-arg|arg|used-gpr|all-gpr|used|all") function attribute:

    1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

    Don't zero call-used registers upon function return. This is the default behavior.

    2. -fzero-call-used-regs=used-arg-gpr and zero_call_used_regs("used-arg-gpr")

    Zero used call-used general purpose registers that are used to pass parameters upon function return.
    
    3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")

    Zero used call-used registers that are used to pass parameters upon function return.

    4. -fzero-call-used-regs=arg and zero_call_used_regs("arg")

    Zero all call-used registers that are used to pass parameters upon function return.

    5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

    Zero used call-used general purpose registers upon function return.

    6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

    Zero all call-used general purpose registers upon function return.

    7. -fzero-call-used-regs=used and zero_call_used_regs("used")

    Zero used call-used registers upon function return.

    8. -fzero-call-used-regs=all and zero_call_used_regs("all")

    Zero all call-used registers upon function return.
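
The eight levels above can be read as intersections of a few register sets.  The following is a sketch only: the type and function names are hypothetical, the sets are plain bit masks rather than GCC's HARD_REG_SET, and the exclusion of the return-value registers follows the earlier remark in this thread that those must not be cleared.

```c
#include <stdint.h>

/* Hypothetical register set: bit N stands for hard register N.  */
typedef uint32_t regset_t;

/* One enumerator per value listed in the proposal.  */
enum zero_mode
{
  ZERO_SKIP, ZERO_USED_ARG_GPR, ZERO_USED_ARG, ZERO_ARG,
  ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL
};

/* What only the target knows about its registers.  */
struct target_regs
{
  regset_t call_used;  /* call-used (call-clobbered) registers */
  regset_t gpr;        /* general purpose registers */
  regset_t arg;        /* registers that can pass parameters */
};

/* Return the registers to zero at function return.  USED holds the
   registers the function touched; LIVE_OUT holds the registers that
   carry the return value and so must be left alone.  */
regset_t
regs_to_zero (enum zero_mode mode, const struct target_regs *t,
              regset_t used, regset_t live_out)
{
  regset_t set = t->call_used & ~live_out;

  switch (mode)
    {
    case ZERO_SKIP:         return 0;
    case ZERO_USED_ARG_GPR: return set & used & t->arg & t->gpr;
    case ZERO_USED_ARG:     return set & used & t->arg;
    case ZERO_ARG:          return set & t->arg;
    case ZERO_USED_GPR:     return set & used & t->gpr;
    case ZERO_ALL_GPR:      return set & t->gpr;
    case ZERO_USED:         return set & used;
    case ZERO_ALL:          return set;
    }
  return 0;
}
```

Only the three masks in struct target_regs are target knowledge, which is one way to see why the thread converges on having the target supply them through a hook.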


Zero call-used registers at function return to increase the program
security by either mitigating Return-Oriented Programming (ROP) or 
preventing information leak through registers.  

@samp{skip}, which is the default, doesn't zero call-used registers. 

@samp{used-arg-gpr} zeros used call-used general purpose registers that 
pass parameters. @samp{used-arg} zeros used call-used registers that 
pass parameters. @samp{arg} zeros all call-used registers that pass
parameters. These 3 choices are used for ROP mitigation. 

@samp{used-gpr} zeros call-used general purpose registers 
which are used in function.  @samp{all-gpr} zeros all
call-used general purpose registers.  @samp{used} zeros call-used registers which
are used in function.  @samp{all} zeros all call-used registers. 
These 4 choices are used for preventing information leak through 
registers. 

You can control this behavior for a specific function by using the function
attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
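
As an illustration of the per-function form, here is a minimal sketch that assumes a compiler implementing the attribute with the spelling proposed above.  The macro ZERO_USED_GPR and the function check_pin are hypothetical names for this example; on compilers without the attribute, the macro expands to nothing and the function behaves as usual.

```c
#include <stddef.h>

/* Use the proposed attribute only where the compiler reports support
   for it; otherwise fall back to an empty macro.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Compare LEN bytes of ENTERED against SECRET without an early exit.
   With the attribute in effect, the call-used general purpose registers
   this function touched are zeroed before it returns, so fragments of
   SECRET are not left behind in them.  Returns 1 on match, 0 otherwise.  */
ZERO_USED_GPR
int
check_pin (const char *entered, const char *secret, size_t len)
{
  unsigned char diff = 0;

  for (size_t i = 0; i < len; i++)
    diff |= (unsigned char) (entered[i] ^ secret[i]);

  return diff == 0;
}
```

The attribute changes only the function's epilogue, not its result, so callers need no changes.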



^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-19 20:05                       ` Qing Zhao
@ 2020-08-19 22:57                         ` Segher Boessenkool
  2020-08-19 23:27                           ` Qing Zhao
  2020-08-24 14:36                           ` Rodriguez Bahena, Victor
  0 siblings, 2 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-19 22:57 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, Jeff Law, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook, Rodriguez Bahena, Victor

Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
> So, cleaning the scratch registers that are used to pass parameters at return instructions should 
> effectively mitigate ROP attack. 

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered registers at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

> Q1. Which registers should be set to zeros at the return of the function?
> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>    For ROP mitigation purpose, only the call-used registers that pass
> parameters need to be zeroed. 
>    For register erasure purpose, all the call-used registers might need to
> be zeroed. we can provide multiple levels to user for controling the runtime
> overhead. 

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

>     So, from both run-time performance and code-size aspects, setting the
> registers to zero is a better approach. 

From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-19 22:57                         ` Segher Boessenkool
@ 2020-08-19 23:27                           ` Qing Zhao
  2020-08-24 14:47                             ` Rodriguez Bahena, Victor
  2020-08-24 17:49                             ` Segher Boessenkool
  2020-08-24 14:36                           ` Rodriguez Bahena, Victor
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-19 23:27 UTC (permalink / raw)
  To: Segher Boessenkool, victor Rodriguez Bahena
  Cc: Richard Biener, Jeff Law, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook



> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
>> So, cleaning the scratch registers that are used to pass parameters at return instructions should 
>> effectively mitigate ROP attack. 
> 
> But that is *very* expensive, in general.  Instead of doing just a
> return instruction (which effectively costs 0 cycles, and is just one
> insn), you now have to zero all call-clobbered register at every return
> (typically many returns per function, and you are talking 10+ registers
> even if only considering the simple integer registers).

Yes, the run-time overhead and also the code-size overhead are major concerns. We should minimize the overhead
as much as we can during implementation. However, such overhead cannot be completely avoided for the security purpose. 

In order to reduce the overhead for the ROP mitigation, I added 3 new values for -fzero-call-used-regs: used-arg-gpr, used-arg, and arg.

For “used-arg-gpr”, we only zero the integer registers that are used in the routine and can pass parameters; this should provide ROP mitigation
with the minimum overhead. 

For “used-arg”, in addition to “used-arg-gpr”, the other registers (for example, FP registers) that are used in the routine and can pass parameters will be zeroed. But I am not
very sure whether this option is really needed in practice. 

For “arg”, in addition to “used-arg”, all registers that pass parameters will be zeroed. As with “used-arg”, I am not very sure whether we need this option
or not. 

> 
> Numbers on how expensive this is (for what arch, in code size and in
> execution time) would be useful.  If it is so expensive that no one will
> use it, it helps security at most none at all :-(

The Clear Linux project has been using a similar patch since GCC 8; the option it uses is equivalent to -fzero-call-used-regs=used-gpr.

-fzero-call-used-regs=used-arg-gpr in this new proposal will have smaller overhead than the one currently being used in CLEAR Linux.

Victor, do you have any data on the overhead of the option that currently is used by CLEAR project?

> 
>> Q1. Which registers should be set to zeros at the return of the function?
>> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>>   For ROP mitigation purpose, only the call-used registers that pass
>> parameters need to be zeroed. 
>>   For register erasure purpose, all the call-used registers might need to
>> be zeroed. we can provide multiple levels to user for controling the runtime
>> overhead. 
> 
> The call-clobbered regs are the only ones you *can* touch.  That does
> not mean you should clear them all (it doesn't help much at all in some
> cases).  Only the backend knows.

I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
can pass parameters. 

But for preventing information leak from callee registers, we might need to clear all the call-used registers at return.


> 
>>    So, from both run-time performance and code-size aspects, setting the
>> registers to zero is a better approach. 
> 
> From a security perspective, this isn't clear though.  But that is a lot
> of extra research ;-)

There has been quite some discussion on this topic at

https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html

From those old discussions, we can see that a zero value should be good enough for the security purpose (though it’s not perfect).

Qing

> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-11 17:30                 ` Qing Zhao
@ 2020-08-24 10:50                   ` Richard Biener
  2020-08-24 14:48                     ` Qing Zhao
  2020-08-25  5:16                     ` Alexandre Oliva
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Biener @ 2020-08-24 10:50 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Alexandre Oliva, Qing Zhao via Gcc-patches, Uros Bizjak,
	Jakub Jelinek, Kees Cook, Rodriguez Bahena, Victor

On Tue, 11 Aug 2020, Qing Zhao wrote:

> Hi, Alexandre,
> 
> CC’ing Richard for his comments on this.
> 
> 
> > On Aug 10, 2020, at 9:39 PM, Alexandre Oliva <oliva@adacore.com> wrote:
> >> I think that moving how to zeroing the registers part to each target
> >> will be a better solution since each target has
> >> Better idea on how to use the most efficient insns to do the work.
> > 
> > It's certainly good to allow machine-specific optimized code sequences,
> > but it would certainly be desirable to have a machine-independent
> > fallback.  It doesn't seem exceedingly hard to loop over the registers
> > and emit a (set (reg:M N) (const_int 0)) for each one that is to be
> > zeroed out.
> 
> The current implementation already includes such machine-independent code, it should be very easy to add this.
> 
> Richard, what’s your opinion on this?
> Do we need a machine-independent implementation to zeroing the registers for the default when the target does not provide a optimized
> Implementation?

Well, at least silently doing nothing when the option is used would be 
bad.  So at least a diagnostic would be required.  Note since the
option is quite elaborate on what (sub-)set of regs is supposed to be
cleared I'm not sure an implementation not involving any target hook
is possible?

Richard.

> Thanks.
> 
> Qing
> 
> > 
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-19 22:57                         ` Segher Boessenkool
  2020-08-19 23:27                           ` Qing Zhao
@ 2020-08-24 14:36                           ` Rodriguez Bahena, Victor
  1 sibling, 0 replies; 188+ messages in thread
From: Rodriguez Bahena, Victor @ 2020-08-24 14:36 UTC (permalink / raw)
  To: Segher Boessenkool, Qing Zhao
  Cc: Richard Biener, Jeff Law, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook



-----Original Message-----
From: Segher Boessenkool <segher@kernel.crashing.org>
Date: Wednesday, August 19, 2020 at 5:58 PM
To: Qing Zhao <QING.ZHAO@ORACLE.COM>
Cc: Richard Biener <richard.guenther@gmail.com>, Jeff Law <law@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "H. J. Lu" <hjl.tools@gmail.com>, Jakub Jelinek <jakub@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Kees Cook <keescook@chromium.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

    Hi!

    On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
    > So, cleaning the scratch registers that are used to pass parameters at return instructions should 
    > effectively mitigate ROP attack. 

    But that is *very* expensive, in general.  Instead of doing just a
    return instruction (which effectively costs 0 cycles, and is just one
    insn), you now have to zero all call-clobbered register at every return
    (typically many returns per function, and you are talking 10+ registers
    even if only considering the simple integer registers).

    Numbers on how expensive this is (for what arch, in code size and in
    execution time) would be useful.  If it is so expensive that no one will
    use it, it helps security at most none at all :-(

It is used in some operating systems and packages such as 

https://github.com/clearlinux-pkgs/gettext/blob/master/gettext.spec#L138

export CFLAGS="$CFLAGS -O3 -ffat-lto-objects -flto=4 -fstack-protector-strong -mzero-caller-saved-regs=used "

There is no record that this flag creates a considerable penalty in execution time.

    > Q1. Which registers should be set to zeros at the return of the function?
    > A. the caller-saved, i.e, call-used, or call-clobbered registers.
    >    For ROP mitigation purpose, only the call-used registers that pass
    > parameters need to be zeroed. 
    >    For register erasure purpose, all the call-used registers might need to
    > be zeroed. we can provide multiple levels to user for controling the runtime
    > overhead. 

    The call-clobbered regs are the only ones you *can* touch.  That does
    not mean you should clear them all (it doesn't help much at all in some
    cases).  Only the backend knows.

    >     So, from both run-time performance and code-size aspects, setting the
    > registers to zero is a better approach. 

    From a security perspective, this isn't clear though.  But that is a lot
    of extra research ;-)

The IEEE paper provides a clear example of how to use mzero-caller

I think the patch has a solid background, and multiple projects highlight the importance of register clearing as a technique to mitigate ROP attacks.

Regards

Victor Rodriguez



    Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-19 23:27                           ` Qing Zhao
@ 2020-08-24 14:47                             ` Rodriguez Bahena, Victor
  2020-08-24 17:59                               ` Segher Boessenkool
  2020-08-24 17:49                             ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Rodriguez Bahena, Victor @ 2020-08-24 14:47 UTC (permalink / raw)
  To: Qing Zhao, Segher Boessenkool
  Cc: Richard Biener, Jeff Law, Uros Bizjak, H. J. Lu, Jakub Jelinek,
	GCC Patches, Kees Cook



From: Qing Zhao <QING.ZHAO@ORACLE.COM>
Date: Wednesday, August 19, 2020 at 6:28 PM
To: Segher Boessenkool <segher@kernel.crashing.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Cc: Richard Biener <richard.guenther@gmail.com>, Jeff Law <law@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "H. J. Lu" <hjl.tools@gmail.com>, Jakub Jelinek <jakub@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Kees Cook <keescook@chromium.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:

Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:

So, cleaning the scratch registers that are used to pass parameters at return instructions should
effectively mitigate ROP attack.

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered register at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Yes, the run-time overhead and also the code-size overhead are major concerns. We should minimize the overhead
as much as we can during implementation. However, such overhead cannot be completely avoided for the security purpose.

In order to reduce the overhead for the ROP mitigation, I added 3 new values for -fzero-call-used-regs: used-arg-gpr, used-arg, and arg.

For “used-arg-gpr”, we only zero the integer registers that are used in the routine and can pass parameters; this should provide ROP mitigation
with the minimum overhead.

For “used-arg”, in addition to “used-arg-gpr”, the other registers (for example, FP registers) that are used in the routine and can pass parameters will be zeroed. But I am not
very sure whether this option is really needed in practice.

For “arg”, in addition to “used-arg”, all registers that pass parameters will be zeroed. As with “used-arg”, I am not very sure whether we need this option
or not.


Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

CLEAR Linux project has been using a similar patch since GCC 8, the option it used is an equivalent to -fzero-call-used-regs=used-gpr.

-fzero-call-used-regs=used-arg-gpr in this new proposal will have smaller overhead than the one currently being used in CLEAR Linux.

Victor, do you have any data on the overhead of the option that currently is used by CLEAR project?


This is a quick list of packages compiled with a flag similar to the one you mention:

https://gist.github.com/bryteise/f3469f318e82c626d20a83f557d897a2

The spec files can be located at:

https://github.com/clearlinux-pkgs

I don’t have any data on the overhead. As you mention, the patch has been in use since GCC 8 (2018). The distro has been measured by the community since then. I looked for any major performance drop reported by the community after this patch, but I was not able to find one.

Maybe it would be worth asking on the Clear Linux community project mailing list.

Regards

Victor Rodriguez


Q1. Which registers should be set to zeros at the return of the function?
A. The caller-saved, i.e., call-used or call-clobbered, registers.
  For ROP mitigation purposes, only the call-used registers that pass
parameters need to be zeroed.
  For register erasure purposes, all the call-used registers might need to
be zeroed. We can provide multiple levels to the user for controlling the runtime
overhead.

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

I think that for ROP mitigation purposes, we only need to clear the call-used (i.e., call-clobbered) registers that are used in the current routine and
can pass parameters.

But to prevent information leaks from callee registers, we might need to clear all the call-used registers at return.





   So, from both run-time performance and code-size aspects, setting the
registers to zero is a better approach.

From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)

There has been quite some discussion on this topic at

https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html

From that earlier discussion, we can see that a zero value should be good enough for the security purpose (though it’s not perfect).

Qing



I saw the same discussion by the LLVM community at the latest ELC/OSSNA conference this year. The flag is getting a lot of attention.

Regards

Victor Rodriguez



Segher



^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 10:50                   ` Richard Biener
@ 2020-08-24 14:48                     ` Qing Zhao
  2020-08-25  5:16                     ` Alexandre Oliva
  1 sibling, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-24 14:48 UTC (permalink / raw)
  To: Richard Biener
  Cc: Alexandre Oliva, Qing Zhao via Gcc-patches, Uros Bizjak,
	Jakub Jelinek, Kees Cook, Rodriguez Bahena, Victor



> On Aug 24, 2020, at 5:50 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Tue, 11 Aug 2020, Qing Zhao wrote:
> 
>> Hi, Alexandre,
>> 
>> CC’ing Richard for his comments on this.
>> 
>> 
>>> On Aug 10, 2020, at 9:39 PM, Alexandre Oliva <oliva@adacore.com> wrote:
>>>> I think that moving the register-zeroing part to each target
>>>> will be a better solution, since each target has
>>>> a better idea of how to use the most efficient insns to do the work.
>>> 
>>> It's certainly good to allow machine-specific optimized code sequences,
>>> but it would certainly be desirable to have a machine-independent
>>> fallback.  It doesn't seem exceedingly hard to loop over the registers
>>> and emit a (set (reg:M N) (const_int 0)) for each one that is to be
>>> zeroed out.
>> 
>> The current implementation already includes such machine-independent code, it should be very easy to add this.
>> 
>> Richard, what’s your opinion on this?
>> Do we need a machine-independent implementation for zeroing the registers as the default when the target does not provide an optimized
>> implementation?
> 
> Well, at least silently doing nothing when the option is used would be 
> bad.  So at least a diagnostic would be required.

Yes, this is the current behavior in the current implementation. 
>  Note since the
> option is quite elaborate on what (sub-)set of regs is supposed to be
> cleared I'm not sure an implementation not involving any target hook
> is possible?

Agreed.

Thanks 

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>>> 
>>> 
>> 
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-19 23:27                           ` Qing Zhao
  2020-08-24 14:47                             ` Rodriguez Bahena, Victor
@ 2020-08-24 17:49                             ` Segher Boessenkool
  2020-08-24 18:02                               ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-24 17:49 UTC (permalink / raw)
  To: Qing Zhao
  Cc: victor Rodriguez Bahena, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook

On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> > On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > Numbers on how expensive this is (for what arch, in code size and in
> > execution time) would be useful.  If it is so expensive that no one will
> > use it, it helps security at most none at all :-(

Without numbers on this, no one can determine if it is a good tradeoff
for them.  And we (the GCC people) cannot know if it will be useful for
enough users that it will be worth the effort for us.  Which is why I
keep hammering on this point.

(The other side of the coin is how much this helps prevent exploitation;
numbers on that would be good to see, too.)

> >>    So, from both run-time performance and code-size aspects, setting the
> >> registers to zero is a better approach. 
> > 
> > From a security perspective, this isn't clear though.  But that is a lot
> > of extra research ;-)
> 
> There has been quite some discussion on this topic at
> 
> https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
> 
> From those old discussion, we can see that zero value should be good enough for the security purpose (though it’s not perfect).

And there has been zero proof or even any arguments from the security
angle, only "anything other than 0 is too expensive", which isn't
obviously true either (it isn't even cheaper than other small numbers,
on many archs).

A large fraction of function arguments is zero in valid executions, so
zeroing them out to try to prevent exploitation attempts might not help
so much.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 14:47                             ` Rodriguez Bahena, Victor
@ 2020-08-24 17:59                               ` Segher Boessenkool
  2020-08-24 18:48                                 ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-24 17:59 UTC (permalink / raw)
  To: Rodriguez Bahena, Victor
  Cc: Qing Zhao, Richard Biener, Jeff Law, Uros Bizjak, H. J. Lu,
	Jakub Jelinek, GCC Patches, Kees Cook

[ Please quote correctly.  I fixed this up a bit. ]

On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
> > The call-clobbered regs are the only ones you *can* touch.  That does
> > not mean you should clear them all (it doesn't help much at all in some
> > cases).  Only the backend knows.
> 
> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
> can pass parameters.

Which is more than you *can* do as well (consider return value registers
for example; there are more cases, in general; only the backend code can
know what is safe to do).


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 17:49                             ` Segher Boessenkool
@ 2020-08-24 18:02                               ` Qing Zhao
  2020-08-24 20:20                                 ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-24 18:02 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: victor Rodriguez Bahena, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook



> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> Numbers on how expensive this is (for what arch, in code size and in
>>> execution time) would be useful.  If it is so expensive that no one will
>>> use it, it helps security at most none at all :-(
> 
> Without numbers on this, no one can determine if it is a good tradeoff
> for them.  And we (the GCC people) cannot know if it will be useful for
> enough users that it will be worth the effort for us.  Which is why I
> keep hammering on this point.
I can collect some run-time overhead data on this. Do you have a recommendation on what test suite I can use
for this testing? (Is CPU2017 good enough?)

> 
> (The other side of the coin is how much this helps prevent exploitation;
> numbers on that would be good to see, too.)

This is shown well in the paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

Please take a look at this paper. 

> 
>>>>   So, from both run-time performance and code-size aspects, setting the
>>>> registers to zero is a better approach. 
>>> 
>>> From a security perspective, this isn't clear though.  But that is a lot
>>> of extra research ;-)
>> 
>> There has been quite some discussion on this topic at
>> 
>> https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
>> 
>> From those old discussion, we can see that zero value should be good enough for the security purpose (though it’s not perfect).
> 
> And there has been zero proof or even any arguments from the security
> angle, only "anything other than 0 is too expensive", which isn't
> obviously true either (it isn't even cheaper than other small numbers,
> on many archs).
> 
> A large fraction of function arguments is zero in valid executions, so
> zeroing them out to try to prevent exploitation attempts might not help
> so much.

Please take a look at the paper:
"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the study, zeroing out the registers mitigates ROP very well.

thanks.

Qing



> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 17:59                               ` Segher Boessenkool
@ 2020-08-24 18:48                                 ` Qing Zhao
  2020-08-24 20:26                                   ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-24 18:48 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Rodriguez Bahena, Victor, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook



> On Aug 24, 2020, at 12:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> [ Please quote correctly.  I fixed this up a bit. ]
> 
> On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
>>> The call-clobbered regs are the only ones you *can* touch.  That does
>>> not mean you should clear them all (it doesn't help much at all in some
>>> cases).  Only the backend knows.
>> 
>> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
>> can pass parameters.
> 
> Which is more than you *can* do as well (consider return value registers
> for example; there are more cases, in general; only the backend code can
> know what is safe to do).

Yes. So we agreed to move the code generation part of the implementation into the backend.

In the middle end, we will only compute the hard register set based on call ABI information and data flow information, and also handle the command-line option.
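
A minimal sketch of that split, assuming a toy machine with one bit per hard register. All names here (regset, zero_mode, target_info, regs_to_zero) are hypothetical and are not GCC's API, and the meaning given to each option value is an assumed reading of the option names. The middle end intersects the ABI and data-flow sets, while a backend-supplied must_keep set stands in for registers (such as return-value registers) that must not be clobbered:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per hard register.  Hypothetical names throughout.  */
typedef uint32_t regset;

enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

struct target_info
{
  regset call_used;  /* call-clobbered registers according to the ABI */
  regset gpr;        /* general-purpose (integer) registers */
  regset must_keep;  /* backend veto, e.g. return-value registers */
};

/* Return the set of registers to zero at function return.
   USED_IN_FUNCTION would come from data-flow information.  */
regset
regs_to_zero (enum zero_mode mode, const struct target_info *t,
              regset used_in_function)
{
  /* Only call-clobbered registers may be touched, and never those
     the backend says must survive the return.  */
  regset candidates = t->call_used & ~t->must_keep;

  switch (mode)
    {
    case ZERO_SKIP:
      return 0;
    case ZERO_USED_GPR:
      return candidates & t->gpr & used_in_function;
    case ZERO_ALL_GPR:
      return candidates & t->gpr;
    case ZERO_USED:
      return candidates & used_in_function;
    case ZERO_ALL:
      return candidates;
    }
  assert (!"unknown zero_mode");
  return 0;
}
```

The point of keeping must_keep separate is that the generic code never has to guess what is safe to clobber; it only narrows a set the backend has already approved.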

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 18:02                               ` Qing Zhao
@ 2020-08-24 20:20                                 ` Segher Boessenkool
  2020-08-24 20:43                                   ` Qing Zhao
  2020-08-25 21:54                                   ` Qing Zhao
  0 siblings, 2 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-24 20:20 UTC (permalink / raw)
  To: Qing Zhao
  Cc: victor Rodriguez Bahena, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook

Hi!

On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
> > On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> >>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>> Numbers on how expensive this is (for what arch, in code size and in
> >>> execution time) would be useful.  If it is so expensive that no one will
> >>> use it, it helps security at most none at all :-(
> > 
> > Without numbers on this, no one can determine if it is a good tradeoff
> > for them.  And we (the GCC people) cannot know if it will be useful for
> > enough users that it will be worth the effort for us.  Which is why I
> > keep hammering on this point.
> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
> For this testing? (Is CPU2017 good enough)?

I would use something more real-life, not 12 small pieces of code.

> > (The other side of the coin is how much this helps prevent exploitation;
> > numbers on that would be good to see, too.)
> 
> This can be well showed from the paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132
> 
> Please take a look at this paper. 

As I told you before, that isn't open information, I cannot reply to
any of that.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 18:48                                 ` Qing Zhao
@ 2020-08-24 20:26                                   ` Segher Boessenkool
  2020-08-24 20:49                                     ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-08-24 20:26 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Rodriguez Bahena, Victor, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook

On Mon, Aug 24, 2020 at 01:48:02PM -0500, Qing Zhao wrote:
> 
> 
> > On Aug 24, 2020, at 12:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > 
> > [ Please quote correctly.  I fixed this up a bit. ]
> > 
> > On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
> >>> The call-clobbered regs are the only ones you *can* touch.  That does
> >>> not mean you should clear them all (it doesn't help much at all in some
> >>> cases).  Only the backend knows.
> >> 
> >> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
> >> can pass parameters.
> > 
> > Which is more than you *can* do as well (consider return value registers
> > for example; there are more cases, in general; only the backend code can
> > know what is safe to do).
> 
> Yes, So, we agreed to move the code generation implementation part into backend.
> 
> In Middle-end, we will only compute the hard register set based on call abi information and data flow information, also handle the command line option.

You cannot in general figure out what registers you can clobber without
asking the backend.  You can figure out some that you *cannot* clobber,
but that isn't very useful.

Do you want to do this before or after the epilogue code is generated?


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 20:20                                 ` Segher Boessenkool
@ 2020-08-24 20:43                                   ` Qing Zhao
  2020-08-25  6:41                                     ` Uros Bizjak
  2020-09-04 15:26                                     ` Segher Boessenkool
  2020-08-25 21:54                                   ` Qing Zhao
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-24 20:43 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: victor Rodriguez Bahena, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook



> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>> use it, it helps security at most none at all :-(
>>> 
>>> Without numbers on this, no one can determine if it is a good tradeoff
>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>> enough users that it will be worth the effort for us.  Which is why I
>>> keep hammering on this point.
>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>> For this testing? (Is CPU2017 good enough)?
> 
> I would use something more real-life, not 12 small pieces of code.

Then what kind of real-life benchmark are you suggesting?

> 
>>> (The other side of the coin is how much this helps prevent exploitation;
>>> numbers on that would be good to see, too.)
>> 
>> This can be well showed from the paper:
>> 
>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>> 
>> https://ieeexplore.ieee.org/document/8445132
>> 
>> Please take a look at this paper. 
> 
> As I told you before, that isn't open information, I cannot reply to
> any of that.

I am a little confused here. What do you mean by “open information”? Is the information in a published paper not open information?

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 20:26                                   ` Segher Boessenkool
@ 2020-08-24 20:49                                     ` Qing Zhao
  2020-09-04 15:18                                       ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-24 20:49 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Rodriguez Bahena, Victor, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook



> On Aug 24, 2020, at 3:26 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Mon, Aug 24, 2020 at 01:48:02PM -0500, Qing Zhao wrote:
>> 
>> 
>>> On Aug 24, 2020, at 12:59 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> 
>>> [ Please quote correctly.  I fixed this up a bit. ]
>>> 
>>> On Mon, Aug 24, 2020 at 02:47:22PM +0000, Rodriguez Bahena, Victor wrote:
>>>>> The call-clobbered regs are the only ones you *can* touch.  That does
>>>>> not mean you should clear them all (it doesn't help much at all in some
>>>>> cases).  Only the backend knows.
>>>> 
>>>> I think that for ROP mitigation purpose, we only need to clear the call-used (i.e, call-clobbered) registers that are used in the current routine and
>>>> can pass parameters.
>>> 
>>> Which is more than you *can* do as well (consider return value registers
>>> for example; there are more cases, in general; only the backend code can
>>> know what is safe to do).
>> 
>> Yes, So, we agreed to move the code generation implementation part into backend.
>> 
>> In Middle-end, we will only compute the hard register set based on call abi information and data flow information, also handle the command line option.
> 
> You cannot in general figure out what registers you can clobber without
> asking the backend.  You can figure out some that you *cannot* clobber,
> but that isn't very useful.
> 
> Do you want to do this before or after the epilogue code is generated?

static rtx_insn *
make_epilogue_seq (void)
{
  if (!targetm.have_epilogue ())
    return NULL;

  start_sequence ();
  emit_note (NOTE_INSN_EPILOGUE_BEG);

  /* Proposed addition: this is the place to emit the zeroing insn
     sequence.  */
  gen_call_used_regs_seq ();

  rtx_insn *seq = targetm.gen_epilogue ();
…
}

Any comment on this?

thanks.

Qing




> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 10:50                   ` Richard Biener
  2020-08-24 14:48                     ` Qing Zhao
@ 2020-08-25  5:16                     ` Alexandre Oliva
  2020-08-25 14:19                       ` Jeff Law
  1 sibling, 1 reply; 188+ messages in thread
From: Alexandre Oliva @ 2020-08-25  5:16 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Qing Zhao via Gcc-patches, Uros Bizjak, Jakub Jelinek,
	Kees Cook, Rodriguez Bahena, Victor

On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:

> since the option is quite elaborate on what (sub-)set of regs is
> supposed to be cleared I'm not sure an implementation not involving
> any target hook is possible?

I don't think this follows.  Machine-independent code has a pretty good
notion of what registers are call-saved or call-clobbered, which ones
could be changed in this regard for function-specific calling
conventions, which ones may be used by a function to hold its return
value, which ones are used within a function...

It *should* be possible to introduce this in machine-independent code,
emitting insns to set registers to zero and regarding them as holding
values to be returned from the function.  Machine-specific code could
use more efficient insns to get the same result, but I can't see good
reason to not have a generic fallback implementation with at least a
best-effort attempt to offer the desired feature.
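
As a rough illustration of such a generic fallback, here is a toy sketch that walks a register mask and writes one RTL-like zeroing pattern per selected register. It is self-contained C and not GCC code: emit_zero_sets, the 32-register limit, and the textual output are all assumptions made for illustration, and the set of registers is assumed to have been computed elsewhere:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: for every bit set in MASK, append a line of the
   form "(set (reg:MODE N) (const_int 0))" to BUF.  Returns the number of
   registers written.  Stops cleanly when BUF runs out of room.  */
int
emit_zero_sets (uint32_t mask, const char *mode, char *buf, size_t buflen)
{
  assert (buf != NULL && mode != NULL);
  size_t off = 0;
  int count = 0;

  if (buflen > 0)
    buf[0] = '\0';

  for (unsigned regno = 0; regno < 32; regno++)
    {
      if (!(mask & (UINT32_C (1) << regno)))
        continue;

      int n = snprintf (buf + off, buflen - off,
                        "(set (reg:%s %u) (const_int 0))\n", mode, regno);
      if (n < 0 || (size_t) n >= buflen - off)
        {
          /* Drop the partially written line instead of truncating it.  */
          if (off < buflen)
            buf[off] = '\0';
          break;
        }
      off += (size_t) n;
      count++;
    }
  return count;
}
```

Calling emit_zero_sets (0x6, "DI", buf, sizeof buf) with a large enough buffer writes patterns for registers 1 and 2. A target-specific implementation could replace the whole loop with a cheaper sequence that produces the same register state.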


Now, this is for the regular return path.  Is zeroing registers in
exception-propagation paths not relevant?

I thought it is, and I think we could have generic code that identifies
the registers that ought to be zeroed, issues CFI notes to get them
zeroed in the exception path, and requests a target hook to emit the
insns to zero them in the regular return path.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 20:43                                   ` Qing Zhao
@ 2020-08-25  6:41                                     ` Uros Bizjak
  2020-08-25 14:05                                       ` Qing Zhao
  2020-09-04 15:26                                     ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Uros Bizjak @ 2020-08-25  6:41 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, victor Rodriguez Bahena, Richard Biener,
	Jeff Law, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook

On Mon, Aug 24, 2020 at 10:43 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> > On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >
> > Hi!
> >
> > On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
> >>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> >>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>>>> Numbers on how expensive this is (for what arch, in code size and in
> >>>>> execution time) would be useful.  If it is so expensive that no one will
> >>>>> use it, it helps security at most none at all :-(
> >>>
> >>> Without numbers on this, no one can determine if it is a good tradeoff
> >>> for them.  And we (the GCC people) cannot know if it will be useful for
> >>> enough users that it will be worth the effort for us.  Which is why I
> >>> keep hammering on this point.
> >> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
> >> For this testing? (Is CPU2017 good enough)?
> >
> > I would use something more real-life, not 12 small pieces of code.
>
> Then, what kind of real-life benchmark you are suggesting?
>
> >
> >>> (The other side of the coin is how much this helps prevent exploitation;
> >>> numbers on that would be good to see, too.)
> >>
> >> This can be well showed from the paper:
> >>
> >> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
> >>
> >> https://ieeexplore.ieee.org/document/8445132
> >>
> >> Please take a look at this paper.
> >
> > As I told you before, that isn't open information, I cannot reply to
> > any of that.
>
> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?

No, because it is behind a paywall.

Uros.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-25  6:41                                     ` Uros Bizjak
@ 2020-08-25 14:05                                       ` Qing Zhao
  2020-08-25 22:31                                         ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-25 14:05 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Segher Boessenkool, victor Rodriguez Bahena, Richard Biener,
	Jeff Law, H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook



> On Aug 25, 2020, at 1:41 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
>>> 
>>>>> (The other side of the coin is how much this helps prevent exploitation;
>>>>> numbers on that would be good to see, too.)
>>>> 
>>>> This can be well showed from the paper:
>>>> 
>>>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>>>> 
>>>> https://ieeexplore.ieee.org/document/8445132
>>>> 
>>>> Please take a look at this paper.
>>> 
>>> As I told you before, that isn't open information, I cannot reply to
>>> any of that.
>> 
>> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?
> 
> No, because it is behind a paywall.

I still don’t understand: this paper was published in the proceedings of the “2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)”.
If you want to read the complete version online, you need to pay for it.

However, it’s still a published paper, and the information inside it should be “open information”.

So, what is your definition of “open information”?

I downloaded a PDF copy of this paper through my company’s paid account, but I am not sure whether it’s legal for me to attach it to this mailing list.

Qing


> 
> Uros.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-25  5:16                     ` Alexandre Oliva
@ 2020-08-25 14:19                       ` Jeff Law
  2020-08-26 12:02                         ` Alexandre Oliva
  0 siblings, 1 reply; 188+ messages in thread
From: Jeff Law @ 2020-08-25 14:19 UTC (permalink / raw)
  To: Alexandre Oliva, Richard Biener
  Cc: Jakub Jelinek, Kees Cook, Uros Bizjak, Rodriguez Bahena, Victor,
	Qing Zhao via Gcc-patches

On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
> On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
> 
> > since the option is quite elaborate on what (sub-)set of regs is
> > supposed to be cleared I'm not sure an implementation not involving
> > any target hook is possible?
> 
> I don't think this follows.  Machine-independent code has a pretty good
> notion of what registers are call-saved or call-clobbered, which ones
> could be changed in this regard for function-specific calling
> conventions, which ones may be used by a function to hold its return
> value, which ones are used within a function...
> 
> It *should* be possible to introduce this in machine-independent code,
> emitting insns to set registers to zero and regarding them as holding
> values to be returned from the function.  Machine-specific code could
> use more efficient insns to get the same result, but I can't see good
> reason to not have a generic fallback implementation with at least a
> best-effort attempt to offer the desired feature.
I think part of the problem here is you have to worry about stubs which can
change caller-saved registers.  Return path stubs aren't particularly common, but
they do exist -- 32 bit hpux for example :(

Jeff


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 20:20                                 ` Segher Boessenkool
  2020-08-24 20:43                                   ` Qing Zhao
@ 2020-08-25 21:54                                   ` Qing Zhao
  2020-09-03 14:29                                     ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-25 21:54 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: victor Rodriguez Bahena, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook



> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>> use it, it helps security at most none at all :-(
>>> 
>>> Without numbers on this, no one can determine if it is a good tradeoff
>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>> enough users that it will be worth the effort for us.  Which is why I
>>> keep hammering on this point.
>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>> For this testing? (Is CPU2017 good enough)?
> 
> I would use something more real-life, not 12 small pieces of code.

There is some basic information about the CPU2017 benchmarks at the link below:

https://www.spec.org/cpu2017/Docs/overview.html#suites

GCC itself is one of the CPU2017 benchmarks (502.gcc_r), and 526.blender_r is even larger than 502.gcc_r.
There are several other fairly large benchmarks as well (perlbench, xalancbmk, parest, imagick, etc.).

thanks.

Qing

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-25 14:05                                       ` Qing Zhao
@ 2020-08-25 22:31                                         ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-08-25 22:31 UTC (permalink / raw)
  To: Uros Bizjak, Segher Boessenkool
  Cc: Jakub Jelinek, Kees Cook, Segher Boessenkool,
	victor Rodriguez Bahena, GCC Patches



> On Aug 25, 2020, at 9:05 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Aug 25, 2020, at 1:41 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>>>> 
>>>>>> (The other side of the coin is how much this helps prevent exploitation;
>>>>>> numbers on that would be good to see, too.)
>>>>> 
>>>>> This can be well showed from the paper:
>>>>> 
>>>>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>>>>> 
>>>>> https://ieeexplore.ieee.org/document/8445132
>>>>> 
>>>>> Please take a look at this paper.
>>>> 
>>>> As I told you before, that isn't open information, I cannot reply to
>>>> any of that.
>>> 
>>> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?
>> 
>> No, because it is behind a paywall.
> 
> Still don’t understand here:  this paper has been published in the proceeding of “ 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)”.
> If you want to read the complete version online, you need to pay for it.
> 
> However, it’s still a published paper, and the information inside it should be “open information”. 
> 
> So, what’s the definition of “open information” you have?
> 
> I downloaded a PDF copy of this paper through my company’s paid account.  But I am not sure whether it’s legal for me to attach it to this mailing list?

After checking, it turns out that I am not allowed to forward the copy I downloaded through my company’s account to this mailing list.
There is some more information on this paper online though:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available at this link.

Figure 1 there illustrates a typical ROP attack. Please pay special attention to the “gadgets”: carefully chosen machine instruction sequences that are already present in the machine's memory. Each gadget typically ends in a return instruction and is located in a subroutine within the existing program and/or shared library code. Chained together, these gadgets allow an attacker to perform arbitrary operations on a machine employing defenses that thwart simpler attacks.

The paper identifies the important features of a ROP attack as follows:

"First, the destination of using gadget chains in usual is performing system call or system fucntion to perform malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. 

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks using system function such as “read” or “mprotect”, on x64 system, the register would still be used to pass parameters, as mentioned in subsection B and C.”

As a result, the paper proposes zeroing the scratch registers that pass parameters at “return” insns in order to mitigate ROP attacks.

Table III, Table IV and Table V give the results of using scratch-register zeroing to mitigate ROP attacks. According to the tables, zeroing scratch registers successfully mitigates ROP on all those benchmarks.

Table VI shows the performance overhead of their implementation. It looks very high: 16.2X runtime overhead on average. However, their implementation does not use the compiler to generate the zeroing sequence statically; instead, it uses “dynamic binary instrumentation at runtime” to check every instruction in order to
1. Set/unset flags that track which scratch registers are used in the routine;
2. Determine whether the instruction is a return instruction and, if it is, insert the sequence that zeroes the used scratch registers before the “return” insn.

Given this run-time dynamic instrumentation method, the high runtime overhead is to be expected, I think.

If GCC instead determines the “used” information statically and emits the zeroing sequence before each return insn, the run-time overhead should be much smaller.
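
To make the register selection concrete, here is a minimal C sketch of how the register sets named by the option values relate to each other. It is only a model under stated assumptions: the bitmask representation, the enum, and regs_to_zero() are made-up names for illustration and are not taken from the patch or from GCC internals. It also assumes that registers carrying the return value must never be cleared.

```c
#include <assert.h>

/* Illustrative model only: a register set is a bitmask, one bit per
   register.  None of these names come from GCC.  */
enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

/* Return the mask of registers to clear right before a return.
   call_used: registers a callee may clobber under the ABI.
   gpr:       registers that are general-purpose.
   used:      registers the current function actually touched.
   live_out:  registers holding the return value (assumed off-limits).  */
static unsigned
regs_to_zero (enum zero_mode mode, unsigned call_used, unsigned gpr,
              unsigned used, unsigned live_out)
{
  unsigned candidates = call_used & ~live_out;

  switch (mode)
    {
    case ZERO_SKIP:     return 0;
    case ZERO_USED_GPR: return candidates & gpr & used;
    case ZERO_ALL_GPR:  return candidates & gpr;
    case ZERO_USED:     return candidates & used;
    case ZERO_ALL:      return candidates;
    }
  assert (!"unknown zero_mode");
  return 0;
}
```

In this model the “used” variants only intersect with information the compiler already has per function, so the selection itself costs nothing at run time; only the emitted zeroing insns do.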

I will provide run-time overhead data, measured with CPU2017, along with the 2nd version of the patch.

thanks.

Qing


> Qing
> 
> 
>> 
>> Uros.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-25 14:19                       ` Jeff Law
@ 2020-08-26 12:02                         ` Alexandre Oliva
  2020-08-26 17:58                           ` Qing Zhao
  2020-08-26 18:36                           ` Jeff Law
  0 siblings, 2 replies; 188+ messages in thread
From: Alexandre Oliva @ 2020-08-26 12:02 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Jakub Jelinek, Kees Cook, Uros Bizjak,
	Rodriguez Bahena, Victor, Qing Zhao via Gcc-patches

On Aug 25, 2020, Jeff Law <law@redhat.com> wrote:

> On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
>> On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
>> 
>> > since the option is quite elaborate on what (sub-)set of regs is
>> > supposed to be cleared I'm not sure an implementation not involving
>> > any target hook is possible?
>> 
>> I don't think this follows.  Machine-independent code has a pretty good
>> notion of what registers are call-saved or call-clobbered, which ones
>> could be changed in this regard for function-specific calling
>> conventions, which ones may be used by a function to hold its return
>> value, which ones are used within a function...
>> 
>> It *should* be possible to introduce this in machine-independent code,
>> emitting insns to set registers to zero and regarding them as holding
>> values to be returned from the function.  Machine-specific code could
>> use more efficient insns to get the same result, but I can't see good
>> reason to not have a generic fallback implementation with at least a
>> best-effort attempt to offer the desired feature.
> I think part of the problem here is you have to worry about stubs which can
> change caller-saved registers.  Return path stubs aren't particularly common, but
> they do exist -- 32 bit hpux for example :(

This suggests that such targets might have to implement the
target-specific hook to deal with this, but does it detract in any way
from the notion of having generic code to fall back to on targets that
do NOT require any special handling?

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-26 12:02                         ` Alexandre Oliva
@ 2020-08-26 17:58                           ` Qing Zhao
  2020-08-28  7:47                             ` Alexandre Oliva
  2020-08-26 18:36                           ` Jeff Law
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-26 17:58 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Jeff Law, Jakub Jelinek, Richard Biener, Kees Cook, Uros Bizjak,
	Rodriguez Bahena, Victor, Qing Zhao via Gcc-patches



> On Aug 26, 2020, at 7:02 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Aug 25, 2020, Jeff Law <law@redhat.com <mailto:law@redhat.com>> wrote:
> 
>> On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
>>> On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
>>> 
>>>> since the option is quite elaborate on what (sub-)set of regs is
>>>> supposed to be cleared I'm not sure an implementation not involving
>>>> any target hook is possible?
>>> 
>>> I don't think this follows.  Machine-independent code has a pretty good
>>> notion of what registers are call-saved or call-clobbered, which ones
>>> could be changed in this regard for function-specific calling
>>> conventions, which ones may be used by a function to hold its return
>>> value, which ones are used within a function...
>>> 
>>> It *should* be possible to introduce this in machine-independent code,
>>> emitting insns to set registers to zero and regarding them as holding
>>> values to be returned from the function.  Machine-specific code could
>>> use more efficient insns to get the same result, but I can't see good
>>> reason to not have a generic fallback implementation with at least a
>>> best-effort attempt to offer the desired feature.
>> I think part of the problem here is you have to worry about stubs which can
>> change caller-saved registers.  Return path stubs aren't particularly common, but
>> they do exist -- 32 bit hpux for example :(
> 
> This suggests that such targets might have to implement the
> target-specific hook to deal with this, but does it detract in any way
> from the notion of having generic code to fall back to on targets that
> do NOT require any special handling?

There are two issues I can see with adding a default generator in the middle end:

1. To determine whether a target should not use the generic code to emit the zeroing sequence,
a new target hook has to be added;

2. To keep the generated zeroing insns (which are simply insns that set registers) from being deleted,
we have to define a new insn “pro_epilogue_use” in the target.
So any target that wants to use the default generator in the middle end must provide such a pattern.

Based on these two points, I don’t think adding a default generator in the middle end is a good idea.

Qing

> 
> -- 
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-26 12:02                         ` Alexandre Oliva
  2020-08-26 17:58                           ` Qing Zhao
@ 2020-08-26 18:36                           ` Jeff Law
  1 sibling, 0 replies; 188+ messages in thread
From: Jeff Law @ 2020-08-26 18:36 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Biener, Jakub Jelinek, Kees Cook, Uros Bizjak,
	Rodriguez Bahena, Victor, Qing Zhao via Gcc-patches

On Wed, 2020-08-26 at 09:02 -0300, Alexandre Oliva wrote:
> On Aug 25, 2020, Jeff Law <law@redhat.com> wrote:
> 
> > On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
> > > On Aug 24, 2020, Richard Biener <rguenther@suse.de> wrote:
> > > 
> > > > since the option is quite elaborate on what (sub-)set of regs is
> > > > supposed to be cleared I'm not sure an implementation not involving
> > > > any target hook is possible?
> > > 
> > > I don't think this follows.  Machine-independent code has a pretty good
> > > notion of what registers are call-saved or call-clobbered, which ones
> > > could be changed in this regard for function-specific calling
> > > conventions, which ones may be used by a function to hold its return
> > > value, which ones are used within a function...
> > > 
> > > It *should* be possible to introduce this in machine-independent code,
> > > emitting insns to set registers to zero and regarding them as holding
> > > values to be returned from the function.  Machine-specific code could
> > > use more efficient insns to get the same result, but I can't see good
> > > reason to not have a generic fallback implementation with at least a
> > > best-effort attempt to offer the desired feature.
> > I think part of the problem here is you have to worry about stubs which can
> > change caller-saved registers.  Return path stubs aren't particularly common, but
> > they do exist -- 32 bit hpux for example :(
> 
> This suggests that such targets might have to implement the
> target-specific hook to deal with this, but does it detract in any way
> from the notion of having generic code to fall back to on targets that
> do NOT require any special handling?
Agreed.  Sorry if I wasn't clear that generic code + a hook should be sufficient.

jeff


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-26 17:58                           ` Qing Zhao
@ 2020-08-28  7:47                             ` Alexandre Oliva
  2020-08-28 15:21                               ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Alexandre Oliva @ 2020-08-28  7:47 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Jeff Law, Jakub Jelinek, Richard Biener, Kees Cook, Uros Bizjak,
	Rodriguez Bahena, Victor, Qing Zhao via Gcc-patches

On Aug 26, 2020, Qing Zhao <qing.zhao@oracle.com> wrote:

> There are two issues I can see with adding a default generator in middle end:

> 1. In order to determine where a target should not use the generic
> code to emit the zeroing sequence,
> a new target hook to determine this has to be added;

Yeah, a target hook whose default is the generic code, and that targets
that need it, or that benefit from it, can override.  That's the point
of hooks, to enable overriding.  Why should this be an issue?
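
A rough C sketch of that arrangement follows.  It is not GCC code: the
struct, the function names, and the STUB_CLOBBERED mask are hypothetical
stand-ins, and what a return-stub target would really have to do is not
established in this thread.  It only shows a hook that defaults to
generic code and that one target overrides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not GCC's real types.  */
typedef unsigned regset;

struct target_hooks
{
  /* Clear the registers in NEED_ZEROED; return the set actually cleared.  */
  regset (*zero_call_used_regs) (regset need_zeroed);
};

/* Generic fallback: clears every register it was asked to clear.  */
static regset
default_zero_call_used_regs (regset need_zeroed)
{
  return need_zeroed;
}

/* Made-up mask of registers a return-path stub would overwrite anyway.  */
#define STUB_CLOBBERED 0x3u

/* Override for an imaginary target with return stubs: skip the registers
   the stub clobbers, then reuse the generic code for the rest.  */
static regset
stubby_zero_call_used_regs (regset need_zeroed)
{
  return default_zero_call_used_regs (need_zeroed & ~STUB_CLOBBERED);
}

/* Most targets would simply inherit the default...  */
static const struct target_hooks generic_target
  = { default_zero_call_used_regs };
/* ...and only the unusual ones would install their own hook.  */
static const struct target_hooks stubby_target
  = { stubby_zero_call_used_regs };

/* Middle-end side: always go through the hook, never call the default
   directly.  */
static regset
emit_zeroing (const struct target_hooks *t, regset need_zeroed)
{
  assert (t != NULL && t->zero_call_used_regs != NULL);
  return t->zero_call_used_regs (need_zeroed);
}
```

With this shape the middle end needs no separate "does this target need
special handling" query; a target that does simply replaces the hook.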

> 2. In order to avoid the generated zeroing insns (which are simply
> insns that set registers) being deleted,
> We have to define a new insn “pro_epilogue_use” in the target. 

Why won't a naked USE pattern do?  We already issue those in generic
code, for regs holding return values.  If we were to pretend that other
registers are also holding zeros as values to be returned, why shouldn't
that work for them as well?

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-28  7:47                             ` Alexandre Oliva
@ 2020-08-28 15:21                               ` Qing Zhao
  2020-08-28 15:33                                 ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-08-28 15:21 UTC (permalink / raw)
  To: Alexandre Oliva, H.J. Lu
  Cc: Jeff Law, Jakub Jelinek, Richard Biener, Kees Cook, Uros Bizjak,
	Rodriguez Bahena, Victor, Qing Zhao via Gcc-patches



> On Aug 28, 2020, at 2:47 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Aug 26, 2020, Qing Zhao <qing.zhao@oracle.com> wrote:
> 
>> There are two issues I can see with adding a default generator in middle end:
> 
>> 1. In order to determine where a target should not use the generic
>> code to emit the zeroing sequence,
>> a new target hook to determine this has to be added;
> 
> Yeah, a target hook whose default is the generic code, and that targets
> that need it, or that benefit from it, can override.  That's the point
> of hooks, to enable overriding.  Why should this be an issue?

A default handler will be invoked for every target. So, if a target does not provide a
target-specific handler to override it, the default handler has to be correct on that target.

In other words, the default handler has to be correct on all targets, assuming no override happens.

Correct me if I am wrong with the above understanding.

Then, for example, is a default handler without any special target handling correct on 32-bit hpux?
My understanding from the previous discussion is that we need some special handling
on 32-bit hpux to make it correct. So, in order to make the default handler correct on 32-bit hpux,
we need to add another target hook, for example targetm.has_return_stubs(), to check whether
a target has such a feature; the default handler can then call this new hook and
make sure it does the right thing on 32-bit hpux.

There might be other targets that need other special handling that we don’t know about
yet. Do we need to identify all those targets and all those special features, and then add a new
target hook for each identified special feature?

Yes, theoretically it’s doable to run tests on all the targets to see which targets need special
handling and what kind of special handling they need. However, is doing this really necessary?


> 
>> 2. In order to avoid the generated zeroing insns (which are simply
>> insns that set registers) being deleted,
>> We have to define a new insn “pro_epilogue_use” in the target. 
> 
> Why won't a naked USE pattern do?  We already issue those in generic
> code, for regs holding return values.  If we were to pretend that other
> registers are also holding zeros as values to be returned, why shouldn't
> that work for them as well?

In the current x86-based implementation, I see the following comment:

;; As USE insns aren't meaningful after reload, this is used instead
;; to prevent deleting instructions setting registers for PIC code
(define_insn "pro_epilogue_use"
  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]

My understanding is that “USE” is no longer useful after reload, so a new “pro_epilogue_use”
has to be added.

HongJiu, could you please provide more information on this?

Thanks.

Qing

> 
> -- 
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-28 15:21                               ` Qing Zhao
@ 2020-08-28 15:33                                 ` H.J. Lu
  0 siblings, 0 replies; 188+ messages in thread
From: H.J. Lu @ 2020-08-28 15:33 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Alexandre Oliva, Jeff Law, Jakub Jelinek, Richard Biener,
	Kees Cook, Uros Bizjak, Rodriguez Bahena, Victor,
	Qing Zhao via Gcc-patches

On Fri, Aug 28, 2020 at 8:22 AM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> > On Aug 28, 2020, at 2:47 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> >
> > On Aug 26, 2020, Qing Zhao <qing.zhao@oracle.com> wrote:
> >
> >> There are two issues I can see with adding a default generator in middle end:
> >
> >> 1. In order to determine where a target should not use the generic
> >> code to emit the zeroing sequence,
> >> a new target hook to determine this has to be added;
> >
> > Yeah, a target hook whose default is the generic code, and that targets
> > that need it, or that benefit from it, can override.  That's the point
> > of hooks, to enable overriding.  Why should this be an issue?
>
> A default handler will be invoked for all the targets. So, if the target does not provide any
> target-specific handler to override it. The default handler should be correct on this target.
>
> So, the default handler should be correct on all the targets assuming no override happening.
>
> Correct me if I am wrong with the above understanding.
>
> Then, for example, for the 32 bit hpux, is a default handler without any special target handling
> correct on it? My understanding from the previous discussion is, we need some special handling
> On 32 bit hpux to make it correct, So, in order to make the default handler correct on 32 bit hpux,
> We need to add another target hook, for example, targetm.has_return_stubs() to check whether
> A target has such feature, then in the default handler, we can call this new target hook to check and
> Then make sure the default handler is correct on 32 bit hpux.
>
> There might be other targets that might need other special handlings which we currently don’t know
> Yet. Do we need to identify all those targets and all those special features, and then add new
> Target hook for each of the identified special feature?
>
> Yes, theoretically, it’s doable to run testing on all the targets and to see which targets need special
> Handling and what kind of special handling we need, however, is doing this really necessary?
>
>
> >
> >> 2. In order to avoid the generated zeroing insns (which are simply
> >> insns that set registers) being deleted,
> >> We have to define a new insn “pro_epilogue_use” in the target.
> >
> > Why won't a naked USE pattern do?  We already issue those in generic
> > code, for regs holding return values.  If we were to pretend that other
> > registers are also holding zeros as values to be returned, why shouldn't
> > that work for them as well?
>
> From the current implementation based on X86, I see the following comments:
>
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> (define_insn "pro_epilogue_use"
>   [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>
> My understanding is, the “USE” will not be useful after reload. So a new “pro_eplogue_use” should
> be added.
>
> HongJiu, could you please provide more information on this?

pro_epilogue_use is needed.  Otherwise, these zeroing instructions
will be removed after reload.

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-25 21:54                                   ` Qing Zhao
@ 2020-09-03 14:29                                     ` Qing Zhao
  2020-09-03 15:08                                       ` Qing Zhao
  2020-09-03 17:13                                       ` Kees Cook
  0 siblings, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-03 14:29 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jakub Jelinek, Kees Cook, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches

Hi,

As requested, I collected runtime performance data and code size data with CPU2017 on an x86 platform.

*** Machine info:
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
$ lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-21,44-65
NUMA node1 CPU(s):     22-43,66-87

***CPU2017 benchmarks: 
all the benchmarks with C/C++, 9 Integer benchmarks, 10 FP benchmarks. 

***Configuration:
intrate and fprate, 22 copies.

***Compiler options:
no:            -g -O2 -march=native
used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
used_arg:      no + -fzero-call-used-regs=used-arg
all_arg:       no + -fzero-call-used-regs=all-arg
used_gpr:      no + -fzero-call-used-regs=used-gpr
all_gpr:       no + -fzero-call-used-regs=all-gpr
used:          no + -fzero-call-used-regs=used
all:           no + -fzero-call-used-regs=all

***each benchmark runs 3 times. 

***runtime performance data:
Please see the attached csv file


From the data, we can see that:
On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overhead: at most 1.72% for the integer benchmarks and 1.17% for the FP benchmarks.
When all the registers are zeroed, the runtime overhead is bigger: on average, for the integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%.
It looks like the overhead of zeroing the vector registers is much bigger.

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and its runtime overhead is very small.

***code size increase data:

Please see the attached file 


From the data, we can see that:
The code size impact is in general very small; the biggest is “all_arg”, at 1.06% for the integer benchmarks and 1.13% for the FP benchmarks.

So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 

Let me know your comments and opinions.

thanks.

Qing

> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> 
>> Hi!
>> 
>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>> use it, it helps security at most none at all :-(
>>>> 
>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>> enough users that it will be worth the effort for us.  Which is why I
>>>> keep hammering on this point.
>>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>>> For this testing? (Is CPU2017 good enough)?
>> 
>> I would use something more real-life, not 12 small pieces of code.
> 
> There is some basic information about the benchmarks of CPU2017 in below link:
> 
> https://www.spec.org/cpu2017/Docs/overview.html#suites
> 
> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
> 
> thanks.
> 
> Qing


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 14:29                                     ` Qing Zhao
@ 2020-09-03 15:08                                       ` Qing Zhao
  2020-09-03 16:19                                         ` Qing Zhao
  2020-09-03 17:13                                       ` Kees Cook
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-03 15:08 UTC (permalink / raw)
  To: Segher Boessenkool, Qing Zhao via Gcc-patches
  Cc: Jakub Jelinek, Uros Bizjak, Kees Cook, victor Rodriguez Bahena


Hi,

It looks like both attached .csv files were stripped during email delivery. I am not sure why.

So I am copying the text here for your reference:

****benchmarks:
C       500.perlbench_r  
C       502.gcc_r     
C       505.mcf_r       
C++     520.omnetpp_r    
C++     523.xalancbmk_r  
C       525.x264_r        
C++     531.deepsjeng_r    
C++     541.leela_r        
C       557.xz_r       
                      

C++/C/Fortran   507.cactuBSSN_r      
C++     508.namd_r    
C++     510.parest_r     
C++/C   511.povray_r   
C       519.lbm_r     
Fortran/C       521.wrf_r 
C++/C   526.blender_r   
Fortran/C       527.cam4_r  
C       538.imagick_r  
C       544.nab_r    

***Runtime overhead data and code size overhead data: I converted them to PDF files; hopefully this time they stay attached to the email:

thanks.

Qing






> On Sep 3, 2020, at 9:29 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hi,
> 
> Per request, I collected runtime performance data and code size data with CPU2017 on a X86 platform. 
> 
> *** Machine info:
> model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> $ lscpu | grep NUMA
> NUMA node(s):          2
> NUMA node0 CPU(s):     0-21,44-65
> NUMA node1 CPU(s):     22-43,66-87
> 
> ***CPU2017 benchmarks: 
> all the benchmarks with C/C++, 9 Integer benchmarks, 10 FP benchmarks. 
> 
> ***Configures:
> Intrate and fprate, 22 copies. 
> 
> ***Compiler options:
> no : 				-g -O2 -march=native
> used_gpr_arg:  	no + -fzero-call-used-regs=used-gpr-arg
> used_arg:  	 	no + -fzero-call-used-regs=used-arg
> all_arg:			no + -fzero-call-used-regs=all-arg
> used_gpr:		no + -fzero-call-used-regs=used-gpr
> all_gpr:			no + -fzero-call-used-regs=all-gpr
> used:			no + -fzero-call-used-regs=used
> all:				no + -fzero-call-used-regs=all
> 
> ***each benchmark runs 3 times. 
> 
> ***runtime performance data:
> Please see the attached csv file
> 
> 
> From the data, we can see that:
> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
> Looks like the overhead of zeroing vector registers is much bigger. 
> 
> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
> 
> ***code size increase data:
> 
> Please see the attached file 
> 
> 
> From the data, we can see that:
> The code size impact in general is very small, the biggest is “all_arg”, which is 1.06% for integer benchmark, and 1.13% for FP benchmarks.
> 
> So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 
> 
> Let me know you comments and opinions.
> 
> thanks.
> 
> Qing
> 
>> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> 
>> 
>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> 
>>> Hi!
>>> 
>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>>> use it, it helps security at most none at all :-(
>>>>> 
>>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>>> enough users that it will be worth the effort for us.  Which is why I
>>>>> keep hammering on this point.
>>>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>>>> For this testing? (Is CPU2017 good enough)?
>>> 
>>> I would use something more real-life, not 12 small pieces of code.
>> 
>> There is some basic information about the CPU2017 benchmarks at the link below:
>> 
>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>> 
>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
>> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
>> 
>> thanks.
>> 
>> Qing


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 15:08                                       ` Qing Zhao
@ 2020-09-03 16:19                                         ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-03 16:19 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Qing Zhao via Gcc-patches, Jakub Jelinek, Uros Bizjak, Kees Cook,
	victor Rodriguez Bahena

It looks like PDF attachments do not work with this alias either.
H.J. Lu helped me upload the performance data and code size data to the following wiki page:

https://gitlab.com/x86-gcc/gcc/-/wikis/Zero-call-used-registers-data

Please refer to this link for the data.

thanks.

Qing

> On Sep 3, 2020, at 10:08 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> Hi,
> 
> It looks like both attached .csv files were deleted during email delivery. I am not sure of the reason for this.
> 
> So I have to copy the text file here for your reference:
> 
> ****benchmarks:
> C       500.perlbench_r  
> C       502.gcc_r     
> C       505.mcf_r       
> C++     520.omnetpp_r    
> C++     523.xalancbmk_r  
> C       525.x264_r        
> C++     531.deepsjeng_r    
> C++     541.leela_r        
> C       557.xz_r       
> 
> 
> C++/C/Fortran   507.cactuBSSN_r      
> C++     508.namd_r    
> C++     510.parest_r     
> C++/C   511.povray_r   
> C       519.lbm_r     
> Fortran/C       521.wrf_r 
> C++/C   526.blender_r   
> Fortran/C       527.cam4_r  
> C       538.imagick_r  
> C       544.nab_r    
> 
> ***runtime overhead data and code size overhead data: I converted them to PDF files; hopefully this time I can attach them to the email:
> 
> thanks.
> 
> Qing
> 
> 
> 
> 
> 
> 
>> On Sep 3, 2020, at 9:29 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> Hi,
>> 
>> Per request, I collected runtime performance data and code size data with CPU2017 on an x86 platform.
>> 
>> *** Machine info:
>> model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>> $ lscpu | grep NUMA
>> NUMA node(s):          2
>> NUMA node0 CPU(s):     0-21,44-65
>> NUMA node1 CPU(s):     22-43,66-87
>> 
>> ***CPU2017 benchmarks: 
>> all the benchmarks with C/C++, 9 Integer benchmarks, 10 FP benchmarks. 
>> 
>> ***Configures:
>> Intrate and fprate, 22 copies. 
>> 
>> ***Compiler options:
>> no : 				-g -O2 -march=native
>> used_gpr_arg:  	no + -fzero-call-used-regs=used-gpr-arg
>> used_arg:  	 	no + -fzero-call-used-regs=used-arg
>> all_arg:			no + -fzero-call-used-regs=all-arg
>> used_gpr:		no + -fzero-call-used-regs=used-gpr
>> all_gpr:			no + -fzero-call-used-regs=all-gpr
>> used:			no + -fzero-call-used-regs=used
>> all:				no + -fzero-call-used-regs=all
>> 
>> ***each benchmark runs 3 times. 
>> 
>> ***runtime performance data:
>> Please see the attached csv file
>> 
>> 
>> From the data, we can see that:
>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
>> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
>> Looks like the overhead of zeroing vector registers is much bigger. 
>> 
>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> ***code size increase data:
>> 
>> Please see the attached file 
>> 
>> 
>> From the data, we can see that:
>> The code size impact is very small in general; the biggest is “all_arg”, at 1.06% for integer benchmarks and 1.13% for FP benchmarks.
>> 
>> So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 
>> 
>> Let me know your comments and opinions.
>> 
>> thanks.
>> 
>> Qing
>> 
>>> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> 
>>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>>>> use it, it helps security at most none at all :-(
>>>>>> 
>>>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>>>> enough users that it will be worth the effort for us.  Which is why I
>>>>>> keep hammering on this point.
>>>>> I can collect some run-time overhead data on this, do you have a recommendation on what test suite I can use
>>>>> For this testing? (Is CPU2017 good enough)?
>>>> 
>>>> I would use something more real-life, not 12 small pieces of code.
>>> 
>>> There is some basic information about the CPU2017 benchmarks at the link below:
>>> 
>>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>>> 
>>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
>>> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
>>> 
>>> thanks.
>>> 
>>> Qing
> 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 14:29                                     ` Qing Zhao
  2020-09-03 15:08                                       ` Qing Zhao
@ 2020-09-03 17:13                                       ` Kees Cook
  2020-09-03 17:43                                         ` Qing Zhao
                                                           ` (2 more replies)
  1 sibling, 3 replies; 188+ messages in thread
From: Kees Cook @ 2020-09-03 17:13 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak,
	victor Rodriguez Bahena, GCC Patches

On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
> Looks like the overhead of zeroing vector registers is much bigger. 
> 
> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.

That looks great; thanks for doing those tests!

(And it seems like these benchmarks are kind of a "worst case" scenario
with regard to performance, yes? As in it's mostly tight call loops?)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 17:13                                       ` Kees Cook
@ 2020-09-03 17:43                                         ` Qing Zhao
  2020-09-04  1:23                                           ` Rodriguez Bahena, Victor
  2020-09-03 17:48                                         ` Ramana Radhakrishnan
  2020-09-04 15:43                                         ` Segher Boessenkool
  2 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-03 17:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak,
	victor Rodriguez Bahena, GCC Patches



> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
>> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
>> Looks like the overhead of zeroing vector registers is much bigger. 
>> 
>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
> 
> That looks great; thanks for doing those tests!
> 
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)

The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
All of them are C++ benchmarks. 
I guess the main reason is that their routines are generally smaller (especially on the hot execution paths or in loops).
As a result, the overhead of the additional zeroing instructions in each routine is relatively higher.

Qing

> 
> -- 
> Kees Cook


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 17:13                                       ` Kees Cook
  2020-09-03 17:43                                         ` Qing Zhao
@ 2020-09-03 17:48                                         ` Ramana Radhakrishnan
  2020-09-03 19:20                                           ` Qing Zhao
  2020-09-04 15:43                                         ` Segher Boessenkool
  2 siblings, 1 reply; 188+ messages in thread
From: Ramana Radhakrishnan @ 2020-09-03 17:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Qing Zhao, Jakub Jelinek, GCC Patches, Uros Bizjak,
	victor Rodriguez Bahena, Segher Boessenkool

On Thu, Sep 3, 2020 at 6:13 PM Kees Cook via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> > On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
> > If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
> > Looks like the overhead of zeroing vector registers is much bigger.
> >
> > For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>
> That looks great; thanks for doing those tests!
>
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)


That's true of some of them but definitely not all - the GCC benchmark
springs to mind in SPEC as having quite a flat profile, so I'd take a
look there and probe a bit more in that one to see what happens. Don't
ask me what else, that's all I have in my cache this evening :)

I'd also query the "average" slowdown metric in those numbers as
something that's being measured in a different way here. IIRC the SPEC
scores for int and FP are computed with a geometric mean of the
individual ratios of each of the benchmarks. Thus I don't think the
average of the slowdowns is enough to talk about slowdowns for the
benchmark suite. A quick calculation of the arithmetic mean of column
B in my head suggests that it's the arithmetic mean of all the
slowdowns?

i.e. Slowdown (Geometric Mean (x, y, z, ...)) != Arithmetic Mean (
Slowdown (x), Slowdown (y), ...)
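
To illustrate the inequality above, here is a minimal C sketch. The function names and any sample ratios are hypothetical and are not taken from the posted data; the sketch only shows that a slowdown derived from geometric means of per-benchmark ratios generally differs from the arithmetic mean of the per-benchmark slowdowns. The n-th root is computed by bisection so that no libm is needed.

```c
#include <stddef.h>

/* n-th root of x (x > 0, n >= 1) by bisection; avoids depending on libm. */
static double nth_root(double x, size_t n)
{
    double lo = 0.0;
    double hi = x > 1.0 ? x : 1.0;   /* the root always lies in [0, max(x, 1)] */

    for (int iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (size_t i = 0; i < n; i++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Geometric mean of n positive values. */
double geometric_mean(const double *v, size_t n)
{
    double prod = 1.0;
    for (size_t i = 0; i < n; i++)
        prod *= v[i];
    return nth_root(prod, n);
}

/* "SPEC like" slowdown: compare the geometric means of the ratios
   (higher ratio = faster) without and with the option. */
double suite_slowdown(const double *base, const double *opt, size_t n)
{
    return 1.0 - geometric_mean(opt, n) / geometric_mean(base, n);
}

/* Arithmetic mean of the individual per-benchmark slowdowns. */
double mean_of_slowdowns(const double *base, const double *opt, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += 1.0 - opt[i] / base[i];
    return sum / (double)n;
}
```

With made-up base ratios {10, 10} and ratios {9, 5} with the option enabled, mean_of_slowdowns gives 0.30 while suite_slowdown gives about 0.329, so the two metrics need not agree.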

So another metric to look at would be to look at the Slowdown of your
estimated (probably non-reportable) SPEC scores as well to get a more
"spec like" metric.

regards
Ramana
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 17:48                                         ` Ramana Radhakrishnan
@ 2020-09-03 19:20                                           ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-03 19:20 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Kees Cook, Jakub Jelinek, GCC Patches, Uros Bizjak,
	victor Rodriguez Bahena, Segher Boessenkool



> On Sep 3, 2020, at 12:48 PM, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> 
> On Thu, Sep 3, 2020 at 6:13 PM Kees Cook via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> 
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
>>> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
>>> Looks like the overhead of zeroing vector registers is much bigger.
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
> 
> That's true of some of them but definitely not all - the GCC benchmark
> springs to mind in SPEC as having quite a flat profile, so I'd take a
> look there and probe a bit more in that one to see what happens. Don't
> ask me what else, that's all I have in my cache this evening :)
> 
> I'd also query the "average" slowdown metric in those numbers as
> something that's being measured in a different way here. IIRC the SPEC
> scores for int and FP are computed with a geometric mean of the
> individual ratios of each of the benchmarks. Thus I don't think the
> average of the slowdowns is enough to talk about slowdowns for the
> benchmark suite. A quick calculation of the arithmetic mean of column
> B in my head suggests that it's the arithmetic mean of all the
> slowdowns?
> 
> i.e. Slowdown (Geometric Mean (x, y, z, ...)) != Arithmetic Mean (
> Slowdown (x), Slowdown (y), ...)
> 
> So another metric to look at would be to look at the Slowdown of your
> estimated (probably non-reportable) SPEC scores as well to get a more
> "spec like" metric.

Please take a look at the new csv file at:

https://gitlab.com/x86-gcc/gcc/-/wikis/Zero-call-used-registers-data

I just uploaded the slowdown data computed based on Est.SPECrate(R)2017_int_base and Est.SPECrate(R)2017_fp_base. All data are computed against “no”. 

Compared with the slowdown data I computed previously as Arithmetic mean (Slowdown(x), Slowdown(y), …), the numbers change a little; however, the basic conclusions from the data remain the same as before.

Let me know if you have further comments.

thanks.

Qing


> 
> regards
> Ramana
>> 
>> --
>> Kees Cook


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 17:43                                         ` Qing Zhao
@ 2020-09-04  1:23                                           ` Rodriguez Bahena, Victor
  2020-09-04 14:18                                             ` Qing Zhao
  2020-09-07 14:44                                             ` Segher Boessenkool
  0 siblings, 2 replies; 188+ messages in thread
From: Rodriguez Bahena, Victor @ 2020-09-04  1:23 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook
  Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak, GCC Patches



-----Original Message-----
From: Qing Zhao <QING.ZHAO@oracle.com>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]



    > On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
    > 
    > On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
    >> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
    >> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
    >> Looks like the overhead of zeroing vector registers is much bigger. 
    >> 
    >> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
    > 
    > That looks great; thanks for doing those tests!
    > 
    > (And it seems like these benchmarks are kind of a "worst case" scenario
    > with regard to performance, yes? As in it's mostly tight call loops?)

    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
    All of them are C++ benchmarks. 
    I guess the main reason is that their routines are generally smaller (especially on the hot execution paths or in loops).
    As a result, the overhead of the additional zeroing instructions in each routine is relatively higher.

    Qing

I think that overhead is expected in benchmarks like 541.leela_r, which, according to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html, is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs an overhead each time a function is called, and in areas like game tree search that overhead is high.

Qing, thanks a lot for the measurements. I am not sure whether this is the limit of overhead the community is willing to accept for the extra security (as a GCC user, I would be willing to accept it).

Regards

Victor 


    > 
    > -- 
    > Kees Cook



^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04  1:23                                           ` Rodriguez Bahena, Victor
@ 2020-09-04 14:18                                             ` Qing Zhao
  2020-09-07 13:06                                               ` Rodriguez Bahena, Victor
  2020-09-07 14:44                                             ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-04 14:18 UTC (permalink / raw)
  To: Rodriguez Bahena, Victor, Kees Cook
  Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak, GCC Patches



> On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> 
> 
> 
> -----Original Message-----
> From: Qing Zhao <QING.ZHAO@oracle.com>
> Date: Thursday, September 3, 2020 at 12:55 PM
> To: Kees Cook <keescook@chromium.org>
> Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> 
> 
> 
>> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>> 
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
>>> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
>>> Looks like the overhead of zeroing vector registers is much bigger. 
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
>    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
>    All of them are C++ benchmarks. 
>    I guess the main reason is that their routines are generally smaller (especially on the hot execution paths or in loops).
>    As a result, the overhead of the additional zeroing instructions in each routine is relatively higher.
> 
>    Qing
> 
> I think that overhead is expected in benchmarks like 541.leela_r, which, according to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html, is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs an overhead each time a function is called, and in areas like game tree search that overhead is high.
> 
> Qing, thanks a lot for the measurements. I am not sure whether this is the limit of overhead the community is willing to accept for the extra security (as a GCC user, I would be willing to accept it).

From the performance data, we can see that the runtime overhead of clearing only the USED registers is very reasonable, even for 541.leela_r, 531.deepsjeng_r, and 511.povray_r.   If we clear all registers, whether or not they are used in the current routine, the overhead increases dramatically.

So, my question is:

From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit; correct me if I am wrong here.

Thanks.

Qing


> 
> Regards
> 
> Victor 
> 
> 
>> 
>> -- 
>> Kees Cook


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 20:49                                     ` Qing Zhao
@ 2020-09-04 15:18                                       ` Segher Boessenkool
  2020-09-04 17:34                                         ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-04 15:18 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Rodriguez Bahena, Victor, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook

Hi!

On Mon, Aug 24, 2020 at 03:49:50PM -0500, Qing Zhao wrote:
> > Do you want to do this before or after the epilogue code is generated?
> 
> static rtx_insn *
> make_epilogue_seq (void)
> {
>   if (!targetm.have_epilogue ())
>     return NULL;
> 
>   start_sequence ();
>   emit_note (NOTE_INSN_EPILOGUE_BEG);
> 
>  +++++ gen_call_used_regs_seq ();                     // this is the place to emit the zeroing insn sequence
> 
>   rtx_insn *seq = targetm.gen_epilogue ();
> …
> }
> 
> Any comment on this?

So, before.  This is problematic if the epilogue uses any of those
registers: if the epilogue expects some value there, you just destroyed
it; and, conversely, if the epilogue writes such a reg, your zeroing is
useless.


You probably have to do this for every target separately?  But it is not
enough to handle it in the epilogue, you also need to make sure it is
done on every path that returns *without* epilogue.
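
The ordering concern can be illustrated with a toy model in C. Everything in it is hypothetical: the two-register machine, the body that leaves the frame size in a scratch register, and the epilogue that reads that register and then leaves a stack address in it. It is not GCC code and does not describe any real target's epilogue; it only shows how zeroing before such an epilogue both breaks the epilogue and leaves a non-zero value behind.

```c
/* Toy machine: two call-used registers and a stack pointer. */
enum { REG_SCRATCH, REG_ARG0, NUM_CALL_USED };

struct machine {
    long call_used[NUM_CALL_USED];
    long sp;
};

/* Function body: allocates a 32-byte frame, leaves the frame size in
   SCRATCH for the epilogue, and leaves a stale value in ARG0. */
static void body(struct machine *m)
{
    m->sp -= 32;
    m->call_used[REG_SCRATCH] = 32;
    m->call_used[REG_ARG0] = 0x1234;
}

/* The zeroing sequence: clear every call-used register. */
static void zero_call_used(struct machine *m)
{
    for (int r = 0; r < NUM_CALL_USED; r++)
        m->call_used[r] = 0;
}

/* Hypothetical epilogue: pops the frame using the size in SCRATCH,
   then leaves the restored stack address in SCRATCH. */
static void epilogue(struct machine *m)
{
    m->sp += m->call_used[REG_SCRATCH];
    m->call_used[REG_SCRATCH] = m->sp;
}

/* Zeroing emitted before the epilogue. */
void zero_before_epilogue(struct machine *m)
{
    body(m);
    zero_call_used(m);
    epilogue(m);
}

/* Zeroing emitted after the epilogue, right before the return. */
void zero_after_epilogue(struct machine *m)
{
    body(m);
    epilogue(m);
    zero_call_used(m);
}
```

Starting from sp = 1000, zero_before_epilogue ends with sp = 968 (the frame is never popped) and a non-zero SCRATCH, while zero_after_epilogue ends with sp = 1000 and all call-used registers zero.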


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-08-24 20:43                                   ` Qing Zhao
  2020-08-25  6:41                                     ` Uros Bizjak
@ 2020-09-04 15:26                                     ` Segher Boessenkool
  1 sibling, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-04 15:26 UTC (permalink / raw)
  To: Qing Zhao
  Cc: victor Rodriguez Bahena, Richard Biener, Jeff Law, Uros Bizjak,
	H. J. Lu, Jakub Jelinek, GCC Patches, Kees Cook

On Mon, Aug 24, 2020 at 03:43:11PM -0500, Qing Zhao wrote:
> > On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> For this testing? (Is CPU2017 good enough)?
> > 
> > I would use something more real-life, not 12 small pieces of code.
> 
> Then, what kind of real-life benchmark are you suggesting?

Picking benchmark code is Hard (and that is your job, not mine, sorry).
Maybe firefox or openoffice or whatever.  Some *bigger* code.  Real-life
code.

> >> Please take a look at this paper. 
> > 
> > As I told you before, that isn't open information, I cannot reply to
> > any of that.
> 
> I am a little confused here: what do you mean by “open information”? Is the information in a published paper not open information?

I am not allowed to quote it here.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-03 17:13                                       ` Kees Cook
  2020-09-03 17:43                                         ` Qing Zhao
  2020-09-03 17:48                                         ` Ramana Radhakrishnan
@ 2020-09-04 15:43                                         ` Segher Boessenkool
  2020-09-04 17:18                                           ` Qing Zhao
  2 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-04 15:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Qing Zhao, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches

On Thu, Sep 03, 2020 at 10:13:35AM -0700, Kees Cook wrote:
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> > On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
> > If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
> > Looks like the overhead of zeroing vector registers is much bigger. 
> > 
> > For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
> 
> That looks great; thanks for doing those tests!
> 
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)

I call this very expensive, already, and it is benchmarked on a target
where this should be very cheap (it has few registers) :-/


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 15:43                                         ` Segher Boessenkool
@ 2020-09-04 17:18                                           ` Qing Zhao
  2020-09-04 18:04                                             ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-04 17:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Kees Cook, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches



> On Sep 4, 2020, at 10:43 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Thu, Sep 03, 2020 at 10:13:35AM -0700, Kees Cook wrote:
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
>>> If all the registers are zeroed, the runtime overhead is bigger: for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average.
>>> Looks like the overhead of zeroing vector registers is much bigger. 
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
> I call this very expensive, already,

Yes, I think that 17.56% on average is quite expensive. That’s the data for -fzero-call-used-regs=all, the worst case, i.e., clearing all the call-used registers at return.

However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
Furthermore, if we only clear used_gpr_arg, i.e., the used general-purpose registers that pass parameters, this should be enough to mitigate ROP, and the overhead is even smaller: 0.84% on average. 


> and it is benchmarked on a target
> where this should be very cheap (it has few registers) :-/

It’s a tradeoff: improved software security at the cost of some runtime overhead. 

For the compiler, we should provide such an option to users to satisfy their security needs despite the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.

Qing



> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 15:18                                       ` Segher Boessenkool
@ 2020-09-04 17:34                                         ` H.J. Lu
  2020-09-04 18:09                                           ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-09-04 17:34 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Qing Zhao, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Fri, Sep 4, 2020 at 8:18 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Mon, Aug 24, 2020 at 03:49:50PM -0500, Qing Zhao wrote:
> > > Do you want to do this before or after the epilogue code is generated?
> >
> > static rtx_insn *
> > make_epilogue_seq (void)
> > {
> >   if (!targetm.have_epilogue ())
> >     return NULL;
> >
> >   start_sequence ();
> >   emit_note (NOTE_INSN_EPILOGUE_BEG);
> >
> >  +++++ gen_call_used_regs_seq ();                     // this is the place to emit the zeroing insn sequence
> >
> >   rtx_insn *seq = targetm.gen_epilogue ();
> > …
> > }
> >
> > Any comment on this?
>
> So, before.  This is problematic if the epilogue uses any of those
> registers: if the epilogue expects some value there, you just destroyed
> it; and, conversely, if the epilogue writes such a reg, your zeroing is
> useless.
>
>
> You probably have to do this for every target separately?  But it is not
> enough to handle it in the epilogue, you also need to make sure it is
> done on every path that returns *without* epilogue.

This feature is designed for normal return with epilogue.

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 17:18                                           ` Qing Zhao
@ 2020-09-04 18:04                                             ` Segher Boessenkool
  2020-09-04 19:00                                               ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-04 18:04 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Kees Cook, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches

On Fri, Sep 04, 2020 at 12:18:12PM -0500, Qing Zhao wrote:
> > I call this very expensive, already,
> 
> Yes, I think that 17.56% on average is quite expensive. That’s the data for -fzero-call-used-regs=all, the worst case i.e, clearing all the call-used registers at the return.
> 
> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 

No, that is the number I meant.  2% overhead is extremely much, unless
this is magically super effective, and actually protects many things
from exploitation (that aren't already protected some other way, SSP for
example).

> > and it is benchmarked on a target
> > where this should be very cheap (it has few registers) :-/
> 
> It’s a tradeoff to improve the software security with some runtime overhead. 

Yes.  Which is why I asked for numbers of both sides of the equation:
how much it costs, vs. how much value it brings.

> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.

There also is a real cost to the compiler *developers*.  Which is my
prime worry here.  If this gives users at most marginal value, then it
is real cost to us, but nothing to hold up to that.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 17:34                                         ` H.J. Lu
@ 2020-09-04 18:09                                           ` Segher Boessenkool
  2020-09-04 18:52                                             ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-04 18:09 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Qing Zhao, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > You probably have to do this for every target separately?  But it is not
> > enough to handle it in the epilogue, you also need to make sure it is
> > done on every path that returns *without* epilogue.
> 
> This feature is designed for normal return with epilogue.

Very many normal returns do *not* pass through an epilogue, but are
simple_return.  Disabling that is *much* more expensive than that 2%.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 18:09                                           ` Segher Boessenkool
@ 2020-09-04 18:52                                             ` H.J. Lu
  2020-09-07 14:06                                               ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-09-04 18:52 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Qing Zhao, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > You probably have to do this for every target separately?  But it is not
> > > enough to handle it in the epilogue, you also need to make sure it is
> > > done on every path that returns *without* epilogue.
> >
> > This feature is designed for normal return with epilogue.
>
> Very many normal returns do *not* pass through an epilogue, but are
> simple_return.  Disabling that is *much* more expensive than that 2%.

Sibcall isn't covered.  What other cases don't have an epilogue?

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 18:04                                             ` Segher Boessenkool
@ 2020-09-04 19:00                                               ` Qing Zhao
  2020-09-07 14:36                                                 ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-04 19:00 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Kees Cook, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches



> On Sep 4, 2020, at 1:04 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 04, 2020 at 12:18:12PM -0500, Qing Zhao wrote:
>>> I call this very expensive, already,
>> 
>> Yes, I think that 17.56% on average is quite expensive. That’s the data for -fzero-call-used-regs=all, the worst case i.e, clearing all the call-used registers at the return.
>> 
>> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
> 
> No, that is the number I meant.  2% overhead is extremely much, unless
> this is magically super effective, and actually protects many things
> from exploitation (that aren't already protected some other way, SSP for
> example).

Then how about the 0.81% overhead on average for -fzero-call-used-regs=used_gpr_arg? 

This option can be used to effectively mitigate ROP attacks. 

Also, the Clear Linux project has been using a similar option (similar to -fzero-call-used-regs=used_gpr) since GCC 8. 


>>> and it is benchmarked on a target
>>> where this should be very cheap (it has few registers) :-/
>> 
>> It’s a tradeoff to improve the software security with some runtime overhead. 
> 
> Yes.  Which is why I asked for numbers of both sides of the equation:
> how much it costs, vs. how much value it brings.

Reasonable. 

> 
>> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.
> 
> There also is a real cost to the compiler *developers*.  Which is my
> prime worry here.  If this gives users at most marginal value, then it
> is real cost to us, but nothing to hold up to that.

Here, do you mean the future maintenance cost for this part of the code?

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 14:18                                             ` Qing Zhao
@ 2020-09-07 13:06                                               ` Rodriguez Bahena, Victor
  2020-09-08 15:00                                                 ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Rodriguez Bahena, Victor @ 2020-09-07 13:06 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook
  Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak, GCC Patches



From: Qing Zhao <QING.ZHAO@ORACLE.COM>
Date: Friday, September 4, 2020 at 9:19 AM
To: "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:



-----Original Message-----
From: Qing Zhao <QING.ZHAO@oracle.com>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:

On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:

On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks.
If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average.
Looks like the overhead of zeroing vector registers is much bigger.

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.

That looks great; thanks for doing those tests!

(And it seems like these benchmarks are kind of a "worst case" scenario
with regard to performance, yes? As in it's mostly tight call loops?)

   The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
   All of them are C++ benchmarks.
   I guess that the most important reason is  the smaller routine size in general (especially at the hot execution path or loops).
   As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.

   Qing

I think that overhead is expected in benchmarks like 541.leela_r. According to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html, it is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs overhead each time a function is called, and in areas like game tree search the number of calls is high.

Qing, thanks a lot for the measurement. I am not sure if this is the limit of overhead the community is willing to accept for the extra security (as a GCC user, I would be willing to accept it).

From the performance data, we can see that the runtime overhead of clearing only the used registers is very reasonable, even for 541.leela_r, 531.deepsjeng_r, and 511.povray_r.   If we try to clear all registers, whether or not they are used in the current routine, the overhead increases dramatically.

So, my question is:

From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?
From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.

You are right, it does not provide additional security


Thanks.

Qing



Regards

Victor




--
Kees Cook



^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 18:52                                             ` H.J. Lu
@ 2020-09-07 14:06                                               ` Segher Boessenkool
  2020-09-07 15:58                                                 ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-07 14:06 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Qing Zhao, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > > You probably have to do this for every target separately?  But it is not
> > > > enough to handle it in the epilogue, you also need to make sure it is
> > > > done on every path that returns *without* epilogue.
> > >
> > > This feature is designed for normal return with epilogue.
> >
> > Very many normal returns do *not* pass through an epilogue, but are
> > simple_return.  Disabling that is *much* more expensive than that 2%.
> 
> Sibcall isn't covered.  What other cases don't have an epilogue?

Shrink-wrapped stuff.  Quite important for performance.  Not something
you can throw away.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04 19:00                                               ` Qing Zhao
@ 2020-09-07 14:36                                                 ` Segher Boessenkool
  2020-09-08 14:55                                                   ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-07 14:36 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Kees Cook, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches

On Fri, Sep 04, 2020 at 02:00:41PM -0500, Qing Zhao wrote:
> >> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
> > 
> > No, that is the number I meant.  2% overhead is extremely much, unless
> > this is magically super effective, and actually protects many things
> > from exploitation (that aren't already protected some other way, SSP for
> > example).
> 
> Then how about the 0.81% overhead on average for -fzero-call-used-regs=used_gpr_arg? 

That is still quite a lot.

> This option can be used to effectively mitigate ROP attack. 

Nice assertion.  Show it!

> > Yes.  Which is why I asked for numbers of both sides of the equation:
> > how much it costs, vs. how much value it brings.
> 
> Reasonable. 

I'm glad you agree :-)

> >> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.
> > 
> > There also is a real cost to the compiler *developers*.  Which is my
> > prime worry here.  If this gives users at most marginal value, then it
> > is real cost to us, but nothing to hold up to that.
> 
> Here, you mean the future maintenance  cost  for this part of the code?

Not just that.  *All* support costs, and consider all other
optimisations it will interfere with, etc.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-04  1:23                                           ` Rodriguez Bahena, Victor
  2020-09-04 14:18                                             ` Qing Zhao
@ 2020-09-07 14:44                                             ` Segher Boessenkool
  2020-09-08 15:05                                               ` Patrick McGehearty
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-07 14:44 UTC (permalink / raw)
  To: Rodriguez Bahena, Victor
  Cc: Qing Zhao, Kees Cook, Jakub Jelinek, Uros Bizjak, GCC Patches

On Fri, Sep 04, 2020 at 01:23:14AM +0000, Rodriguez Bahena, Victor wrote:
> Qing, thanks a lot for the measurement, I am not sure if this is the limit of overhead the community is willing to accept by adding extra security (me as gcc user will be willing to accept). 

The overhead is of course bearable for most programs / users, but what
is the return?  For what percentage of programs are ROP attacks no
longer possible, for example.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-07 14:06                                               ` Segher Boessenkool
@ 2020-09-07 15:58                                                 ` H.J. Lu
  2020-09-08 16:43                                                   ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-09-07 15:58 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Qing Zhao, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> > On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > > > You probably have to do this for every target separately?  But it is not
> > > > > enough to handle it in the epilogue, you also need to make sure it is
> > > > > done on every path that returns *without* epilogue.
> > > >
> > > > This feature is designed for normal return with epilogue.
> > >
> > > Very many normal returns do *not* pass through an epilogue, but are
> > > simple_return.  Disabling that is *much* more expensive than that 2%.
> >
> > Sibcall isn't covered.  What other cases don't have an epilogue?
>
> Shrink-wrapped stuff.  Quite important for performance.  Not something
> you can throw away.
>

Qing, can you check how it interacts with shrink-wrap?

-- 
H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-07 14:36                                                 ` Segher Boessenkool
@ 2020-09-08 14:55                                                   ` Qing Zhao
  2020-09-10 21:56                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-08 14:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Kees Cook, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2085 bytes --]



> On Sep 7, 2020, at 9:36 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 04, 2020 at 02:00:41PM -0500, Qing Zhao wrote:
>>>> However, if we only clear USED registers, the worst case is 1.72% on average.  This overhead is very reasonable. 
>>> 
>>> No, that is the number I meant.  2% overhead is extremely much, unless
>>> this is magically super effective, and actually protects many things
>>> from exploitation (that aren't already protected some other way, SSP for
>>> example).
>> 
>> Then how about the 0.81% overhead on average for -fzero-call-used-regs=used_gpr_arg? 
> 
> That is still quite a lot.
> 
>> This option can be used to effectively mitigate ROP attack. 
> 
> Nice assertion.  Show it!

As I mentioned multiple times, one important piece of background for this patch is this paper, which was published at the 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP):

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

Downloading this paper from IEEE requires a fee. I have downloaded it through my company’s account; however, after consulting, it turned out that I am not allowed to forward the copy I downloaded through my company’s account to this alias. 

However, there is some more information on this paper online:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available in this link. 

Among them, Table III, Table IV and Table V are the results of “zeroing scratch register mitigate ROP attack”. From the tables, zeroing scratch registers can successfully mitigate ROP on all those benchmarks. 

What other information do you need to show the effectiveness of this ROP mitigation?

> 
>>> Yes.  Which is why I asked for numbers of both sides of the equation:
>>> how much it costs, vs. how much value it brings.

[-- Attachment #2: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all].eml --]
[-- Type: message/rfc822, Size: 14720 bytes --]

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
To: Uros Bizjak <ubizjak@gmail.com>, Segher Boessenkool <segher@kernel.crashing.org>
Cc: Jakub Jelinek <jakub@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Kees Cook <keescook@chromium.org>, Segher Boessenkool <segher@kernel.crashing.org>, victor Rodriguez Bahena <victor.rodriguez.bahena@intel.com>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Date: Tue, 25 Aug 2020 17:31:53 -0500
Message-ID: <13A8233E-406C-40D4-B772-72004EE69C07@ORACLE.COM>



> On Aug 25, 2020, at 9:05 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Aug 25, 2020, at 1:41 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>>>> 
>>>>>> (The other side of the coin is how much this helps prevent exploitation;
>>>>>> numbers on that would be good to see, too.)
>>>>> 
>>>>> This can be well showed from the paper:
>>>>> 
>>>>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"
>>>>> 
>>>>> https://ieeexplore.ieee.org/document/8445132
>>>>> 
>>>>> Please take a look at this paper.
>>>> 
>>>> As I told you before, that isn't open information, I cannot reply to
>>>> any of that.
>>> 
>>> A little confused here, what’s you mean by “open information”? Is the information in a published paper not open information?
>> 
>> No, because it is behind a paywall.
> 
> Still don’t understand here:  this paper has been published in the proceeding of “ 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)”.
> If you want to read the complete version online, you need to pay for it.
> 
> However, it’s still a published paper, and the information inside it should be “open information”. 
> 
> So, what’s the definition of “open information” you have?
> 
> I downloaded a PDF copy of this paper through my company’s paid account.  But I am not sure whether it’s legal for me to attach it to this mailing list?

After consulting, it turned out that I was not allowed to further forward the copy I downloaded through my company’s account to this alias. 
There is some more information on this paper online though:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available in this link. 

In it, Figure 1 illustrates a typical ROP attack. Please pay special attention to the “gadgets”, which are carefully chosen machine instruction sequences that are already present in the machine's memory. Each gadget typically ends in a return instruction and is located in a subroutine within the existing program and/or shared library code. Chained together, these gadgets allow an attacker to perform arbitrary operations on a machine employing defenses that thwart simpler attacks.

The paper identifies the important features of ROP attacks as follows:

"First, the destination of using gadget chains in usual is performing system call or system fucntion to perform malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. 

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks using system function such as “read” or “mprotect”, on x64 system, the register would still be used to pass parameters, as mentioned in subsection B and C.”
As a result, the paper proposes zeroing the scratch registers that pass parameters at the “return” insns to mitigate ROP attacks. 

Table III, Table IV and Table V are the results of “zeroing scratch register mitigate ROP attack”. From the tables, zeroing scratch registers can successfully mitigate the ROP on all those benchmarks. 

Table VI shows the performance overhead of their implementation. It looks very high: a 16.2X runtime overhead on average.  However, this implementation does not use the compiler to statically generate the zeroing sequence; instead, it uses “dynamic binary instrumentation at runtime” to check every instruction in order to:
1. Set/unset flags to record which scratch registers are used in the routine;
2. Determine whether the instruction is a return instruction; if it is, insert the sequence that zeroes the used scratch registers before the “return” insn. 

Given this run-time dynamic instrumentation method, the high runtime overhead is to be expected, I think.

If we use GCC to statically check the “used” information and add zeroing sequence before return insn, the run-time overhead will be much smaller. 

I will provide run-time overhead information with the 2nd version of the patch by using CPU2017 applications.

thanks.

Qing


> Qing
> 
> 
>> 
>> Uros.


[-- Attachment #3: Type: text/plain, Size: 1052 bytes --]

>> 
>> Reasonable. 
> 
> I'm glad you agree :-)
> 
>>>> For compiler, we should provide such option to the users to satisfy their security need even though the runtime overhead.  Of course, during compiler implementation, we will do our best to minimize the runtime overhead.
>>> 
>>> There also is a real cost to the compiler *developers*.  Which is my
>>> prime worry here.  If this gives users at most marginal value, then it
>>> is real cost to us, but nothing to hold up to that.
>> 
>> Here, you mean the future maintenance  cost  for this part of the code?
> 
> Not just that.  *All* support costs, and consider all other
> optimisations it will interfere with, etc.

Many new features incur these kinds of costs; they are justified as long as the feature provides important functionality to users.

From my understanding, this feature was requested by kernel security people to improve the kernel's security. And it has been in Clear Linux since 2018 to improve kernel security on x86. 

thanks.

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-07 13:06                                               ` Rodriguez Bahena, Victor
@ 2020-09-08 15:00                                                 ` Qing Zhao
  2020-09-10 19:07                                                   ` Kees Cook
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-08 15:00 UTC (permalink / raw)
  To: Rodriguez Bahena, Victor, Kees Cook
  Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak, GCC Patches



> On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> 
>  
>  
> From: Qing Zhao <QING.ZHAO@ORACLE.COM>
> Date: Friday, September 4, 2020 at 9:19 AM
> To: "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, Kees Cook <keescook@chromium.org>
> Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>  
>  
> 
> 
>> On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
>>  
>> 
>> 
>> -----Original Message-----
>> From: Qing Zhao <QING.ZHAO@oracle.com>
>> Date: Thursday, September 3, 2020 at 12:55 PM
>> To: Kees Cook <keescook@chromium.org>
>> Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
>> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>> 
>> 
>> 
>> 
>>> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>>> 
>>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> 
>>>> On average, all the options starting with “used_…”  (i.e, only the registers that are used in the routine will be zeroed) have very low runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks. 
>>>> If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average. 
>>>> Looks like the overhead of zeroing vector registers is much bigger. 
>>>> 
>>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>>> 
>>> That looks great; thanks for doing those tests!
>>> 
>>> (And it seems like these benchmarks are kind of a "worst case" scenario
>>> with regard to performance, yes? As in it's mostly tight call loops?)
>> 
>>    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
>>    All of them are C++ benchmarks. 
>>    I guess that the most important reason is  the smaller routine size in general (especially at the hot execution path or loops).
>>    As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.  
>> 
>>    Qing
>> 
>> I think that overhead is expected in benchmarks like 541.leela_r. According to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html, it is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs introduces overhead each time a function is called, and in areas like game tree search the number of calls is high.
>> 
>> Qing, thanks a lot for the measurement, I am not sure if this is the limit of overhead the community is willing to accept by adding extra security (me as gcc user will be willing to accept). 
>  
> From the performance data, we can see that the runtime overhead of clearing only the used registers is very reasonable, even for 541.leela_r, 531.deepsjeng_r, and 511.povray_r.   If we try to clear all registers, whether or not they are used in the current routine, the overhead increases dramatically. 
>  
> So, my question is:
>  
> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
>  
> You are right, it does not provide additional security

Then, is it necessary to provide 

-fzero-call-used-regs=all-arg|all-gpr|all   to the user?

Can we just delete these 3 sub options?


Qing


>  
>  
> Thanks.
>  
> Qing
>  
>  
>> 
>> Regards
>> 
>> Victor 
>> 
>> 
>> 
>>> 
>>> -- 
>>> Kees Cook
> 
> 
> 


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-07 14:44                                             ` Segher Boessenkool
@ 2020-09-08 15:05                                               ` Patrick McGehearty
  2020-09-10 12:11                                                 ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Patrick McGehearty @ 2020-09-08 15:05 UTC (permalink / raw)
  To: gcc-patches

My understanding is this feature/flag is not intended to be "default on".
It is intended to be used in security sensitive environments such
as the Linux kernel where it was requested by kernel security experts.
I'm not understanding the objection here if the feature is requested
by security teams and the average cost is modest.

My background is in performance and application optimization. I agree
that for typical computation oriented, non-secure applications, I would
not use the feature, but for system applications that have the ability
to cross protection boundaries, it seems to be clearly a worthwhile
feature.

- patrick


On 9/7/2020 9:44 AM, Segher Boessenkool wrote:
> On Fri, Sep 04, 2020 at 01:23:14AM +0000, Rodriguez Bahena, Victor wrote:
>> Qing, thanks a lot for the measurement, I am not sure if this is the limit of overhead the community is willing to accept by adding extra security (me as gcc user will be willing to accept).
> The overhead is of course bearable for most programs / users, but what
> is the return?  For what percentage of programs are ROP attacks no
> longer possible, for example.
>
>
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-07 15:58                                                 ` H.J. Lu
@ 2020-09-08 16:43                                                   ` Qing Zhao
  2020-09-10 22:05                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-08 16:43 UTC (permalink / raw)
  To: H.J. Lu, Segher Boessenkool
  Cc: Rodriguez Bahena, Victor, Richard Biener, Jeff Law, Uros Bizjak,
	Jakub Jelinek, GCC Patches, Kees Cook



> On Sep 7, 2020, at 10:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>> 
>> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
>>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
>>> <segher@kernel.crashing.org> wrote:
>>>> On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
>>>>>> You probably have to do this for every target separately?  But it is not
>>>>>> enough to handle it in the epilogue, you also need to make sure it is
>>>>>> done on every path that returns *without* epilogue.
>>>>> 
>>>>> This feature is designed for normal return with epilogue.
>>>> 
>>>> Very many normal returns do *not* pass through an epilogue, but are
>>>> simple_return.  Disabling that is *much* more expensive than that 2%.
>>> 
>>> Sibcall isn't covered.  What other cases don't have an epilogue?
>> 
>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>> you can throw away.
>> 
> 
> Qing, can you check how it interacts with shrink-wrap?

We had some discussion on shrink-wrapping previously, and we agreed on the following at that time:

"Shrink-wrapping often deals with the non-volatile registers, so that
doesn't matter much for this patch series."

On the other hand, we deal with volatile registers in this patch, so from the register point of view, there is NO overlap between this
patch and shrink-wrapping.

So, what are the other possible issues when this patch interacts with shrink-wrapping?

When I checked the gcc source code on shrink-wrapping, I found the following (gcc/function.c):


…….
  rtx_insn *epilogue_seq = make_epilogue_seq ();

  /* Try to perform a kind of shrink-wrapping, making sure the
     prologue/epilogue is emitted only around those parts of the
     function that require it.  */
  try_shrink_wrapping (&entry_edge, prologue_seq);

  /* If the target can handle splitting the prologue/epilogue into separate
     components, try to shrink-wrap these components separately.  */
  try_shrink_wrapping_separate (entry_edge->dest);

  /* If that did anything for any component we now need the generate the
     "main" prologue again.  Because some targets require some of these
     to be called in a specific order (i386 requires the split prologue
     to be first, for example), we create all three sequences again here.
     If this does not work for some target, that target should not enable
     separate shrink-wrapping.  */
  if (crtl->shrink_wrapped_separate)
    {
      split_prologue_seq = make_split_prologue_seq ();
      prologue_seq = make_prologue_seq ();
      epilogue_seq = make_epilogue_seq ();
    }
…….

My understanding from the above is:

1. “try_shrink_wrapping” should NOT interact with make_epilogue_seq, since only “prologue_seq” is touched.
2. “try_shrink_wrapping_separate” might interact with the epilogue; however, if “try_shrink_wrapping_separate” changes anything,
    make_epilogue_seq() will be called again, and the zeroing sequence will still be generated at the end of the routine.

So, from the above, I didn’t see any obvious issues.
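
For reference, the kind of function where the early-exit question arises looks roughly like the following. This is a hypothetical example (the function names are made up for illustration), and whether a particular compiler and target actually shrink-wrap it depends on the optimization level and the ABI. The early return needs no frame setup, so it may be emitted as a return that never passes through the epilogue:

```c
#include <stddef.h>

/* Hypothetical out-of-line helper; calling it means the slow path of the
   caller needs prologue/epilogue work on typical targets.  */
__attribute__((noinline))
static long sum_squares(const int *v, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += (long)v[i] * v[i];
  return total;
}

/* The empty-input check can return before any prologue work is needed;
   only the path that calls sum_squares needs the full prologue/epilogue.  */
long checked_sum_squares(const int *v, size_t n)
{
  if (v == NULL || n == 0)
    return -1;                    /* candidate for an epilogue-free return */
  return sum_squares(v, n) + 1;   /* the "+ 1" keeps this from being a sibcall */
}
```

If the early exit is emitted without an epilogue, a zeroing sequence placed only in the epilogue would not cover that path.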

But I might be missing some important issues here; please let me know what I am missing.

Thanks a lot for any help.

Qing



> 
> -- 
> H.J.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-08 15:05                                               ` Patrick McGehearty
@ 2020-09-10 12:11                                                 ` Richard Sandiford
  2020-09-10 14:34                                                   ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-10 12:11 UTC (permalink / raw)
  To: Patrick McGehearty via Gcc-patches

Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> My understanding is this feature/flag is not intended to be "default on".
> It is intended to be used in security sensitive environments such
> as the Linux kernel where it was requested by kernel security experts.
> I'm not understanding the objection here if the feature is requested
> by security teams and the average cost is modest.

Agreed.  And of course, “is modest” here means “is modest in the eyes
of the people who want to use it”.

IMO it's been established at this point that the feature is useful
enough to some people.  It might be too expensive for others,
but that's OK.

I've kind-of lost track of where we stand given all the subthreads.
If we've now decided which suboptions we want to support, would it
make sense to start a new thread with the current patch, and then
just concentrate on code review for that subthread?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-10 12:11                                                 ` Richard Sandiford
@ 2020-09-10 14:34                                                   ` Qing Zhao
  2020-09-10 14:59                                                     ` Rodriguez Bahena, Victor
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-10 14:34 UTC (permalink / raw)
  To: Richard Sandiford, kees Cook, victor Rodriguez Bahena
  Cc: Patrick McGehearty via Gcc-patches

Richard,

Thank you!

> On Sep 10, 2020, at 7:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> My understanding is this feature/flag is not intended to be "default on".
>> It is intended to be used in security sensitive environments such
>> as the Linux kernel where it was requested by kernel security experts.
>> I'm not understanding the objection here if the feature is requested
>> by security teams and the average cost is modest.
> 
> Agreed.  And of course, “is modest” here means “is modest in the eyes
> of the people who want to use it”.
> 
> IMO it's been established at this point that the feature is useful
> enough to some people.  It might be too expensive for others,
> but that's OK.
> 
> I've kind-of lost track of where we stand given all the subthreads.
> If we've now decided which suboptions we want to support,

From the performance data, we saw that clearing ALL registers costs much more without any additional benefit, so I’d like to delete all the sub-options that clear unused registers, i.e., the previous all-arg, all-gpr, and all. The remaining sub-options only clear registers that are used in the routine, so the “used” prefix is dropped from their names.

Now, the option will be:

-fzero-call-used-regs=skip|gpr-arg|all-arg|gpr|all

Add -fzero-call-used-regs=[skip|gpr-arg|all-arg|gpr|all] command-line option
and
zero_call_used_regs("skip|gpr-arg|all-arg|gpr|all") function attributes:

    1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

    Don't zero call-used registers upon function return. This is the default behavior.

    2. -fzero-call-used-regs=gpr-arg and zero_call_used_regs("gpr-arg")

    Upon function return, zero call-used general purpose registers that are used in the routine and might pass parameters.

    3. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")

    Upon function return, zero call-used registers that are used in the routine and might pass parameters.

    4. -fzero-call-used-regs=gpr and zero_call_used_regs("gpr")

    Upon function return, zero call-used general purpose registers that are used in the routine.

    5. -fzero-call-used-regs=all and zero_call_used_regs("all")

    Upon function return, zero call-used registers that are used in the routine.
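
As a rough illustration of how the function attribute might be applied, here is a minimal sketch. It assumes a compiler that implements zero_call_used_regs and accepts the "used-gpr" spelling from the subject line of this thread; the macro ZERO_USED_GPR and the function check_pin are hypothetical names chosen only for this example. The __has_attribute guard lets the file build on compilers without the attribute, in which case no zeroing is emitted:

```c
#include <stddef.h>

/* Hypothetical helper macro: expands to the attribute only when the
   compiler reports support for it, so the file still builds elsewhere.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR
#endif

/* Compare a candidate PIN against the expected one without an early exit.
   Returns 1 on match, 0 otherwise.  With the attribute in effect, the
   general purpose registers used here would be zeroed before returning.  */
ZERO_USED_GPR
int check_pin(const unsigned char *candidate,
              const unsigned char *expected, size_t len)
{
  unsigned char diff = 0;
  for (size_t i = 0; i < len; i++)
    diff |= (unsigned char)(candidate[i] ^ expected[i]);
  return diff == 0;
}
```

Passing the matching -fzero-call-used-regs= value on the command line would presumably apply the same treatment to every function in the translation unit, with the attribute used for per-function control.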

Let me know any objection or comment. 

> would it
> make sense to start a new thread with the current patch, and then
> just concentrate on code review for that subthread?

I will start the new thread after my new patch is ready.

Thanks again.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-10 14:34                                                   ` Qing Zhao
@ 2020-09-10 14:59                                                     ` Rodriguez Bahena, Victor
  0 siblings, 0 replies; 188+ messages in thread
From: Rodriguez Bahena, Victor @ 2020-09-10 14:59 UTC (permalink / raw)
  To: Qing Zhao, Richard Sandiford, kees Cook
  Cc: Patrick McGehearty via Gcc-patches



-----Original Message-----
From: Qing Zhao <QING.ZHAO@ORACLE.COM>
Date: Thursday, September 10, 2020 at 9:34 AM
To: Richard Sandiford <richard.sandiford@arm.com>, kees Cook <keescook@chromium.org>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>
Cc: Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

    Richard,

    Thank you!

    > On Sep 10, 2020, at 7:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
    > 
    > Patrick McGehearty via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
    >> My understanding is this feature/flag is not intended to be "default on".
    >> It is intended to be used in security sensitive environments such
    >> as the Linux kernel where it was requested by kernel security experts.
    >> I'm not understanding the objection here if the feature is requested
    >> by security teams and the average cost is modest.
    > 
    > Agreed.  And of course, “is modest” here means “is modest in the eyes
    > of the people who want to use it”.
    > 
    > IMO it's been established at this point that the feature is useful
    > enough to some people.  It might be too expensive for others,
    > but that's OK.
    > 
    > I've kind-of lost track of where we stand given all the subthreads.
    > If we've now decided which suboptions we want to support,

    From the performance data, we saw that clearing ALL registers costs much more without any additional benefit, so I’d like to delete all the sub-options that clear unused registers, i.e., the previous all-arg, all-gpr, and all. The remaining sub-options only clear registers that are used in the routine, so the “used” prefix is dropped from their names.

    Now, the option will be:

    -fzero-call-used-regs=skip|gpr-arg|all-arg|gpr|all

    Add -fzero-call-used-regs=[skip|gpr-arg|all-arg|gpr|all] command-line option
    and
    zero_call_used_regs("skip|gpr-arg|all-arg|gpr|all") function attributes:

        1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

        Don't zero call-used registers upon function return. This is the default behavior.

        2. -fzero-call-used-regs=gpr-arg and zero_call_used_regs("gpr-arg")

        Upon function return, zero call-used general purpose registers that are used in the routine and might pass parameters.

        3. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")

        Upon function return, zero call-used registers that are used in the routine and might pass parameters.

        4. -fzero-call-used-regs=gpr and zero_call_used_regs("gpr")

        Upon function return, zero call-used general purpose registers that are used in the routine.

        5. -fzero-call-used-regs=all and zero_call_used_regs("all")

        Upon function return, zero call-used registers that are used in the routine.

    Let me know any objection or comment. 

+1

    > would it
    > make sense to start a new thread with the current patch, and then
    > just concentrate on code review for that subthread?

    I will start the new thread after my new patch is ready.

    Thanks again.

    Qing
    > 
    > Thanks,
    > Richard



^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-08 15:00                                                 ` Qing Zhao
@ 2020-09-10 19:07                                                   ` Kees Cook
  2020-09-10 22:40                                                     ` Qing Zhao
  2020-09-11 10:06                                                     ` Richard Sandiford
  0 siblings, 2 replies; 188+ messages in thread
From: Kees Cook @ 2020-09-10 19:07 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Rodriguez Bahena, Victor, Segher Boessenkool, Jakub Jelinek,
	Uros Bizjak, GCC Patches

[tried to clean up quoting...]

On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote:
> 
> > On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> > 
> >>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> >>> So, my question is:
> >>>
> >>> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
> >>> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
> >  
> > You are right, it does not provide additional security
> 
> Then, is it necessary to provide 
> 
> -fzero-call-used-regs=all-arg|all-gpr|all   to the user?
> 
> Can we just delete these 3 sub options?

Well... I'd say there is some benefit (remember that ROP gadgets are
built from function trailers, so there is rarely a concern over what the
rest of the function is doing). Generally, they are chained together
based on just the last couple instructions:

 *useful action*
 *ret*

So with ...=used this turns into:

 *useful action*
 *clear some registers*
 *ret*

Which may still be helpful (if, for example, the state being built by
the attacker is using registers _not_ in the cleared list). However:

 *useful action*
 *clear all registers*
 *ret*

Means that suddenly the ROP chain cannot use *any* of the caller-saved
registers to hold state.

So, while ...=used is likely going to block a lot, ...=all will block
even more. I'd prefer to have both available, if for no other reason
than to compare the ROP gadget availability for any given binary (e.g.
if some future attack is found that bypasses ...=used, does it also
bypass ...=all?)
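
The difference between the two modes can be sketched with a toy bitmask model. This is purely illustrative: the register numbering, the CALL_USED mask, and the function names are assumptions made for the example, not GCC internals. Each bit stands for one call-used register, and surviving_regs returns the registers whose attacker-controlled contents would still be intact after the gadget returns:

```c
#include <stdint.h>

/* Toy model: bits 0..7 stand for eight call-used registers.  */
#define CALL_USED 0xFFu

enum zero_mode { ZERO_SKIP, ZERO_USED, ZERO_ALL };

/* Registers cleared before 'ret', given the set the function itself uses.  */
static uint32_t cleared_regs(enum zero_mode mode, uint32_t used_in_fn)
{
  switch (mode) {
  case ZERO_USED: return CALL_USED & used_in_fn;
  case ZERO_ALL:  return CALL_USED;
  default:        return 0;
  }
}

/* Call-used registers that can still carry attacker state past the return.  */
static uint32_t surviving_regs(enum zero_mode mode, uint32_t used_in_fn)
{
  return CALL_USED & ~cleared_regs(mode, used_in_fn);
}
```

In this model, a function that uses only the low four registers leaves the high four available to the chain under ZERO_USED, and none under ZERO_ALL.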

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-08 14:55                                                   ` Qing Zhao
@ 2020-09-10 21:56                                                     ` Segher Boessenkool
  0 siblings, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-10 21:56 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Kees Cook, Jakub Jelinek, Uros Bizjak, victor Rodriguez Bahena,
	GCC Patches

On Tue, Sep 08, 2020 at 09:55:19AM -0500, Qing Zhao wrote:
> Downloading this paper from IEEE needs a fee.

Yes, and we cannot discuss it here.

> What other information do you need to show the effectiveness of mitigating ROP attacks?

Anything that we *can* talk about.  Stuff we cannot talk about does not
let us progress in one way or the other.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-08 16:43                                                   ` Qing Zhao
@ 2020-09-10 22:05                                                     ` Segher Boessenkool
  2020-09-10 22:50                                                       ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-10 22:05 UTC (permalink / raw)
  To: Qing Zhao
  Cc: H.J. Lu, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Tue, Sep 08, 2020 at 11:43:30AM -0500, Qing Zhao wrote:
> > On Sep 7, 2020, at 10:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> >> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> >>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> >>> <segher@kernel.crashing.org> wrote:
> >>>> Very many normal returns do *not* pass through an epilogue, but are
> >>>> simple_return.  Disabling that is *much* more expensive than that 2%.
> >>> 
> >>> Sibcall isn't covered.  What other cases don't have an epilogue?
> >> 
> >> Shrink-wrapped stuff.  Quite important for performance.  Not something
> >> you can throw away.
> > 
> > Qing, can you check how it interacts with shrink-wrap?
> >> 

<snip>

> But I might miss some important  issues here, please let me know what I am missing here?

Start looking at handle_simple_exit()?  targetm.gen_simple_return()...


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-10 19:07                                                   ` Kees Cook
@ 2020-09-10 22:40                                                     ` Qing Zhao
  2020-09-11 10:06                                                     ` Richard Sandiford
  1 sibling, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-10 22:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rodriguez Bahena, Victor, Segher Boessenkool, Jakub Jelinek,
	Uros Bizjak, GCC Patches



> On Sep 10, 2020, at 2:07 PM, Kees Cook <keescook@chromium.org> wrote:
> 
> [tried to clean up quoting...]
> 
> On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote:
>> 
>>> On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
>>> 
>>>>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>>>> So, my question is:
>>>>> 
>>>>> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
>>>>> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
>>> 
>>> You are right, it does not provide additional security
>> 
>> Then, is it necessary to provide 
>> 
>> -fzero-call-used-regs=all-arg|all-gpr|all   to the user?
>> 
>> Can we just delete these 3 sub options?
> 
> Well... I'd say there is some benefit (remember that ROP gadgets are
> built from function trailers, so there is rarely a concern over what the
> rest of the function is doing). Generally, they are chained together
> based on just the last couple instructions:
> 
> *useful action*
> *ret*
> 
> So with ...=used this turns into:
> 
> *useful action*
> *clear some registers*
> *ret*
> 
> Which may still be helpful (if, for example, the state being built by
> the attacker is using registers _not_ in the cleared list). However:
> 
> *useful action*
> *clear all registers*
> *ret*
> 
> Means that suddenly the ROP chain cannot use *any* of the caller-saved
> registers to hold state.
> 
> So, while ...=used is likely going to block a lot, ...=all will block
> even more. I'd prefer to have both available,

Okay. I am fine with this. 

My biggest concern is the much bigger run-time overhead from zeroing those unused registers.
We might need to mention the big run-time overhead in the user’s manual.

Qing
> if for no other reason
> than to compare the ROP gadget availability for any given binary (e.g.
> if some future attack is found that bypasses ...=used, does it also
> bypass ...=all?)
> 
> -- 
> Kees Cook


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-10 22:05                                                     ` Segher Boessenkool
@ 2020-09-10 22:50                                                       ` Qing Zhao
  2020-09-11 17:18                                                         ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-10 22:50 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: H.J. Lu, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook



> On Sep 10, 2020, at 5:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 08, 2020 at 11:43:30AM -0500, Qing Zhao wrote:
>>> On Sep 7, 2020, at 10:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
>>> <segher@kernel.crashing.org> wrote:
>>>> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
>>>>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
>>>>> <segher@kernel.crashing.org> wrote:
>>>>>> Very many normal returns do *not* pass through an epilogue, but are
>>>>>> simple_return.  Disabling that is *much* more expensive than that 2%.
>>>>> 
>>>>> Sibcall isn't covered.  What other cases don't have an epilogue?
>>>> 
>>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>>>> you can throw away.
>>> 
>>> Qing, can you check how it interacts with shrink-wrap?
>>>> 
> 
> <snip>
> 
>> But I might miss some important  issues here, please let me know what I am missing here?
> 
> Start looking at handle_simple_exit()?  targetm.gen_simple_return()…

Yes, I have been looking at this since this morning. 
You are right, we also need to insert the zeroing sequence before this simple_return, which the current patch missed.

I am currently trying to resolve this issue with the following idea:

In the routine “thread_prologue_and_epilogue_insns”, after both “make_epilogue_seq” and “try_shrink_wrapping” have finished,

scan every exit block to see whether its last insn satisfies ANY_RETURN_P(insn).
If YES, generate the zeroing sequence before this RETURN insn.

That should take care of all the exit paths that return.
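
The idea above can be sketched on a toy model. This is not GCC code: the insn kinds, struct block, any_return_p, and insert_zeroing are hypothetical stand-ins for the real RTL structures, and the fixed-size array is only there to keep the example self-contained:

```c
#include <stddef.h>

/* Toy instruction kinds; these are NOT GCC's RTL codes.  */
enum insn_kind {
  INSN_OTHER, INSN_ZERO_REGS, INSN_RETURN, INSN_SIMPLE_RETURN, INSN_JUMP
};

#define MAX_INSNS 8

struct block {
  enum insn_kind insns[MAX_INSNS];
  size_t len;
};

/* Stand-in for the ANY_RETURN_P check mentioned above.  */
static int any_return_p(enum insn_kind k)
{
  return k == INSN_RETURN || k == INSN_SIMPLE_RETURN;
}

/* For every block whose last insn is a return, insert a zeroing insn
   immediately before it.  Returns the number of blocks patched.  */
static size_t insert_zeroing(struct block *blocks, size_t nblocks)
{
  size_t patched = 0;
  for (size_t b = 0; b < nblocks; b++) {
    struct block *bb = &blocks[b];
    if (bb->len == 0 || bb->len >= MAX_INSNS)
      continue;                       /* empty, or no room in the toy array */
    enum insn_kind last = bb->insns[bb->len - 1];
    if (!any_return_p(last))
      continue;
    bb->insns[bb->len - 1] = INSN_ZERO_REGS;
    bb->insns[bb->len] = last;        /* the return stays last */
    bb->len++;
    patched++;
  }
  return patched;
}
```

Because the scan keys on the return insn itself rather than on the epilogue, both ordinary returns and simple_return exits are covered in this model.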

Do you see any issue with this idea? 

Thanks a lot for your help.

Qing

> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-10 19:07                                                   ` Kees Cook
  2020-09-10 22:40                                                     ` Qing Zhao
@ 2020-09-11 10:06                                                     ` Richard Sandiford
  2020-09-11 16:14                                                       ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-11 10:06 UTC (permalink / raw)
  To: Kees Cook via Gcc-patches
  Cc: Qing Zhao, Kees Cook, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor, Segher Boessenkool

Kees Cook via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> [tried to clean up quoting...]
>
> On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote:
>> 
>> > On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
>> > 
>> >>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>> >>> So, my question is:
>> >>>
>> >>> From the security point of view, does clearing ALL registers have more benefit than clearing USED registers?  
>> >>> From my understanding, clearing registers that are not used in the current routine does NOT provide additional benefit, correct me if I am wrong here.
>> >  
>> > You are right, it does not provide additional security
>> 
>> Then, is it necessary to provide 
>> 
>> -fzero-call-used-regs=all-arg|all-gpr|all   to the user?
>> 
>> Can we just delete these 3 sub options?
>
> Well... I'd say there is some benefit (remember that ROP gadgets are
> built from function trailers, so there is rarely a concern over what the
> rest of the function is doing). Generally, they are chained together
> based on just the last couple instructions:
>
>  *useful action*
>  *ret*
>
> So with ...=used this turns into:
>
>  *useful action*
>  *clear some registers*
>  *ret*
>
> Which may still be helpful (if, for example, the state being built by
> the attacker is using registers _not_ in the cleared list). However:
>
>  *useful action*
>  *clear all registers*
>  *ret*
>
> Means that suddenly the ROP chain cannot use *any* of the caller-saved
> registers to hold state.
>
> So, while ...=used is likely going to block a lot, ...=all will block
> even more. I'd prefer to have both available, if for no other reason
> than to compare the ROP gadget availability for any given binary (e.g.
> if some future attack is found that bypasses ...=used, does it also
> bypass ...=all?)

This might have already been discussed/answered, sorry, but:
when there's a choice, is there an obvious winner between:

(1) clearing call-clobbered registers and then restoring call-preserved ones
(2) restoring call-preserved registers and then clearing call-clobbered ones
    
Is one option more likely to be useful to attackers than the other?

(For some frames, it might be necessary to use a small number of
call-clobbered registers to perform the restore sequence, so (1)
wouldn't be fully achievable in all cases.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 10:06                                                     ` Richard Sandiford
@ 2020-09-11 16:14                                                       ` Segher Boessenkool
  2020-09-11 16:52                                                         ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 16:14 UTC (permalink / raw)
  To: Kees Cook via Gcc-patches, Qing Zhao, Kees Cook, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
> This might have already been discussed/answered, sorry, but:
> when there's a choice, is there an obvious winner between:
> 
> (1) clearing call-clobbered registers and then restoring call-preserved ones
> (2) restoring call-preserved registers and then clearing call-clobbered ones
>     
> Is one option more likely to be useful to attackers than the other?
> 
> (For some frames, it might be necessary to use a small number of
> call-clobbered registers to perform the restore sequence, so (1)
> wouldn't be fully achievable in all cases.)

The same is true for what you have to do *after* restoring registers, as
I said before.  Clearing all is not correct in all cases, and also it is
not useful in all cases (code right after it might write the registers
again).

This really is very (sub-)target-specific, it cannot be done by generic
code on its own *at all*.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 16:14                                                       ` Segher Boessenkool
@ 2020-09-11 16:52                                                         ` Qing Zhao
  2020-09-11 17:13                                                           ` Segher Boessenkool
  2020-09-11 17:32                                                           ` Richard Sandiford
  0 siblings, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 16:52 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Sandiford
  Cc: Kees Cook via Gcc-patches, Kees Cook, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor, richard.sandiford



> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
>> This might have already been discussed/answered, sorry, but:
>> when there's a choice, is there an obvious winner between:
>> 
>> (1) clearing call-clobbered registers and then restoring call-preserved ones
>> (2) restoring call-preserved registers and then clearing call-clobbered ones
>> 
>> Is one option more likely to be useful to attackers than the other?

For the purpose of mitigating ROP, I think that (2) is better than (1), i.e., the sequence
that clears call-clobbered registers will be immediately before the “ret” instruction.
This will prevent the gadget from doing anything useful.

>> 
>> (For some frames, it might be necessary to use a small number of
>> call-clobbered registers to perform the restore sequence, so (1)
>> wouldn't be fully achievable in all cases.)
> 

Yes, it looks like (1) is also not correct.

> The same is true for what you have to do *after* restoring registers, as
> I said before.  Clearing all is not correct in all cases, and also it is
> not useful in all cases (code right after it might write the registers
> again).

I don’t understand why it’s not correct to clear call-clobbered registers
AFTER restoring call-preserved registers.

Even though we might need to use some call-clobbered registers to restore
the call-preserved registers, after the restore is done we can use data flow
to make sure the call-clobbered registers are no longer live at that point, and then
clear those non-live call-clobbered registers immediately before “ret”.

To me, this should be correct.

Let me know if I am missing anything here.

Thanks.

Qing



> 
> This really is very (sub-)target-specific, it cannot be done by generic
> code on its own *at all*.
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 16:52                                                         ` Qing Zhao
@ 2020-09-11 17:13                                                           ` Segher Boessenkool
  2020-09-11 19:40                                                             ` Qing Zhao
  2020-09-11 17:32                                                           ` Richard Sandiford
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 17:13 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook via Gcc-patches, Kees Cook,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
> I don’t understand why it’s not correct if we clearing call-clobbered registers 
> AFTER restoring call-preserved registers?

Because the compiler backend (or the linker!  Or the dynamic linker!
Etc.) can use volatile registers for their own purposes.

Like, on Power, r11 and r12 are used for various calling convention
purposes; they are also used for other purposes; and r0 is used as some
all-purpose volatile (it typically holds the return address near the
end of a function).

"Call-clobbered" is pretty meaningless.  It only holds meaning for a
function calling another, and then only to know which registers lose
their value then.  It has no semantics for other cases, like a function
that will return soonish, as here.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-10 22:50                                                       ` Qing Zhao
@ 2020-09-11 17:18                                                         ` Segher Boessenkool
  2020-09-11 19:53                                                           ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 17:18 UTC (permalink / raw)
  To: Qing Zhao
  Cc: H.J. Lu, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook

On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote:
> >>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
> >>>> you can throw away.

^^^ !!! ^^^

> > Start looking at handle_simple_exit()?  targetm.gen_simple_return()…
> 
> Yes, I have been looking at this since this morning. 
> You are right, we also need to insert zeroing sequence before  this simple_return which the current patch missed.

Please run the performance loss numbers again after you have something
more realistic :-(

> I am currently try to resolve this issue with the following idea:
> 
> In the routine “thread_prologue_and_epilogue_insns”,  After both “make_epilogue_seq” and “try_shrink_wrapping” finished, 
> 
> Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), 
> If YES, generate the zero sequence before this RETURN insn. 
> 
> Then we should take care all the exit path that returns.
> 
> Do you see any issue from this idea? 

You need to let the backend decide what to do, for this as well as for
all other cases.  I do not know how often I will have to repeat that.

There also is separate shrink-wrapping, which you haven't touched on at
all yet.  Joy.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 16:52                                                         ` Qing Zhao
  2020-09-11 17:13                                                           ` Segher Boessenkool
@ 2020-09-11 17:32                                                           ` Richard Sandiford
  2020-09-11 20:01                                                             ` Segher Boessenkool
  2020-09-11 20:14                                                             ` Qing Zhao
  1 sibling, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-11 17:32 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook via Gcc-patches, Kees Cook,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> 
>> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
>>> This might have already been discussed/answered, sorry, but:
>>> when there's a choice, is there an obvious winner between:
>>> 
>>> (1) clearing call-clobbered registers and then restoring call-preserved ones
>>> (2) restoring call-preserved registers and then clearing call-clobbered ones
>>> 
>>> Is one option more likely to be useful to attackers than the other?
>
> for mitigating ROP purpose, I think that (2) is better than (1). i.e, the clearing
> call-clobbered register sequence will be immediately before “ret” instruction. 
> This will prevent the gadget from doing any useful things.

OK.  The reason I was asking was that (from the naive perspective of
someone not well versed in this stuff): if the effect of one of the
register restores is itself a useful gadget, the clearing wouldn't
protect against it.  But if the register restores are not part of the
intended effect, it seemed that having them immediately before the
ret might make the gadget harder to use than clearing registers would,
because the side-effects of restores would be harder to control than the
(predictable) effect of clearing registers.

But like I say, this is very much not my area of expertise, so that's
probably missing the point in a major way. ;-)

I think the original patch plugged into pass_thread_prologue_and_epilogue,
is that right?  If we go for (2), then I think it would be better to do
it at the start of pass_late_compilation instead.  (Some targets wouldn't
cope with doing it later.)  The reason for doing it so late is that the
set of used “volatile”/caller-saved registers is not fixed at prologue
and epilogue generation: later optimisation passes can introduce uses
of volatile registers that weren't used previously.  (Sorry if this
has already been suggested.)

Unlike Segher, I think this can/should be done in target-independent
code as far as possible (like the patch seemed to do).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 17:13                                                           ` Segher Boessenkool
@ 2020-09-11 19:40                                                             ` Qing Zhao
  2020-09-11 20:05                                                               ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 19:40 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook via Gcc-patches, Kees Cook,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
>> AFTER restoring call-preserved registers?
> 
> Because the compiler backend (or the linker!  Or the dynamic linker!
> Etc.) can use volatile registers for their own purposes.

For the following sequence at the end of a routine:

    ...
    restore call-preserved registers
    clear call-clobbered registers
    ret

“Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
If a call-clobbered register is live at the end of the routine, for example because it holds the return value,
it will NOT be cleared at all.

If a call-clobbered register has some other use after the routine returns, then the backend should know this and will not
clear it.  That would resolve this issue, right?


> 
> Like, on Power, r11 and r12 are used for various calling convention
> purposes; they are also used for other purposes; and r0 is used as some
> all-purpose volatile (it typically holds the return address near the
> end of a function).

In the new version of the patch, the clearing of call-clobbered registers is implemented in the backend.  The middle end only
computes a hard register set based on the user option, source attribute, data flow information, and function ABI information, and
then passes this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
of the special situations you mentioned.
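
As a rough illustration of the middle-end half of that split, the set computation could look like the sketch below.  It is only a toy model, not code from the patch: the 32-bit mask type, the enum, and the names reg_bit and compute_need_zeroed_regs are all made up for the example, and the meaning given to each option value (skip, used-gpr, all-gpr, used, all) is an assumption read off the option names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per hard register on a hypothetical target with
   32 registers.  This is not GCC's HARD_REG_SET.  */
typedef uint32_t hard_reg_set;

/* Assumed meaning of the five option values named in the subject line.  */
enum zero_regs_mode
{
  ZERO_SKIP,
  ZERO_USED_GPR,
  ZERO_ALL_GPR,
  ZERO_USED,
  ZERO_ALL
};

/* Return the mask bit for register REGNO.  */
static hard_reg_set
reg_bit (unsigned regno)
{
  assert (regno < 32);
  return (hard_reg_set) 1 << regno;
}

/* Compute the registers that would be handed to the target hook:
   call-clobbered registers that are not live at the return, optionally
   narrowed to general-purpose registers and/or to registers the
   function actually used.  */
static hard_reg_set
compute_need_zeroed_regs (enum zero_regs_mode mode,
                          hard_reg_set call_used,
                          hard_reg_set gprs,
                          hard_reg_set used_in_function,
                          hard_reg_set live_at_return)
{
  if (mode == ZERO_SKIP)
    return 0;

  /* Never clear a register that still holds a live value, such as the
     return value.  */
  hard_reg_set candidates = call_used & ~live_at_return;

  if (mode == ZERO_USED_GPR || mode == ZERO_ALL_GPR)
    candidates &= gprs;
  if (mode == ZERO_USED_GPR || mode == ZERO_USED)
    candidates &= used_in_function;

  return candidates;
}
```

In this model a register that holds the return value is excluded simply by setting its bit in live_at_return; everything target-specific is left to whoever consumes the resulting mask.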

Let me know if you have any more concerns here.

thanks.

Qing

> 
> "Call-clobbered" is pretty meaningless.  It only holds meaning for a
> function calling another, and then only to know which registers lose
> their value then.  It has no semantics for other cases, like a function
> that will return soonish, as here.
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 17:18                                                         ` Segher Boessenkool
@ 2020-09-11 19:53                                                           ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 19:53 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: H.J. Lu, Rodriguez Bahena, Victor, Richard Biener, Jeff Law,
	Uros Bizjak, Jakub Jelinek, GCC Patches, Kees Cook



> On Sep 11, 2020, at 12:18 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote:
>>>>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>>>>>> you can throw away.
> 
> ^^^ !!! ^^^
> 
>>> Start looking at handle_simple_exit()?  targetm.gen_simple_return()…
>> 
>> Yes, I have been looking at this since this morning. 
>> You are right, we also need to insert zeroing sequence before  this simple_return which the current patch missed.
> 
> Please run the performance loss numbers again after you have something
> more realistic :-(

Yes, I will collect the performance data with the new patch. 

> 
>> I am currently try to resolve this issue with the following idea:
>> 
>> In the routine “thread_prologue_and_epilogue_insns”,  After both “make_epilogue_seq” and “try_shrink_wrapping” finished, 
>> 
>> Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), 
>> If YES, generate the zero sequence before this RETURN insn. 
>> 
>> Then we should take care all the exit path that returns.
>> 
>> Do you see any issue from this idea? 
> 
> You need to let the backend decide what to do, for this as well as for
> all other cases.  I do not know how often I will have to repeat that.

Yes, the new patch will separate the whole task into two parts:

A. Compute the hard register set based on the user option, source code attribute, data flow information, and function ABI information.
     The result will be “need_zeroed_register_set”, which is then passed to the target hook.
B. Each target will have its own implementation that emits the zeroing sequence based on “need_zeroed_register_set”.
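
A minimal sketch of how part B might let the backend keep the final say is shown below.  Everything in it is hypothetical: the hook type, toy_target_zero_regs, and zero_before_return are invented names, and having the hook return the set it actually cleared is an assumption made for illustration, not something the patch is known to do.  The reserved registers (r0, r11, r12) are borrowed from the Power example given earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per hard register (hypothetical 32-register target).  */
typedef uint32_t hard_reg_set;

/* Hypothetical hook type: the target is given the requested set and
   reports which registers it really cleared.  */
typedef hard_reg_set (*zero_regs_hook) (hard_reg_set requested);

/* A made-up backend that refuses to touch r0, r11 and r12 because it
   uses them for its own purposes around the return.  */
static hard_reg_set
toy_target_zero_regs (hard_reg_set requested)
{
  const hard_reg_set reserved
    = ((hard_reg_set) 1 << 0) | ((hard_reg_set) 1 << 11)
      | ((hard_reg_set) 1 << 12);

  /* A real backend would emit the zeroing instructions here; this
     model only reports the decision.  */
  return requested & ~reserved;
}

/* Middle-end side: pass the computed set to the hook and accept
   whatever subset the target chose to clear.  */
static hard_reg_set
zero_before_return (zero_regs_hook hook, hard_reg_set need_zeroed_register_set)
{
  assert (hook != NULL);
  return hook (need_zeroed_register_set);
}
```

The point of the shape is that generic code only proposes a set; the target alone decides which registers are safe to write at that point.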


> 
> There also is separate shrink-wrapping, which you haven't touched on at
> all yet.  Joy.

Yes, in addition to shrink-wrapping, I also noticed that there are other places that generate “simple_return” or “return” outside
the epilogue, for example in the “dbr” phase (delay_slots phase), in the “mach” phase (machine reorg phase), etc.

So, generating the zeroing sequence only in the epilogue is not enough.

Hongjiu and I discussed this more and came up with a new implementation.  I will describe it in another email later.

Thanks.

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 17:32                                                           ` Richard Sandiford
@ 2020-09-11 20:01                                                             ` Segher Boessenkool
  2020-09-11 20:14                                                             ` Qing Zhao
  1 sibling, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 20:01 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook via Gcc-patches, Kees Cook, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

On Fri, Sep 11, 2020 at 06:32:56PM +0100, Richard Sandiford wrote:
> Unlike Segher, I think this can/should be done in target-independent
> code as far as possible (like the patch seemed to do).

My problem with that is that it is both incorrect *and* inefficient.  It
writes registers it should not touch; and some of those will be written
with other values later again anyway; and if the goal is to clear as
many parameter passing registers as possible, why does it touch
others at all?  This makes no sense.

Only the backend knows which registers it can write when.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 19:40                                                             ` Qing Zhao
@ 2020-09-11 20:05                                                               ` Segher Boessenkool
  2020-09-11 20:17                                                                 ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 20:05 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook via Gcc-patches, Kees Cook,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
> >> I don’t understand why it’s not correct if we clearing call-clobbered registers 
> >> AFTER restoring call-preserved registers?
> > 
> > Because the compiler backend (or the linker!  Or the dynamic linker!
> > Etc.) can use volatile registers for their own purposes.
> 
> For the following sequence at the end of a routine:
> 
> *...*
> “restore call-preserved registers”
> *clear call-clobbered registers"*
> *ret*
> 
> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.

And they can be written again right after the routine, by linker-
generated code for example.  This is a waste.

> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
> on the special situations you mentioned. 
> 
> Let me know any more concerns here.

I cannot find that patch?


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 17:32                                                           ` Richard Sandiford
  2020-09-11 20:01                                                             ` Segher Boessenkool
@ 2020-09-11 20:14                                                             ` Qing Zhao
  2020-09-11 21:03                                                               ` Segher Boessenkool
  2020-09-11 21:44                                                               ` Richard Sandiford
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 20:14 UTC (permalink / raw)
  To: Richard Sandiford, Segher Boessenkool, Kees Cook
  Cc: Kees Cook via Gcc-patches, Kees Cook, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor



> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> 
>>> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
>>>> This might have already been discussed/answered, sorry, but:
>>>> when there's a choice, is there an obvious winner between:
>>>> 
>>>> (1) clearing call-clobbered registers and then restoring call-preserved ones
>>>> (2) restoring call-preserved registers and then clearing call-clobbered ones
>>>> 
>>>> Is one option more likely to be useful to attackers than the other?
>> 
>> for mitigating ROP purpose, I think that (2) is better than (1). i.e, the clearing
>> call-clobbered register sequence will be immediately before “ret” instruction. 
>> This will prevent the gadget from doing any useful things.
> 
> OK.  The reason I was asking was that (from the naive perspective of
> someone not well versed in this stuff): if the effect of one of the
> register restores is itself a useful gadget, the clearing wouldn't
> protect against it.  But if the register restores are not part of the
> intended effect, it seemed that having them immediately before the
> ret might make the gadget harder to use than clearing registers would,
> because the side-effects of restores would be harder to control than the
> (predictable) effect of clearing registers.
> 
> But like I say, this is very much not my area of expertise, so that's
> probably missing the point in a major way. ;-)

I am not an expert on the security area either. :-)

My understanding of how this scheme helps ROP is:  the attacker usually uses scratch register to pass
parameters to the system call in the gadget.  If the scratch registers are cleared immediately before “ret”, then
the parameters that are passed to the system call will be destroyed, and therefore the attack will likely fail.

So, clearing the scratch registers immediately before “ret” will be very helpful to mitigate ROP.

> 
> I think the original patch plugged into pass_thread_prologue_and_epilogue,
> is that right?

Yes.

>  If we go for (2), then I think it would be better to do
> it at the start of pass_late_compilation instead.  (Some targets wouldn't
> cope with doing it later.)  The reason for doing it so late is that the
> set of used “volatile”/caller-saved registers is not fixed at prologue
> and epilogue generation: later optimisation passes can introduce uses
> of volatile registers that weren't used previously.  (Sorry if this
> has already been suggested.)

Yes, I agree.

I thought that it might be better to move this task to the very end of the RTL stage, i.e., right before the “final” phase.

Another solution is (discussed with Hongjiu):

1. Define a new target hook:

targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)

2. Add the following routine in middle end:

rtx_insn *
generate_return_rtx (bool simple_return_p)
{
  if (targetm.return_with_zeroing)
    {
      HARD_REG_SET need_zeroed_hardregs;
      /* Compute the hard register set to clear into need_zeroed_hardregs.  */
      return targetm.return_with_zeroing (simple_return_p,
                                          need_zeroed_hardregs, gpr_only);
    }

  if (simple_return_p)
    return targetm.gen_simple_return ();
  return targetm.gen_return ();
}

Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with “generate_return_rtx ()”.

3. In the target, implement “return_with_zeroing”.


Let me know your comments on this.

Thanks a lot.

Qing
> 
> Unlike Segher, I think this can/should be done in target-independent
> code as far as possible (like the patch seemed to do).
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 20:05                                                               ` Segher Boessenkool
@ 2020-09-11 20:17                                                                 ` Qing Zhao
  2020-09-11 20:36                                                                   ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 20:17 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook via Gcc-patches, Kees Cook,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 11, 2020, at 3:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>>>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
>>>> AFTER restoring call-preserved registers?
>>> 
>>> Because the compiler backend (or the linker!  Or the dynamic linker!
>>> Etc.) can use volatile registers for their own purposes.
>> 
>> For the following sequence at the end of a routine:
>> 
>> *...*
>> “restore call-preserved registers”
>> *clear call-clobbered registers"*
>> *ret*
>> 
>> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
> 
> And they can be written again right after the routine, by linker-
> generated code for example.  This is a waste.
> 
>> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
>> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
>> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
>> on the special situations you mentioned. 
>> 
>> Let me know any more concerns here.
> 
> I cannot find that patch?

Haven’t finished yet. :-)

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 20:17                                                                 ` Qing Zhao
@ 2020-09-11 20:36                                                                   ` Segher Boessenkool
  2020-09-11 21:12                                                                     ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 20:36 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook via Gcc-patches, Kees Cook,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Fri, Sep 11, 2020 at 03:17:19PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 3:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
> >>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
> >>>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
> >>>> AFTER restoring call-preserved registers?
> >>> 
> >>> Because the compiler backend (or the linker!  Or the dynamic linker!
> >>> Etc.) can use volatile registers for their own purposes.
> >> 
> >> For the following sequence at the end of a routine:
> >> 
> >> *...*
> >> “restore call-preserved registers”
> >> *clear call-clobbered registers"*
> >> *ret*
> >> 
> >> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
> > 
> > And they can be written again right after the routine, by linker-
> > generated code for example.  This is a waste.
> > 
> >> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
> >> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
> >> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
> >> on the special situations you mentioned. 
> >> 
> >> Let me know any more concerns here.
> > 
> > I cannot find that patch?
> 
> Haven’t finished yet. -:).

Ah okay :-)

If you have, please send it in a new thread (not as a reply)?  So that
it will be much easier to handle :-)


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 20:14                                                             ` Qing Zhao
@ 2020-09-11 21:03                                                               ` Segher Boessenkool
  2020-09-11 21:29                                                                 ` Qing Zhao
  2020-09-11 21:44                                                               ` Richard Sandiford
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 21:03 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Hi!

On Fri, Sep 11, 2020 at 03:14:57PM -0500, Qing Zhao wrote:
> My understanding of how this scheme helps ROP is:  the attacker usually uses scratch register to pass

Help obstruct ROP ;-)

> parameters to the sys call in the gadget, if clearing the scratch registers immediately before “ret”, then 
> The parameters that are passed to sys call will be destroyed, therefore, the attack will likely failed.

But you do not need more than one non-zero argument for execv*, and that
is usually the same register as the normal return value register; all
other registers *should* be zero for a simple execv*("/bin/sh", ...)!

(There is also the system call number register, rax on x86-64, but if
overwriting that would be at all effective, you could just do that one
always and everywhere.  This is only an effective defence if there are
no gadgets that do the system call an attacker wants, and he has to
construct that sequence himself; but it is very effective and cheap then).


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 20:36                                                                   ` Segher Boessenkool
@ 2020-09-11 21:12                                                                     ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 21:12 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Kees Cook via Gcc-patches



> On Sep 11, 2020, at 3:36 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 03:17:19PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 3:05 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
>>>>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>>>>>> I don’t understand why it’s not correct if we clearing call-clobbered registers 
>>>>>> AFTER restoring call-preserved registers?
>>>>> 
>>>>> Because the compiler backend (or the linker!  Or the dynamic linker!
>>>>> Etc.) can use volatile registers for their own purposes.
>>>> 
>>>> For the following sequence at the end of a routine:
>>>> 
>>>> *...*
>>>> “restore call-preserved registers”
>>>> *clear call-clobbered registers"*
>>>> *ret*
>>>> 
>>>> “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine.
>>> 
>>> And they can be written again right after the routine, by linker-
>>> generated code for example.  This is a waste.
>>> 
>>>> In the new version of the patch,  the implementation of clearing call-clobbered registers is done in backend, middle end only 
>>>> computes a hard register set based on user option, source attribute, data flow information, and function abi information, and
>>>> Then pass this hard register set to the target hook to generate the clearing sequence.  The backend will have all the details
>>>> on the special situations you mentioned. 
>>>> 
>>>> Let me know any more concerns here.
>>> 
>>> I cannot find that patch?
>> 
>> Haven’t finished yet. -:).
> 
> Ah okay :-)
> 
> If you have, please send it in a new thread (not as a reply)?  So that
> it will be much easier to handle :-)

Okay. Will do.

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 21:03                                                               ` Segher Boessenkool
@ 2020-09-11 21:29                                                                 ` Qing Zhao
  2020-09-11 21:51                                                                   ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 21:29 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Fri, Sep 11, 2020 at 03:14:57PM -0500, Qing Zhao wrote:
>> My understanding of how this scheme helps ROP is:  the attacker usually uses scratch registers to pass
> 
> Help obstruct ROP ;-)
Thanks for catching my mistake.
> 
>> parameters to the sys call in the gadget.  If the scratch registers are cleared immediately before “ret”, then
>> the parameters that are passed to the sys call will be destroyed, and therefore the attack will likely fail.
> 
> But you do not need more than one non-zero argument for execv*, and that
> is usually the same register as the normal return value register; all
> other registers *should* be zero for a simple execv*("/bin/sh", ...)!
> 
> (There is also the system call number register, rax on x86-64, but if
> overwriting that would be any effective, you could just do that one
> always and everywhere.  This is only an effective defence if there are
> no gadgets that do the system call an attacker wants, and he has to
> construct that sequence himself; but it is very effective and cheap then).

In the above, do you mean only clearing “rax” on x86-64 should be effective enough? 

Qing
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 20:14                                                             ` Qing Zhao
  2020-09-11 21:03                                                               ` Segher Boessenkool
@ 2020-09-11 21:44                                                               ` Richard Sandiford
  2020-09-11 22:24                                                                 ` Qing Zhao
  2020-09-18 20:31                                                                 ` Qing Zhao
  1 sibling, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-11 21:44 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>
>> If we go for (2), then I think it would be better to do
>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>> cope with doing it later.)  The reason for doing it so late is that the
>> set of used “volatile”/caller-saved registers is not fixed at prologue
>> and epilogue generation: later optimisation passes can introduce uses
>> of volatile registers that weren't used previously.  (Sorry if this
>> has already been suggested.)
>
> Yes, I agree.
>
> I thought that it might be better to move this task to the very end of the RTL stage, i.e., just before the “final” phase.
>
> Another solution is (discussed with Hongjiu):
>
> 1. Define a new target hook:
>
> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>
> 2. Add the following routine in middle end:
>
> rtx_insn *
> generate_return_rtx (bool simple_return_p)
> {
>   if (targetm.return_with_zeroing)
>     {
>       /* Compute the hard register set for clearing into
>          “need_zeroed_hardregs”.  */
>       return targetm.return_with_zeroing (simple_return_p, need_zeroed_hardregs, gpr_only);
>     }
>   else
>     {
>       if (simple_return_p)
>         return targetm.gen_simple_return ();
>       else
>         return targetm.gen_return ();
>     }
> }
>
> Then replace all calls to “targetm.gen_simple_return” and “targetm.gen_return” with “generate_return_rtx()”.
>
> 3. In the target, 
> Implement “return_with_zeroing”.
>
>
> Let me know your comments on this.

I think having a separate pass is better.  We don't normally know
at the point of generating the return which registers will need
to be cleared.  So IMO the pass should just search for all the
returns in a function and insert the zeroing instructions before
each one.

Having a target hook sounds good, but I think it should have a
default definition that just uses the move patterns to zero each
selected register.  I expect the default will be good enough for
most targets.
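
As a rough illustration of that default (a toy model, not GCC code:
regset, zero_regs_hook, default_zero_regs and zero_regs are made-up
names, and recording a register number stands in for emitting a zeroing
move), the fallback could be sketched like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register set: one bit per hard register.  Hypothetical type,
   standing in for HARD_REG_SET.  */
typedef uint32_t regset;
enum { NUM_REGS = 32 };

/* Hypothetical per-target override.  NULL means "use the default".  */
typedef size_t (*zero_regs_hook) (regset, unsigned *, size_t);

/* Default definition: one plain "move zero into register" per selected
   register, in increasing register order.  Here each move is modelled
   by recording the register number in OUT.  Returns the number of
   moves.  */
static size_t
default_zero_regs (regset need_zeroed, unsigned *out, size_t out_len)
{
  size_t n = 0;
  for (unsigned regno = 0; regno < NUM_REGS; regno++)
    if (need_zeroed & ((regset) 1 << regno))
      {
        assert (n < out_len);
        out[n++] = regno;
      }
  return n;
}

/* Use the target's hook if it provides one, otherwise the default.  */
static size_t
zero_regs (zero_regs_hook target_hook, regset need_zeroed,
           unsigned *out, size_t out_len)
{
  zero_regs_hook hook = target_hook ? target_hook : default_zero_regs;
  return hook (need_zeroed, out, out_len);
}
```

A target that needs a special sequence would pass its own function as
the hook; every other target gets the generic moves for free.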

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 21:29                                                                 ` Qing Zhao
@ 2020-09-11 21:51                                                                   ` Segher Boessenkool
  2020-09-11 22:41                                                                     ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-11 21:51 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:03 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> The parameters that are passed to sys call will be destroyed, therefore, the attack will likely failed.
> > 
> > But you do not need more than one non-zero argument for execv*, and that
> > is usually the same register as the normal return value register; all
> > other registers *should* be zero for a simple execv*("/bin/sh", ...)!
> > 
> > (There is also the system call number register, rax on x86-64, but if
> > overwriting that would be any effective, you could just do that one
> > always and everywhere.  This is only an effective defence if there are
> > no gadgets that do the system call an attacker wants, and he has to
> > construct that sequence himself; but it very effective and cheap then).
> 
> In the above, do you mean only clearing “rax” on x86-64 should be effective enough? 

(rax=0 is "read", you might want to do another value, but that's just
details.)

"This is only an effective defence if there are
no gadgets that do the system call an attacker wants, and he has to
construct that sequence himself; but it very effective and cheap then)."

It is definitely *not* effective if there are gadgets that set rax to
a value the attacker wants and then do a syscall.  It of course is quite
effective in breaking a ROP chain of (set rax) -> (syscall).  How
effective it is in practice, I have no idea.

My point was that your proposed scheme does not protect the other
syscall parameters very much either.

And, hrm, rax is actually the first return value.  On most ABIs the
same registers are used for arguments and for return values, I got
confused.  Sorry.  So this cannot be very effective for x86-64 no
matter what.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 21:44                                                               ` Richard Sandiford
@ 2020-09-11 22:24                                                                 ` Qing Zhao
  2020-09-11 22:56                                                                   ` Richard Sandiford
  2020-09-14 23:20                                                                   ` Segher Boessenkool
  2020-09-18 20:31                                                                 ` Qing Zhao
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 22:24 UTC (permalink / raw)
  To: Richard Sandiford, Segher Boessenkool
  Cc: Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor



> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>
>>> If we go for (2), then I think it would be better to do
>>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>>> cope with doing it later.)  The reason for doing it so late is that the
>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>> and epilogue generation: later optimisation passes can introduce uses
>>> of volatile registers that weren't used previously.  (Sorry if this
>>> has already been suggested.)
>> 
>> Yes, I agree.
>> 
>> I thought that it might be better to move this task at the very late of the RTL stage, i.e, before “final” phase. 
>> 
>> Another solution is (discussed with Hongjiu):
>> 
>> 1. Define a new target hook:
>> 
>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>> 
>> 2. Add the following routine in middle end:
>> 
>> rtx_insn *
>> generate_return_rtx (bool simple_return_p)
>> {
>>  if (targetm.return_with_zeroing)
>>    {
>>      Compute the hardregs set for clearing into “need_zeroed_hardregs”;
>>     return targetm.return_with_zeroing (simple_return_p, need_zeroed_hardregs, gpr_only);
>>   }
>> else
>>    {
>>     if (simple_return_p)
>>       return targetm.gen_simple_return ( );
>>    else
>>       return targetm.gen_return ();
>>  }
>> }
>> 
>> Then replace all call to “targetm.gen_simple_return” and “targetm.gen_return” to “generate_return_rtx()”.
>> 
>> 3. In the target, 
>> Implement “return_with_zeroing”.
>> 
>> 
>> Let me know your comments on this.
> 
> I think having a separate pass is better.  We don't normally know
> at the point of generating the return which registers will need
> to be cleared.  

At the point of generating the return, we can compute the “need_zeroed_hardregs” HARD_REG_SET
by combining data flow information, the function ABI of the routine, and the user option and source code
attribute information. This information should be available at each point where a return is generated.


> So IMO the pass should just search for all the
> returns in a function and insert the zeroing instructions before
> each one.

I was considering this approach too for some time. However, as Segher mentioned, there is one major
issue with it: the middle end does not know some details about the registers, and lacking such
detailed information might result in incorrect code generation in the middle end.

For example, on the x86_64 target, for a “return” with pop, the scratch register “ecx” will be
used for returning, so it is incorrect to zero “ecx” before generating the return. Since the middle end
doesn't have such information, it cannot avoid zeroing “ecx”. Therefore incorrect code might be
generated.

Segher also mentioned that on Power, some scratch registers are also used for
other purposes, so clearing them before the return is not correct.


> 
> Having a target hook sounds good, but I think it should have a
> default definition that just uses the move patterns to zero each
> selected register.  I expect the default will be good enough for
> most targets.

Based on the above, I think that generating the zeroing instructions at middle end is not correct. 

Thanks.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 21:51                                                                   ` Segher Boessenkool
@ 2020-09-11 22:41                                                                     ` Qing Zhao
  2020-09-14 23:09                                                                       ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-11 22:41 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> The parameters that are passed to sys call will be destroyed, therefore, the attack will likely failed.
>>> 
>>> But you do not need more than one non-zero argument for execv*, and that
>>> is usually the same register as the normal return value register; all
>>> other registers *should* be zero for a simple execv*("/bin/sh", ...)!
>>> 
>>> (There is also the system call number register, rax on x86-64, but if
>>> overwriting that would be any effective, you could just do that one
>>> always and everywhere.  This is only an effective defence if there are
>>> no gadgets that do the system call an attacker wants, and he has to
>>> construct that sequence himself; but it very effective and cheap then).
>> 
>> In the above, do you mean only clearing “rax” on x86-64 should be effective enough? 
> 
> (rax=0 is "read", you might want to do another value, but that's just
> details.)
> 
> "This is only an effective defence if there are
> no gadgets that do the system call an attacker wants, and he has to
> construct that sequence himself; but it very effective and cheap then)."
> 
> It is definitely *not* effective if there are gadgets that set rax to
> a value the attacker wants and then do a syscall.

You mean the following gadget:


Gadget 1:

mov  rax,  value
syscall
ret

Qing

> It of course is quite
> effective in breaking a ROP chain of (set rax) -> (syscall).  How
> effective it is in practice, I have no idea.
> 
> My point was that your proposed scheme does not protect the other
> syscall parameters very much either.
> 
> And, hrm, rax is actually the first return value.  On most ABIs the
> same registers are used for arguments and for return values, I got
> confused.  Sorry.  So this cannot be very effective for x86-64 no
> matter what.
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 22:24                                                                 ` Qing Zhao
@ 2020-09-11 22:56                                                                   ` Richard Sandiford
  2020-09-14 14:56                                                                     ` Qing Zhao
  2020-09-14 23:20                                                                   ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-11 22:56 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>
>>>> If we go for (2), then I think it would be better to do
>>>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>>>> cope with doing it later.)  The reason for doing it so late is that the
>>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>>> and epilogue generation: later optimisation passes can introduce uses
>>>> of volatile registers that weren't used previously.  (Sorry if this
>>>> has already been suggested.)
>>> 
>>> Yes, I agree.
>>> 
>>> I thought that it might be better to move this task at the very late of the RTL stage, i.e, before “final” phase. 
>>> 
>>> Another solution is (discussed with Hongjiu):
>>> 
>>> 1. Define a new target hook:
>>> 
>>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>>> 
>>> 2. Add the following routine in middle end:
>>> 
>>> rtx_insn *
>>> generate_return_rtx (bool simple_return_p)
>>> {
>>>  if (targetm.return_with_zeroing)
>>>    {
>>>      Compute the hardregs set for clearing into “need_zeroed_hardregs”;
>>>     return targetm.return_with_zeroing (simple_return_p, need_zeroed_hardregs, gpr_only);
>>>   }
>>> else
>>>    {
>>>     if (simple_return_p)
>>>       return targetm.gen_simple_return ( );
>>>    else
>>>       return targetm.gen_return ();
>>>  }
>>> }
>>> 
>>> Then replace all call to “targetm.gen_simple_return” and “targetm.gen_return” to “generate_return_rtx()”.
>>> 
>>> 3. In the target, 
>>> Implement “return_with_zeroing”.
>>> 
>>> 
>>> Let me know your comments on this.
>> 
>> I think having a separate pass is better.  We don't normally know
>> at the point of generating the return which registers will need
>> to be cleared.  
>
> At the point of generating the return, we can compute the “need_zeroed_hardregs” HARD_REG_SET 
> by using data flow information, the function abi of the routine, and also the user option and source code 
> attribute information together. These information should be available at each point when generating the return.

Like I mentioned earlier though, passes that run after
pass_thread_prologue_and_epilogue can use call-clobbered registers that
weren't previously used.  For example, on x86_64, the function might
not use %r8 when the prologue, epilogue and returns are generated,
but pass_regrename might later introduce a new use of %r8.  AIUI,
the “used” version of the new command-line option is supposed to clear
%r8 in these circumstances, but it wouldn't do so if the data was
collected at the point that the return is generated.

That's why I think it's more robust to do this later (at the beginning
of pass_late_compilation) and insert the zeroing before returns that
already exist.

>> So IMO the pass should just search for all the
>> returns in a function and insert the zeroing instructions before
>> each one.
>
> I was considering this approach too for some time, however, there is one major issue with this as 
> Segher mentioned, The middle end does not know some details on the registers, lacking such 
> detailed information might result incorrect code generation at middle end.
>
> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
> generated. 
>
> Segher also mentioned that on Power, there are some scratch registers also are used for 
> Other purpose, clearing them before return is not correct. 

But the dataflow information has to be correct between
pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
any pass in that region could clobber the registers in the same way.

To get the registers that are live before the return, you can start with
the registers that are live out from the block that contains the return,
then “simulate” the return instruction backwards to get the set of
registers that are live before the return instruction
(see df_simulate_one_insn_backwards).

In the x86_64 case you mention, the pattern is:

(define_insn "*simple_return_indirect_internal<mode>"
  [(simple_return)
   (use (match_operand:W 0 "register_operand" "r"))]
  "reload_completed"
  …)

This (use …) tells the df machinery that the instruction needs
operand 0 (= ecx).  The process above would therefore realise
that ecx can't be clobbered.
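
To make the backward step concrete, here is a toy model (not GCC code;
the register names, struct toy_insn and both helper functions are
invented for illustration, and the real work is done by
df_simulate_one_insn_backwards).  Stepping backwards over an
instruction removes the registers it writes from the live set and adds
the registers it reads:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register set: one bit per register.  */
typedef uint32_t regset;
enum toy_reg { TOY_AX, TOY_CX, TOY_DX, TOY_R8, TOY_NUM_REGS };
#define TOY_BIT(r) ((regset) 1 << (r))

/* A toy instruction: the registers it writes and the registers it
   reads.  A (use ...) in the pattern shows up in USES.  */
struct toy_insn
{
  regset defs;
  regset uses;
};

/* Step liveness backwards across INSN: registers written by INSN stop
   being live, registers read by INSN become live.  */
static regset
toy_live_before (regset live_after, const struct toy_insn *insn)
{
  return (live_after & ~insn->defs) | insn->uses;
}

/* Return true if REG is dead immediately before INSN, so that zeroing
   it there cannot change what INSN sees.  */
static bool
toy_may_zero_before (regset live_after, const struct toy_insn *insn,
                     enum toy_reg reg)
{
  assert (reg < TOY_NUM_REGS);
  return (toy_live_before (live_after, insn) & TOY_BIT (reg)) == 0;
}
```

With a return that uses cx and a block whose live-out set is just ax,
toy_live_before yields {ax, cx}, so cx is rejected as a zeroing
candidate while r8 is accepted.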

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 22:56                                                                   ` Richard Sandiford
@ 2020-09-14 14:56                                                                     ` Qing Zhao
  2020-09-14 16:33                                                                       ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-14 14:56 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Hi, Richard,

> On Sep 11, 2020, at 5:56 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>
>>>>> If we go for (2), then I think it would be better to do
>>>>> it at the start of pass_late_compilation instead.  (Some targets wouldn't
>>>>> cope with doing it later.)  The reason for doing it so late is that the
>>>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>>>> and epilogue generation: later optimisation passes can introduce uses
>>>>> of volatile registers that weren't used previously.  (Sorry if this
>>>>> has already been suggested.)
>>>> 
>>>> Yes, I agree.
>>>> 
>>>> I thought that it might be better to move this task at the very late of the RTL stage, i.e, before “final” phase. 
>>>> 
>>>> Another solution is (discussed with Hongjiu):
>>>> 
>>>> 1. Define a new target hook:
>>>> 
>>>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET need_zeroed_hardregs, bool gpr_only)
>>>> 
>>>> 2. Add the following routine in middle end:
>>>> 
>>>> rtx_insn *
>>>> generate_return_rtx (bool simple_return_p)
>>>> {
>>>> if (targetm.return_with_zeroing)
>>>>   {
>>>>     Compute the hardregs set for clearing into “need_zeroed_hardregs”;
>>>>    return targetm.return_with_zeroing (simple_return_p, need_zeroed_hardregs, gpr_only);
>>>>  }
>>>> else
>>>>   {
>>>>    if (simple_return_p)
>>>>      return targetm.gen_simple_return ( );
>>>>   else
>>>>      return targetm.gen_return ();
>>>> }
>>>> }
>>>> 
>>>> Then replace all call to “targetm.gen_simple_return” and “targetm.gen_return” to “generate_return_rtx()”.
>>>> 
>>>> 3. In the target, 
>>>> Implement “return_with_zeroing”.
>>>> 
>>>> 
>>>> Let me know your comments on this.
>>> 
>>> I think having a separate pass is better.  We don't normally know
>>> at the point of generating the return which registers will need
>>> to be cleared.  
>> 
>> At the point of generating the return, we can compute the “need_zeroed_hardregs” HARD_REG_SET 
>> by using data flow information, the function abi of the routine, and also the user option and source code 
>> attribute information together. These information should be available at each point when generating the return.
> 
> Like I mentioned earlier though, passes that run after
> pass_thread_prologue_and_epilogue can use call-clobbered registers that
> weren't previously used.  For example, on x86_64, the function might
> not use %r8 when the prologue, epilogue and returns are generated,
> but pass_regrename might later introduce a new use of %r8.  AIUI,
> the “used” version of the new command-line option is supposed to clear
> %r8 in these circumstances, but it wouldn't do so if the data was
> collected at the point that the return is generated.

Thanks for the information.

> 
> That's why I think it's more robust to do this later (at the beginning
> of pass_late_compilation) and insert the zeroing before returns that
> already exist.

Yes, it looks like it’s not correct to insert the zeroing at the time when the prologue, epilogue and return are generated.
As I also checked, a “return” might also be generated as late as “pass_delay_slots”.  So, shall we move the
new pass as late as possible?

Can I put it immediately before “pass_final”? What’s the latest place I can put it?


> 
>>> So IMO the pass should just search for all the
>>> returns in a function and insert the zeroing instructions before
>>> each one.
>> 
>> I was considering this approach too for some time, however, there is one major issue with this as 
>> Segher mentioned, The middle end does not know some details on the registers, lacking such 
>> detailed information might result incorrect code generation at middle end.
>> 
>> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
>> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
>> generated. 
>> 
>> Segher also mentioned that on Power, there are some scratch registers also are used for 
>> Other purpose, clearing them before return is not correct. 
> 
> But the dataflow information has to be correct between
> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
> any pass in that region could clobber the registers in the same way.

You mean the data flow information will not be correct after pass_free_cfg?
“pass_delay_slots” runs after “pass_free_cfg”, and a new “return” might be generated in “pass_delay_slots”.
If we want to generate zeroing for a new “return” that was generated in “pass_delay_slots”, can we do so correctly?

> 
> To get the registers that are live before the return, you can start with
> the registers that are live out from the block that contains the return,
> then “simulate” the return instruction backwards to get the set of
> registers that are live before the return instruction
> (see df_simulate_one_insn_backwards).

Okay. 
Currently, I am using the following to check whether a register is live out of the block that contains the return:

/* Check whether the hard register REGNO is live out of any predecessor
   of the exit block of the current routine.  */
static bool
is_live_reg_at_exit (unsigned int regno)
{
  edge e;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    {
      bitmap live_out = df_get_live_out (e->src);
      if (REGNO_REG_SET_P (live_out, regno))
        return true;
    }

  return false;
}

Is this correct?

> 
> In the x86_64 case you mention, the pattern is:
> 
> (define_insn "*simple_return_indirect_internal<mode>"
>  [(simple_return)
>   (use (match_operand:W 0 "register_operand" "r"))]
>  "reload_completed"
>  …)
> 
> This (use …) tells the df machinery that the instruction needs
> operand 0 (= ecx).  The process above would therefore realise
> that ecx can't be clobbered.

Okay, I see.  The df machinery will reflect this information, so there is no need for special handling here.

However, for the cases on Power that Segher mentioned, there are also some scratch registers used for
other purposes; I am not sure whether we can correctly generate the zeroing in the middle end for Power.

Thanks

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 14:56                                                                     ` Qing Zhao
@ 2020-09-14 16:33                                                                       ` Richard Sandiford
  2020-09-14 18:50                                                                         ` Qing Zhao
  2020-09-14 23:35                                                                         ` Segher Boessenkool
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-14 16:33 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Like I mentioned earlier though, passes that run after
>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>> weren't previously used.  For example, on x86_64, the function might
>> not use %r8 when the prologue, epilogue and returns are generated,
>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>> the “used” version of the new command-line option is supposed to clear
>> %r8 in these circumstances, but it wouldn't do so if the data was
>> collected at the point that the return is generated.
>
> Thanks for the information.
>
>> 
>> That's why I think it's more robust to do this later (at the beginning
>> of pass_late_compilation) and insert the zeroing before returns that
>> already exist.
>
> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
> New pass as late as possible?

If we insert the zeroing before pass_delay_slots and describe the
result correctly, pass_delay_slots should do the right thing.

Describing the result correctly includes ensuring that the cleared
registers are treated as live on exit from the function, so that the
zeroing doesn't get deleted again, or skipped by pass_delay_slots.

> Can I put it immediately before “pass_final”? What’s the latest place
> I can put it?

Like you say here…

>>>> So IMO the pass should just search for all the
>>>> returns in a function and insert the zeroing instructions before
>>>> each one.
>>> 
>>> I was considering this approach too for some time, however, there is one major issue with this as 
>>> Segher mentioned, The middle end does not know some details on the registers, lacking such 
>>> detailed information might result incorrect code generation at middle end.
>>> 
>>> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
>>> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
>>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
>>> generated. 
>>> 
>>> Segher also mentioned that on Power, there are some scratch registers also are used for 
>>> Other purpose, clearing them before return is not correct. 
>> 
>> But the dataflow information has to be correct between
>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>> any pass in that region could clobber the registers in the same way.
>
> You mean, the data flow information will be not correct after pass_free_cfg? 
>  “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?

…the zeroing has to be done before pass_free_cfg, because the information
isn't reliable after that point.  I think it would make sense to do it
before pass_compute_alignments, because inserting the zeros will affect
alignment.

>> To get the registers that are live before the return, you can start with
>> the registers that are live out from the block that contains the return,
>> then “simulate” the return instruction backwards to get the set of
>> registers that are live before the return instruction
>> (see df_simulate_one_insn_backwards).
>
> Okay. 
> Currently, I am using the following to check whether a reg is live out the block that contains the return:
>
> /* Check whether the hard register REGNO is live at the exit block
>  * of the current routine.  */
> static bool
> is_live_reg_at_exit (unsigned int regno)
> {
>   edge e;
>   edge_iterator ei;
>
>   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>     {
>       bitmap live_out = df_get_live_out (e->src);
>       if (REGNO_REG_SET_P (live_out, regno))
>         return true;
>     }
>
>   return false;
> }
>
> Is this correct?

df_get_live_out is the right way to get the set of live registers
on exit from a block.  But if we search for return instructions
and find a return instruction R, we should do:

  basic_block bb = BLOCK_FOR_INSN (R);
  auto_bitmap live_regs;
  bitmap_copy (live_regs, df_get_live_out (bb));
  df_simulate_one_insn_backwards (bb, R, live_regs);

and then use LIVE_REGS as the set of registers that are live before R,
and so can't be clobbered.

For extra safety, you could/should also check targetm.hard_regno_scratch_ok
to see whether there's a target-specific reason why a register can't
be clobbered.
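
Putting those pieces together, the selection of registers to zero
before one return could be modelled roughly as follows.  This is only a
sketch under assumptions: it is not GCC code, regset, struct toy_target
and regs_to_zero are made-up names, the scratch_ok mask merely stands in
for a per-register targetm.hard_regno_scratch_ok query, and the
gpr-only variants of the option are left out:

```c
#include <assert.h>
#include <stdint.h>

/* Toy register set: one bit per hard register.  */
typedef uint32_t regset;

/* A subset of the proposed option values.  */
enum zero_mode { ZERO_SKIP, ZERO_USED, ZERO_ALL };

/* Hypothetical per-target facts.  */
struct toy_target
{
  regset call_used;   /* Call-clobbered registers.  */
  regset scratch_ok;  /* Registers the target allows to be clobbered.  */
};

/* Return the registers to zero before a return, given the registers
   used anywhere in the function and the registers that are live
   immediately before the return instruction.  */
static regset
regs_to_zero (enum zero_mode mode, const struct toy_target *target,
              regset used_in_fn, regset live_before_return)
{
  regset candidates;

  switch (mode)
    {
    case ZERO_SKIP:
      return 0;
    case ZERO_USED:
      candidates = target->call_used & used_in_fn;
      break;
    case ZERO_ALL:
      candidates = target->call_used;
      break;
    default:
      assert (!"unknown zeroing mode");
      return 0;
    }

  /* Never touch a register that the return still needs, or one that
     the target says must not be clobbered.  */
  return candidates & ~live_before_return & target->scratch_ok;
}
```

The live-before-return set is meant to be the result of the backward
simulation above, so a register named in a (use ...) on the return is
filtered out without any target-specific knowledge in this routine.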

>> In the x86_64 case you mention, the pattern is:
>> 
>> (define_insn "*simple_return_indirect_internal<mode>"
>>  [(simple_return)
>>   (use (match_operand:W 0 "register_operand" "r"))]
>>  "reload_completed"
>>  …)
>> 
>> This (use …) tells the df machinery that the instruction needs
>> operand 0 (= ecx).  The process above would therefore realise
>> that ecx can't be clobbered.
>
> Okay, I see.  The df will reflect this information, no need for special handling here. 
>
> However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
> Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?

Segher would be better placed to answer that, but I think the process
above has to give a conservatively-accurate list of live registers.
If it misses a register, the other late rtl passes could clobber
that same register.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 16:33                                                                       ` Richard Sandiford
@ 2020-09-14 18:50                                                                         ` Qing Zhao
  2020-09-14 19:20                                                                           ` Richard Sandiford
  2020-09-14 23:35                                                                         ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-14 18:50 UTC (permalink / raw)
  To: Richard Sandiford, Segher Boessenkool
  Cc: Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor



> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> Like I mentioned earlier though, passes that run after
>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>> weren't previously used.  For example, on x86_64, the function might
>>> not use %r8 when the prologue, epilogue and returns are generated,
>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>> the “used” version of the new command-line option is supposed to clear
>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>> collected at the point that the return is generated.
>> 
>> Thanks for the information.
>> 
>>> 
>>> That's why I think it's more robust to do this later (at the beginning
>>> of pass_late_compilation) and insert the zeroing before returns that
>>> already exist.
>> 
>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>> New pass as late as possible?
> 
> If we insert the zeroing before pass_delay_slots and describe the
> result correctly, pass_delay_slots should do the right thing.
> 
> Describing the result correctly includes ensuring that the cleared
> registers are treated as live on exit from the function, so that the
> zeroing doesn't get deleted again, or skipped by pass_delay_slots.

In the current implementation for x86, we generate a zeroing insn as follows:

(insn 18 16 19 2 (set (reg:SI 1 dx)
        (const_int 0 [0])) "t10.c":11:1 -1
     (nil))
(insn 19 18 20 2 (unspec_volatile [
            (reg:SI 1 dx)
        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
     (nil))

i.e., after each zeroing insn, the zeroed register is marked with “UNSPECV_PRO_EPILOGUE_USE”.
This keeps the zeroing insn from being deleted or skipped.

Is this enough to describe the result correctly?
Is there anything else we need to do in addition to this?

> 
>> Can I put it immediately before “pass_final”? What’s the latest place
>> I can put it?
> 
> Like you say here…
> 
>>>>> So IMO the pass should just search for all the
>>>>> returns in a function and insert the zeroing instructions before
>>>>> each one.
>>>> 
>>>> I was considering this approach too for some time, however, there is one major issue with this as 
>>>> Segher mentioned, The middle end does not know some details on the registers, lacking such 
>>>> detailed information might result incorrect code generation at middle end.
>>>> 
>>>> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
>>>> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
>>>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
>>>> generated. 
>>>> 
>>>> Segher also mentioned that on Power, there are some scratch registers also are used for 
>>>> Other purpose, clearing them before return is not correct. 
>>> 
>>> But the dataflow information has to be correct between
>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>> any pass in that region could clobber the registers in the same way.
>> 
>> You mean, the data flow information will be not correct after pass_free_cfg? 
>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
> 
> …the zeroing has to be done before pass_free_cfg, because the information
> isn't reliable after that point.  I think it would make sense to do it
> before pass_compute_alignments, because inserting the zeros will affect
> alignment.

Okay. 

Then there is another problem: what about the new “return”s that are generated in pass_delay_slots?

Should we generate the zeroing for these new returns? Since the data flow information might not be correct in this pass,
it looks like there is no correct way to add the zeroing insns for these new “return”s. What should we do about this?

> 
>>> To get the registers that are live before the return, you can start with
>>> the registers that are live out from the block that contains the return,
>>> then “simulate” the return instruction backwards to get the set of
>>> registers that are live before the return instruction
>>> (see df_simulate_one_insn_backwards).
>> 
>> Okay. 
>> Currently, I am using the following to check whether a reg is live out the block that contains the return:
>> 
>> /* Check whether the hard register REGNO is live at the exit block
>> * of the current routine.  */
>> static bool
>> is_live_reg_at_exit (unsigned int regno)
>> {
>>  edge e;
>>  edge_iterator ei;
>> 
>>  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>    {
>>      bitmap live_out = df_get_live_out (e->src);
>>      if (REGNO_REG_SET_P (live_out, regno))
>>        return true;
>>    }
>> 
>>  return false;
>> }
>> 
>> Is this correct?
> 
> df_get_live_out is the right way to get the set of live registers
> on exit from a block.  But if we search for return instructions
> and find a return instruction R, we should do:
> 
>  basic_block bb = BLOCK_FOR_INSN (R);
>  auto_bitmap live_regs;
>  bitmap_copy (live_regs, df_get_live_out (bb));
>  df_simulate_one_insn_backwards (bb, R, live_regs);

> 
> and then use LIVE_REGS as the set of registers that are live before R,
> and so can't be clobbered.

Okay. Thanks for the info.
> 
> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
> to see whether there's a target-specific reason why a register can't
> be clobbered.

/* Return true if is OK to use a hard register REGNO as scratch register
   in peephole2.  */
DEFHOOK
(hard_regno_scratch_ok,


Is this check only valid for pass_peephole2?

> 
>>> In the x86_64 case you mention, the pattern is:
>>> 
>>> (define_insn "*simple_return_indirect_internal<mode>"
>>> [(simple_return)
>>>  (use (match_operand:W 0 "register_operand" "r"))]
>>> "reload_completed"
>>> …)
>>> 
>>> This (use …) tells the df machinery that the instruction needs
>>> operand 0 (= ecx).  The process above would therefore realise
>>> that ecx can't be clobbered.
>> 
>> Okay, I see.  The df will reflect this information, no need for special handling here. 
>> 
>> However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
>> Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?
> 
> Segher would be better placed to answer that, but I think the process
> above has to give a conservatively-accurate list of live registers.
> If it misses a register, the other late rtl passes could clobber
> that same register.

Segher, can you comment on this? 

thanks.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 18:50                                                                         ` Qing Zhao
@ 2020-09-14 19:20                                                                           ` Richard Sandiford
  2020-09-14 20:24                                                                             ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-14 19:20 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> Like I mentioned earlier though, passes that run after
>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>> weren't previously used.  For example, on x86_64, the function might
>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>> the “used” version of the new command-line option is supposed to clear
>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>> collected at the point that the return is generated.
>>> 
>>> Thanks for the information.
>>> 
>>>> 
>>>> That's why I think it's more robust to do this later (at the beginning
>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>> already exist.
>>> 
>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>> New pass as late as possible?
>> 
>> If we insert the zeroing before pass_delay_slots and describe the
>> result correctly, pass_delay_slots should do the right thing.
>> 
>> Describing the result correctly includes ensuring that the cleared
>> registers are treated as live on exit from the function, so that the
>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>
> In the current implementation for x86, we generate a zeroing insn as follows:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>         (const_int 0 [0])) "t10.c":11:1 -1
>      (nil))
> (insn 19 18 20 2 (unspec_volatile [
>             (reg:SI 1 dx)
>         ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>      (nil))
>
> i.e., after each zeroing insn, the zeroed register is marked with “UNSPECV_PRO_EPILOGUE_USE”.
> This keeps the zeroing insn from being deleted or skipped.
>
> Is this enough to describe the result correctly?
> Is there anything else we need to do in addition to this?

I guess that works, but I think it would be better to abstract
EPILOGUE_USES into a new target-independent wrapper function that
(a) returns true if EPILOGUE_USES itself returns true and (b) returns
true for registers that need to be zero on return, if the zeroing
instructions have already been inserted.  The places that currently
test EPILOGUE_USES should then test this new wrapper function instead.
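
As a rough sketch of the shape I have in mind, here is a self-contained
toy model.  None of it is existing GCC code: toy_function, its fields
and toy_epilogue_uses are hypothetical names, and the bitmask stands in
for whatever the target's EPILOGUE_USES would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register set: bit N stands for hard register N.  */
typedef uint32_t regset;

/* Toy per-function state (hypothetical, for illustration only).  */
struct toy_function
{
  regset target_epilogue_uses;  /* stands in for the target's EPILOGUE_USES */
  regset zeroed_on_return;      /* registers that must be zero on return */
  bool zeroing_inserted;        /* true once the zeroing insns exist */
};

/* Wrapper that callers would test instead of EPILOGUE_USES:
   (a) true if the target says the epilogue uses REGNO;
   (b) true if REGNO must be zero on return and the zeroing
       instructions have already been inserted.  */
static bool
toy_epilogue_uses (const struct toy_function *fn, unsigned int regno)
{
  assert (regno < 32);
  regset bit = (regset) 1 << regno;
  if (fn->target_epilogue_uses & bit)
    return true;
  return fn->zeroing_inserted && (fn->zeroed_on_return & bit) != 0;
}
```

Before the zeroing pass has run, the wrapper answers exactly as the
target does, so earlier passes see no difference.  Afterwards the
cleared registers count as used by the epilogue, which is what stops
later passes from treating the zeroing as dead.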

After inserting the zeroing instructions, the pass should recompute the
live-out sets based on this.

>>>> But the dataflow information has to be correct between
>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>> any pass in that region could clobber the registers in the same way.
>>> 
>>> You mean, the data flow information will be not correct after pass_free_cfg? 
>>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
>>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
>> 
>> …the zeroing has to be done before pass_free_cfg, because the information
>> isn't reliable after that point.  I think it would make sense to do it
>> before pass_compute_alignments, because inserting the zeros will affect
>> alignment.
>
> Okay. 
>
> Then there is another problem: what about the new “return”s that are generated in pass_delay_slots?
>
> Should we generate the zeroing for these new returns? Since the data flow information might not be correct in this pass,
> it looks like there is no correct way to add the zeroing insns for these new “return”s. What should we do about this?

pass_delay_slots isn't a problem.  It doesn't change *what* happens
on each return path, it just changes how the instructions to achieve
it are arranged.

So e.g. if every path through the function clears register R before
pass_delay_slots, and if that clearing is represented as being necessary,
then every path through the function will clear register R after the pass
as well.

>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>> to see whether there's a target-specific reason why a register can't
>> be clobbered.
>
> /* Return true if is OK to use a hard register REGNO as scratch register
>    in peephole2.  */
> DEFHOOK
> (hard_regno_scratch_ok,
>
>
> Is this check only valid for pass_peephole2?

No, that comment looks out of date.  The hook is already used in
postreload, for example.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 19:20                                                                           ` Richard Sandiford
@ 2020-09-14 20:24                                                                             ` Qing Zhao
  2020-09-15  9:11                                                                               ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-14 20:24 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> Like I mentioned earlier though, passes that run after
>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>> the “used” version of the new command-line option is supposed to clear
>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>> collected at the point that the return is generated.
>>>> 
>>>> Thanks for the information.
>>>> 
>>>>> 
>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>> already exist.
>>>> 
>>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>>> New pass as late as possible?
>>> 
>>> If we insert the zeroing before pass_delay_slots and describe the
>>> result correctly, pass_delay_slots should do the right thing.
>>> 
>>> Describing the result correctly includes ensuring that the cleared
>>> registers are treated as live on exit from the function, so that the
>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>> 
>> In the current implementation for x86, when we generating a zeroing insn as the following:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>        (const_int 0 [0])) "t10.c":11:1 -1
>>     (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>            (reg:SI 1 dx)
>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>     (nil))
>> 
>> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
>> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>> 
>> Is doing this enough to describe the result correctly?
>> Is there other thing we need to do in addition to this?
> 
> I guess that works, but I think it would be better to abstract
> EPILOGUE_USES into a new target-independent wrapper function that
> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
> true for registers that need to be zero on return, if the zeroing
> instructions have already been inserted.  The places that currently
> test EPILOGUE_USES should then test this new wrapper function instead.

Okay, I see. 
It looks like EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOGUE_USES returns true
for registers that need to be zeroed on return, those registers will be included in the data flow information, and as a result, later
passes will not be able to delete the instructions that zero them.

This sounds like a cleaner approach than the current one, which marks the registers with “UNSPECV_PRO_EPILOGUE_USE”.

A more detailed implementation question on this:
Where should I put this new target-independent wrapper function? Which header file would be the proper place for it?

> 
> After inserting the zeroing instructions, the pass should recompute the
> live-out sets based on this.

Is it enough to recompute the live-out sets only for the block that contains the return insn, or should we recompute them for the whole procedure?

Which utility routine should I use to recompute the live-out sets?

> 
>>>>> But the dataflow information has to be correct between
>>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>>> any pass in that region could clobber the registers in the same way.
>>>> 
>>>> You mean, the data flow information will be not correct after pass_free_cfg? 
>>>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new “return” generated in “pass_delay_slots”, 
>>>> If we want to generate zeroing for the new “return” which was generated in “pass_delay_slots”, can we correctly to do so?
>>> 
>>> …the zeroing has to be done before pass_free_cfg, because the information
>>> isn't reliable after that point.  I think it would make sense to do it
>>> before pass_compute_alignments, because inserting the zeros will affect
>>> alignment.
>> 
>> Okay. 
>> 
>> Then there is another problem:  what about the new “return”s that are generated at pass_delay_slots?
>> 
>> Should we generate the zeroing for these new returns? Since the data flow information might not be correct at this pass,
>> It looks like that there is no correct way to add the zeroing insn for these new “return”, then, what should we do about this?
> 
> pass_delay_slots isn't a problem.  It doesn't change *what* happens
> on each return path, it just changes how the instructions to achieve
> it are arranged.
> 
> So e.g. if every path through the function clears register R before
> pass_delay_slots, and if that clearing is represented as being necessary,
> then every path through the function will clear register R after the pass
> as well.

Okay, I might now understand what you mean here.

My understanding is:

Our new pass, i.e. pass_zero_call_used_regs, is placed at the beginning of pass_late_compilation:

      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
++++  NEXT_PASS (pass_zero_call_used_regs);
          NEXT_PASS (pass_compute_alignments);
          NEXT_PASS (pass_variable_tracking);
          NEXT_PASS (pass_free_cfg);
          NEXT_PASS (pass_machine_reorg);
          NEXT_PASS (pass_cleanup_barriers);
          NEXT_PASS (pass_delay_slots);

When we scan the EXIT BLOCK of the routine, all the return insns are already there.
The later passes, including “pass_delay_slots”, will not generate any additional returns; they might just call “target.gen_return” or “target.gen_simple_return()” to replace
“ret_rtx” or “simple_ret_rtx”. Is that right?


> 
>>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>>> to see whether there's a target-specific reason why a register can't
>>> be clobbered.
>> 
>> /* Return true if is OK to use a hard register REGNO as scratch register
>>   in peephole2.  */
>> DEFHOOK
>> (hard_regno_scratch_ok,
>> 
>> 
>> Is this checking only valid for pass_peephole2?
> 
> No, that comment looks out of date.  The hook is already used in
> postreload, for example.

Okay, I see.

thanks.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 22:41                                                                     ` Qing Zhao
@ 2020-09-14 23:09                                                                       ` Segher Boessenkool
  2020-09-15  3:07                                                                         ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-14 23:09 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Fri, Sep 11, 2020 at 05:41:47PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > It is definitely *not* effective if there are gadgets that set rax to
> > a value the attacker wants and then do a syscall.
> 
> You mean the following gadget:
> 
> 
> Gadget 1:
> 
> mov  rax,  value
> syscall
> ret

No, just

mov rax,59
syscall

(no ret necessary!)

I.e. just anything that already does an execve.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 22:24                                                                 ` Qing Zhao
  2020-09-11 22:56                                                                   ` Richard Sandiford
@ 2020-09-14 23:20                                                                   ` Segher Boessenkool
  1 sibling, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-14 23:20 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Hi!

On Fri, Sep 11, 2020 at 05:24:58PM -0500, Qing Zhao wrote:
> > So IMO the pass should just search for all the
> > returns in a function and insert the zeroing instructions before
> > each one.
> 
> I was considering this approach too for some time, however, there is one major issue with this as 
> Segher mentioned, The middle end does not know some details on the registers, lacking such 
> detailed information might result incorrect code generation at middle end.
> 
> For example, on x86_64 target, when “return” with pop, the scratch register “ECX” will be 
> used for returning, then it’s incorrect to zero “ecx” before generating the return. Since middle end
> doesn't have such information, it cannot avoid to zero “ecx”. Therefore incorrect code might be 
> generated. 
> 
> Segher also mentioned that on Power, there are some scratch registers also are used for 
> Other purpose, clearing them before return is not correct. 

Depending where you insert those insns, it can be non-harmful, but in
most places it will not be useful.


What you can do (easy and safe) is change the RTL return instructions to
clear all necessary registers (by outputting extra assembler
instructions).  I still have big doubts how effective that will be, and
esp. compared with how expensive that is, but at least its effect on the
compiler is very local, and it does not get in the way of most things.

(This also works with shrink-wrapping and similar.)

(The effectiveness of this whole scheme depends a *lot* on specifics of
the ABI, btw; in that way it is not generic at all!)


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 16:33                                                                       ` Richard Sandiford
  2020-09-14 18:50                                                                         ` Qing Zhao
@ 2020-09-14 23:35                                                                         ` Segher Boessenkool
  2020-09-15 11:46                                                                           ` Richard Sandiford
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-14 23:35 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
> > However, for the cases on Power as Segher mentioned, there are also some scratch registers used for
> > Other purpose, not sure whether we can correctly generate zeroing in middle-end for Power?
> 
> Segher would be better placed to answer that, but I think the process
> above has to give a conservatively-accurate list of live registers.
> If it misses a register, the other late rtl passes could clobber
> that same register.

It will zero a whole bunch of registers that are overwritten later, that
are not parameter passing registers either.

Doing this with the limited information the middle end has is not the
best idea.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 23:09                                                                       ` Segher Boessenkool
@ 2020-09-15  3:07                                                                         ` Qing Zhao
  2020-09-15 18:51                                                                           ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-15  3:07 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 14, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Sep 11, 2020 at 05:41:47PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> It is definitely *not* effective if there are gadgets that set rax to
>>> a value the attacker wants and then do a syscall.
>> 
>> You mean the following gadget:
>> 
>> 
>> Gadget 1:
>> 
>> mov  rax,  value
>> syscall
>> ret
> 
> No, just
> 
> mov rax,59
> syscall
> 
> (no ret necessary!)

But for ROP, a typical gadget should end with a “ret” (or an indirect branch), right?

Qing
> 
> I.e. just anything that already does an execve.
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 20:24                                                                             ` Qing Zhao
@ 2020-09-15  9:11                                                                               ` Richard Sandiford
  2020-09-15 15:05                                                                                 ` Qing Zhao
  2020-09-15 19:41                                                                                 ` Segher Boessenkool
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-15  9:11 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>> Like I mentioned earlier though, passes that run after
>>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>>> the “used” version of the new command-line option is supposed to clear
>>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>>> collected at the point that the return is generated.
>>>>> 
>>>>> Thanks for the information.
>>>>> 
>>>>>> 
>>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>>> already exist.
>>>>> 
>>>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>>>> New pass as late as possible?
>>>> 
>>>> If we insert the zeroing before pass_delay_slots and describe the
>>>> result correctly, pass_delay_slots should do the right thing.
>>>> 
>>>> Describing the result correctly includes ensuring that the cleared
>>>> registers are treated as live on exit from the function, so that the
>>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>> 
>>> In the current implementation for x86, when we generating a zeroing insn as the following:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>        (const_int 0 [0])) "t10.c":11:1 -1
>>>     (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>            (reg:SI 1 dx)
>>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>     (nil))
>>> 
>>> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
>>> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>>> 
>>> Is doing this enough to describe the result correctly?
>>> Is there other thing we need to do in addition to this?
>> 
>> I guess that works, but I think it would be better to abstract
>> EPILOGUE_USES into a new target-independent wrapper function that
>> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>> true for registers that need to be zero on return, if the zeroing
>> instructions have already been inserted.  The places that currently
>> test EPILOGUE_USES should then test this new wrapper function instead.
>
> Okay, I see. 
> It looks like EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOGUE_USES returns true
> for registers that need to be zeroed on return, those registers will be included in the data flow information, and as a result, later
> passes will not be able to delete the instructions that zero them.
>
> This sounds like a cleaner approach than the current one, which marks the registers with “UNSPECV_PRO_EPILOGUE_USE”.
>
> A more detailed implementation question on this:
> Where should I put this new target-independent wrapper function? Which header file would be the proper place for it?

Not a strong opinion, but: maybe df.h and df-scan.c, since this is
really a DF query.
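
To make the shape of the wrapper concrete, here is a rough, self-contained
sketch.  It is a toy model rather than GCC code: toy_epilogue_uses stands in
for the target's EPILOGUE_USES macro, and struct toy_function,
zeroing_inserted, must_be_zero_on_return and toy_epilogue_uses_p are all
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: hard registers are numbered 0..31 and register sets are
   bit masks.  All names here are hypothetical, not GCC's.  */
typedef uint32_t toy_regset;

/* Stand-in for the target's EPILOGUE_USES macro: pretend that only
   register 7 is used by the epilogue.  */
bool
toy_epilogue_uses (unsigned int regno)
{
  return regno == 7;
}

/* Per-function state that the zeroing pass would fill in.  */
struct toy_function
{
  bool zeroing_inserted;              /* true once zeroing insns exist */
  toy_regset must_be_zero_on_return;  /* registers cleared before returns */
};

/* The wrapper: (a) true if EPILOGUE_USES is true, and (b) true for
   registers that must be zero on return, but only once the zeroing
   instructions have been inserted.  */
bool
toy_epilogue_uses_p (const struct toy_function *fn, unsigned int regno)
{
  assert (regno < 32);
  if (toy_epilogue_uses (regno))
    return true;
  return fn->zeroing_inserted
         && ((fn->must_be_zero_on_return >> regno) & 1) != 0;
}
```

Callers that currently test EPILOGUE_USES would call the wrapper instead, so
the zeroed registers only start to count as live on exit after the pass has
run.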

>> After inserting the zeroing instructions, the pass should recompute the
>> live-out sets based on this.

Sorry, I was wrong here.  It should *cause* the sets to be recomputed
where necessary (rather than recompute them directly), but see below.

> Is it enough to recompute only the live-out sets of the block that includes the return insn? Or should we recompute them for the whole procedure?
>
> Which utility routine should I use to recompute the live-out sets?

Inserting the instructions will cause the containing block to be marked
dirty, via df_set_bb_dirty.  I think the pass should also call
df_set_bb_dirty on the exit block itself, to indicate that the
wrapper around EPILOGUE_USES has changed behaviour, but that might
not be necessary.

This gives the df machinery enough information to work out what has changed.
It will then propagate those changes throughout the function.  (I don't
think any propagation would be necessary here, but if I'm wrong about that,
then the df machinery will do whatever propagation is necessary.)

However, the convention is for a pass that uses the df machinery to call
df_analyze first.  This call to df_analyze updates any stale df information.

So unlike what I said yesterday, the pass itself doesn't need to make sure
that the df information is up-to-date.  It just needs to indicate what
has changed, as above.

In the case of pass_delay_slots, pass_free_cfg has:

  /* The resource.c machinery uses DF but the CFG isn't guaranteed to be
     valid at that point so it would be too late to call df_analyze.  */
  if (DELAY_SLOTS && optimize > 0 && flag_delayed_branch)
    {
      df_note_add_problem ();
      df_analyze ();
    }

Any other machine-specific passes that use df already need to call
df_analyze (if they use the df machinery).  So simply marking what
has changed is enough (by design).
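
As a rough illustration of that division of labour, here is a small
self-contained toy model.  It is not the real df API: struct toy_df,
toy_set_bb_dirty and toy_analyze are hypothetical names that only mimic the
convention of "the changing pass marks, the consuming pass analyzes".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_N_BLOCKS 4

/* Toy model of lazily updated dataflow information; not GCC's df API.  */
struct toy_df
{
  bool dirty[TOY_N_BLOCKS];  /* blocks whose information is stale */
  int rescans;               /* total number of blocks rescanned so far */
};

/* What a transforming pass does: only record that a block changed.  */
void
toy_set_bb_dirty (struct toy_df *df, int bb)
{
  assert (bb >= 0 && bb < TOY_N_BLOCKS);
  df->dirty[bb] = true;
}

/* What a consuming pass does first: bring stale information up to date.
   Returns the number of blocks rescanned by this call.  */
int
toy_analyze (struct toy_df *df)
{
  int n = 0;
  for (int bb = 0; bb < TOY_N_BLOCKS; bb++)
    if (df->dirty[bb])
      {
        df->dirty[bb] = false;
        n++;
      }
  df->rescans += n;
  return n;
}
```

In this model the zeroing pass would call toy_set_bb_dirty on the blocks it
touched (and on the exit block), and the next df user would pay for the
update in its own toy_analyze call.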

> My understanding is:
>
> Our new pass, i.e. pass_zero_call_used_regs, is put at the beginning of pass_late_compilation:
>
>       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> ++++  NEXT_PASS (pass_zero_call_used_regs);
>           NEXT_PASS (pass_compute_alignments);
>           NEXT_PASS (pass_variable_tracking);
>           NEXT_PASS (pass_free_cfg);
>           NEXT_PASS (pass_machine_reorg);
>           NEXT_PASS (pass_cleanup_barriers);
>           NEXT_PASS (pass_delay_slots);
>
> When we scan the EXIT BLOCK of the routine, all the return insns are already there.
> The later passes, including “pass_delay_slots”, will not generate additional returns anymore; they might just call “targetm.gen_return()” or “targetm.gen_simple_return()” to replace
> “ret_rtx” or “simple_ret_rtx”?

Kind-of.  pass_delay_slots can also duplicate code, so it's not always a
straight replacement.  But the point is that returns don't appear out of
nowhere.  There has to be a semantic reason for them to exist.  The
behaviour of the function after pass_delay_slots has to be the same
as it was before the pass (disregarding undefined behaviour).  Once we've
added clearing of the zero registers to all return paths, that clearing
becomes part of the behaviour of the function, and so will be part of
the behaviour after pass_delay_slots as well.

So I don't think the problem is with passes generating new returns.
It's more whether they could use new registers that then need to be
cleared, which is the main justification for running the new pass
so late in the pipeline.

In principle, there's nothing stopping pass_delay_slots allocating
new registers (like pass_regrename does), and in principle that could
introduce the need to do more clearing.  But I don't think the current
pass does that.  The pass is also very much legacy code at this point,
so the chances of new optimisations being added to it are fairly low.
If that did happen, I think it would be reasonable to expect the pass
to work within the set of registers that have already been allocated,
at least when your new option is in effect.

Thanks,
Richard


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-14 23:35                                                                         ` Segher Boessenkool
@ 2020-09-15 11:46                                                                           ` Richard Sandiford
  2020-09-15 19:22                                                                             ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-15 11:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
>> > However, for the cases on Power that Segher mentioned, there are also some scratch registers used for
>> > other purposes; I am not sure whether we can correctly generate the zeroing in the middle-end for Power.
>> 
>> Segher would be better placed to answer that, but I think the process
>> above has to give a conservatively-accurate list of live registers.
>> If it misses a register, the other late rtl passes could clobber
>> that same register.
>
> It will zero a whole bunch of registers that are overwritten later, that
> are not parameter passing registers either.

This thread has covered two main issues: correctness and cost.
The question above was about correctness, but your reply seems to be
about cost.  The correctness question was instead: would the process
described in my previous message lead the compiler to think that a
register wasn't live before a Power return instruction when the
register actually was live?  (And if so, how do we get around that
for other post prologue-epilogue passes that use df?)

On the cost issue: when you say some registers are “overwritten later”:
which registers do you mean, and who would be doing the overwriting?
We were talking about inserting zeroing instructions immediately before
returns that already exist.  It looks like the main Power return
pattern is:

(define_insn "<return_str>return"
  [(any_return)]
  "<return_pred>"
  "blr"
  [(set_attr "type" "jmpreg")])

Does this overwrite anything other than the PC?  If not, it doesn't
look like anything in the function itself would clobber other registers
later (i.e. later than the inserted zeroing instructions).  And of course,
if an attacker is performing a ROP attack, the attacker controls which
address the BLR returns to.

Thanks,
Richard


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15  9:11                                                                               ` Richard Sandiford
@ 2020-09-15 15:05                                                                                 ` Qing Zhao
  2020-09-15 19:41                                                                                 ` Segher Boessenkool
  1 sibling, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-15 15:05 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 15, 2020, at 4:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>>> Like I mentioned earlier though, passes that run after
>>>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>>>> the “used” version of the new command-line option is supposed to clear
>>>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>>>> collected at the point that the return is generated.
>>>>>> 
>>>>>> Thanks for the information.
>>>>>> 
>>>>>>> 
>>>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>>>> already exist.
>>>>>> 
>>>>>> Yes, it looks like it’s not correct to insert the zeroing at the time when the prologue, epilogue and return are generated.
>>>>>> As I also checked, “return” might also be generated as late as “pass_delay_slots”.  So, shall we move the
>>>>>> new pass as late as possible?
>>>>> 
>>>>> If we insert the zeroing before pass_delay_slots and describe the
>>>>> result correctly, pass_delay_slots should do the right thing.
>>>>> 
>>>>> Describing the result correctly includes ensuring that the cleared
>>>>> registers are treated as live on exit from the function, so that the
>>>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>>> 
>>>> In the current implementation for x86, we generate a zeroing insn as follows:
>>>> 
>>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>>    (nil))
>>>> (insn 19 18 20 2 (unspec_volatile [
>>>>           (reg:SI 1 dx)
>>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>>    (nil))
>>>> 
>>>> i.e., after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”.
>>>> By doing this, we can prevent the zeroing insn from being deleted or skipped.
>>>> 
>>>> Is doing this enough to describe the result correctly?
>>>> Is there anything else we need to do in addition to this?
>>> 
>>> I guess that works, but I think it would be better to abstract
>>> EPILOGUE_USES into a new target-independent wrapper function that
>>> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>>> true for registers that need to be zero on return, if the zeroing
>>> instructions have already been inserted.  The places that currently
>>> test EPILOGUE_USES should then test this new wrapper function instead.
>> 
>> Okay, I see. 
>> It looks like EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOGUE_USES returns true
>> for registers that need to be zeroed on return, those registers will be included in the data flow information; as a result, later
>> passes will not be able to delete the insns that zero them.
>> 
>> This sounds like a cleaner approach than the current one, which marks the registers as “UNSPECV_PRO_EPILOGUE_USE”.
>> 
>> A more detailed implementation question on this:
>> Where should I put this new target-independent wrapper function? Which header file would be the proper place for it?
> 
> Not a strong opinion, but: maybe df.h and df-scan.c, since this is
> really a DF query.

Okay.

> 
>>> After inserting the zeroing instructions, the pass should recompute the
>>> live-out sets based on this.
> 
> Sorry, I was wrong here.  It should *cause* the sets to be recomputed
> where necessary (rather than recompute them directly), but see below.
> 
>> Is it enough to recompute only the live-out sets of the block that includes the return insn? Or should we recompute them for the whole procedure?
>> 
>> Which utility routine should I use to recompute the live-out sets?
> 
> Inserting the instructions will cause the containing block to be marked
> dirty, via df_set_bb_dirty.  I think the pass should also call
> df_set_bb_dirty on the exit block itself, to indicate that the
> wrapper around EPILOGUE_USES has changed behaviour, but that might
> not be necessary.
> 
> This gives the df machinery enough information to work out what has changed.
> It will then propagate those changes throughout the function.  (I don't
> think any propagation would be necessary here, but if I'm wrong about that,
> then the df machinery will do whatever propagation is necessary.)
> 
> However, the convention is for a pass that uses the df machinery to call
> df_analyze first.  This call to df_analyze updates any stale df information.
> 
> So unlike what I said yesterday, the pass itself doesn't need to make sure
> that the df information is up-to-date.  It just needs to indicate what
> has changed, as above.
> 
> In the case of pass_delay_slots, pass_free_cfg has:
> 
>  /* The resource.c machinery uses DF but the CFG isn't guaranteed to be
>     valid at that point so it would be too late to call df_analyze.  */
>  if (DELAY_SLOTS && optimize > 0 && flag_delayed_branch)
>    {
>      df_note_add_problem ();
>      df_analyze ();
>    }
> 
> Any other machine-specific passes that use df already need to call
> df_analyze (if they use the df machinery).  So simply marking what
> has changed is enough (by design).

So, in this new pass, I need to:

1. Call “df_analyze” at the beginning to get up-to-date df information.
2. After generating the zeroing insns, mark the containing block with “df_set_bb_dirty”.
3. Mark the exit block with “df_set_bb_dirty” to indicate that the wrapper around EPILOGUE_USES has changed
    behavior. (This might not be needed, since “df_analyze” in the next pass will call EPILOGUE_USES automatically?)

Is the above enough for DF?

(BTW, how expensive is it to call “df_analyze”?)

> 
>> My understanding is:
>> 
>> Our new pass, i.e. pass_zero_call_used_regs, is put at the beginning of pass_late_compilation:
>> 
>>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> ++++  NEXT_PASS (pass_zero_call_used_regs);
>>          NEXT_PASS (pass_compute_alignments);
>>          NEXT_PASS (pass_variable_tracking);
>>          NEXT_PASS (pass_free_cfg);
>>          NEXT_PASS (pass_machine_reorg);
>>          NEXT_PASS (pass_cleanup_barriers);
>>          NEXT_PASS (pass_delay_slots);
>> 
>> When we scan the EXIT BLOCK of the routine, all the return insns are already there.
>> The later passes, including “pass_delay_slots”, will not generate additional returns anymore; they might just call “targetm.gen_return()” or “targetm.gen_simple_return()” to replace
>> “ret_rtx” or “simple_ret_rtx”?
> 
> Kind-of.  pass_delay_slots can also duplicate code, so it's not always a
> straight replacement.  But the point is that returns don't appear out of
> nowhere.  There has to be a semantic reason for them to exist.  The
> behaviour of the function after pass_delay_slots has to be the same
> as it was before the pass (disregarding undefined behaviour).  Once we've
> added clearing of the zero registers to all return paths, that clearing
> becomes part of the behaviour of the function, and so will be part of
> the behaviour after pass_delay_slots as well.
> 
> So I don't think the problem is with passes generating new returns.
> It's more whether they could use new registers that then need to be
> cleared, which is the main justification for running the new pass
> so late in the pipeline.

agreed.

> 
> In principle, there's nothing stopping pass_delay_slots allocating
> new registers (like pass_regrename does), and in principle that could
> introduce the need to do more clearing.  But I don't think the current
> pass does that.  The pass is also very much legacy code at this point,
> so the chances of new optimisations being added to it are fairly low.
> If that did happen, I think it would be reasonable to expect the pass
> to work within the set of registers that have already been allocated,
> at least when your new option is in effect.

Okay, thanks for the information.

Qing
> 
> Thanks,
> Richard



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15  3:07                                                                         ` Qing Zhao
@ 2020-09-15 18:51                                                                           ` Segher Boessenkool
  0 siblings, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-15 18:51 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Mon, Sep 14, 2020 at 10:07:31PM -0500, Qing Zhao wrote:
> > On Sep 14, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> Gadget 1:
> >> 
> >> mov  rax,  value
> >> syscall
> >> ret
> > 
> > No, just
> > 
> > mov rax,59
> > syscall
> > 
> > (no ret necessary!)
> 
> But for ROP, a typical gadget should be ended with a “ret” (or indirect branch), right?

Not the last one :-)  (Especially if it is exec!)


Segher


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15 11:46                                                                           ` Richard Sandiford
@ 2020-09-15 19:22                                                                             ` Segher Boessenkool
  0 siblings, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-15 19:22 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

On Tue, Sep 15, 2020 at 12:46:00PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
> >> > However, for the cases on Power that Segher mentioned, there are also some scratch registers used for
> >> > other purposes; I am not sure whether we can correctly generate the zeroing in the middle-end for Power.
> >> 
> >> Segher would be better placed to answer that, but I think the process
> >> above has to give a conservatively-accurate list of live registers.
> >> If it misses a register, the other late rtl passes could clobber
> >> that same register.
> >
> > It will zero a whole bunch of registers that are overwritten later, that
> > are not parameter passing registers either.
> 
> This thread has covered two main issues: correctness and cost.
> The question above was about correctness, but your reply seems to be
> about cost.

The issues are very heavily intertwined.  A much too high execution
cost is unacceptable, just like machine code that does not implement the
source code faithfully.

> On the cost issue: when you say some registers are “overwritten later”:
> which registers do you mean, and who would be doing the overwriting?

(Glue) code that is generated by the linker.

> We were talking about inserting zeroing instructions immediately before
> returns that already exist.  It looks like the main Power return
> pattern is:

It is.

> (define_insn "<return_str>return"
>   [(any_return)]
>   "<return_pred>"
>   "blr"
>   [(set_attr "type" "jmpreg")])
> 
> Does this overwrite anything other than the PC?  If not, it doesn't

(We do not have a "PC" register, but :-) )

Nope.  The blr instruction does not write any register.  (The base
"bclr[l]" insn can write to CTR and LR).

> look like anything in the function itself would clobber other registers
> later (i.e. later than the inserted zeroing instructions).  And of course,
> if an attacker is performing a ROP attack, the attacker controls which
> address the BLR returns to.

That does not matter for the *normal* case.  Making the normal case even
more expensive than this scheme already is is no good.


Anyway, I was concerned about other architectures, too (that may not
even *have* a GCC port (yet)).  The point is that this should follow all
the rules we have for RTL.  Now that it will use DF (thanks!), most of
that will follow automatically (or easily, anyway).


Segher


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15  9:11                                                                               ` Richard Sandiford
  2020-09-15 15:05                                                                                 ` Qing Zhao
@ 2020-09-15 19:41                                                                                 ` Segher Boessenkool
  2020-09-15 22:31                                                                                   ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-15 19:41 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

On Tue, Sep 15, 2020 at 10:11:41AM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> >> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
(Putting correct info in DF, inserting the new insns in pro_and_epi).

But, scheduling runs *after* that, and then you need to prevent the
inserted (zeroing) insns from moving -- if you don't, the code after
some zeroing can be used as gadget!  You want to always have all
zeroing insns after *any* computational insn, or it becomes a gadget.
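
To spell out the property, here is a small self-contained check on a toy insn
stream.  It is only a sketch: enum toy_insn_kind and
toy_zeroing_stays_with_return are hypothetical names, not anything in GCC, and
real RTL has many more insn kinds than the three modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy insn kinds; hypothetical, not GCC's RTL codes.  */
enum toy_insn_kind { TOY_COMPUTE, TOY_ZERO, TOY_RETURN };

/* Return true if the block ends with a return and no computational insn
   appears after a zeroing insn, i.e. all zeroing sits directly in front
   of the return.  */
bool
toy_zeroing_stays_with_return (const enum toy_insn_kind *insns, size_t n)
{
  assert (insns != NULL || n == 0);
  if (n == 0 || insns[n - 1] != TOY_RETURN)
    return false;
  bool seen_zero = false;
  for (size_t i = 0; i + 1 < n; i++)
    {
      if (insns[i] == TOY_ZERO)
        seen_zero = true;
      else if (insns[i] == TOY_COMPUTE && seen_zero)
        return false;  /* code was moved below some of the zeroing */
      else if (insns[i] == TOY_RETURN)
        return false;  /* a return in the middle of the block */
    }
  return true;
}
```

A sequence like compute, zero, compute, zero, return fails the check: the
second compute insn together with what follows it is the kind of tail that
could serve as a gadget.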


Segher


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15 19:41                                                                                 ` Segher Boessenkool
@ 2020-09-15 22:31                                                                                   ` Qing Zhao
  2020-09-15 23:09                                                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-15 22:31 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Sandiford
  Cc: Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor



> On Sep 15, 2020, at 2:41 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 15, 2020 at 10:11:41AM +0100, Richard Sandiford wrote:
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> (Putting correct info in DF, inserting the new insns in pro_and_epi).
> 
> But, scheduling runs *after* that, and then you need to prevent the
> inserted (zeroing) insns from moving -- if you don't, the code after
> some zeroing can be used as gadget!  You want to always have all
> zeroing insns after *any* computational insn, or it becomes a gadget.

Please see the previous discussion: we agreed to put the new pass (pass_zero_call_used_regs)
at the beginning of pass_late_compilation, as follows:

     PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
++++  NEXT_PASS (pass_zero_call_used_regs);
         NEXT_PASS (pass_compute_alignments);
         NEXT_PASS (pass_variable_tracking);
         NEXT_PASS (pass_free_cfg);
         NEXT_PASS (pass_machine_reorg);
         NEXT_PASS (pass_cleanup_barriers);
         NEXT_PASS (pass_delay_slots);

Scheduling has been done already. 

Qing


> 
> 
> Segher



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15 22:31                                                                                   ` Qing Zhao
@ 2020-09-15 23:09                                                                                     ` Segher Boessenkool
  2020-09-16  1:51                                                                                       ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-15 23:09 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Tue, Sep 15, 2020 at 05:31:48PM -0500, Qing Zhao wrote:
> > But, scheduling runs *after* that, and then you need to prevent the
> > inserted (zeroing) insns from moving -- if you don't, the code after
> > some zeroing can be used as gadget!  You want to always have all
> > zeroing insns after *any* computational insn, or it becomes a gadget.
> 
> Please see the previous discussion: we agreed to put the new pass (pass_zero_call_used_regs)
> at the beginning of pass_late_compilation, as follows:

Yes, I know that at some point it was said that seemed like a good place
for it.

>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> ++++  NEXT_PASS (pass_zero_call_used_regs);
>          NEXT_PASS (pass_compute_alignments);
>          NEXT_PASS (pass_variable_tracking);
>          NEXT_PASS (pass_free_cfg);
>          NEXT_PASS (pass_machine_reorg);
>          NEXT_PASS (pass_cleanup_barriers);
>          NEXT_PASS (pass_delay_slots);
> 
> Scheduling has been done already. 

But there are many more passes that can reorder things.  Like
machine_reorg (which is a big deal).  I don't think other passes here
are harmful (maybe the shorten stuff)?  But.  Targets can also insert
more passes here.

If you want the zeroing insns to stay with the return, you have to
express that in RTL.  Anything else is extremely fragile.


Segher


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-15 23:09                                                                                     ` Segher Boessenkool
@ 2020-09-16  1:51                                                                                       ` Qing Zhao
  2020-09-16 10:35                                                                                         ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-16  1:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 15, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 15, 2020 at 05:31:48PM -0500, Qing Zhao wrote:
>>> But, scheduling runs *after* that, and then you need to prevent the
>>> inserted (zeroing) insns from moving -- if you don't, the code after
>>> some zeroing can be used as gadget!  You want to always have all
>>> zeroing insns after *any* computational insn, or it becomes a gadget.
>> 
>> Please see the previous discussion: we agreed to put the new pass (pass_zero_call_used_regs)
>> at the beginning of pass_late_compilation, as follows:
> 
> Yes, I know that at some point it was said that seemed like a good place
> for it.
> 
>>     PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> ++++  NEXT_PASS (pass_zero_call_used_regs);
>>         NEXT_PASS (pass_compute_alignments);
>>         NEXT_PASS (pass_variable_tracking);
>>         NEXT_PASS (pass_free_cfg);
>>         NEXT_PASS (pass_machine_reorg);
>>         NEXT_PASS (pass_cleanup_barriers);
>>         NEXT_PASS (pass_delay_slots);
>> 
>> Scheduling has been done already. 
> 
> But there are many more passes that can reorder things.  Like
> machine_reorg (which is a big deal).  I don't think other passes here
> are harmful (maybe the shorten stuff)?  But.  Targets can also insert
> more passes here.
> 
> If you want the zeroing insns to stay with the return, you have to
> express that in RTL.  

What do you mean by “express that in RTL”?
Could you please explain this in more detail?

Do you mean to implement this in “targetm.gen_return” and “targetm.gen_simple_return”?

Qing

> Anything else is extremely fragile.
> 
> 
> Segher



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-16  1:51                                                                                       ` Qing Zhao
@ 2020-09-16 10:35                                                                                         ` Segher Boessenkool
  2020-09-16 20:57                                                                                           ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-16 10:35 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Tue, Sep 15, 2020 at 08:51:57PM -0500, Qing Zhao wrote:
> > On Sep 15, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > If you want the zeroing insns to stay with the return, you have to
> > express that in RTL.  
> 
> What do you mean by “express that in RTL”?
> Could you please explain this in more detail?

Exactly as I say: you need to tell in the RTL that the insns should stay
together.

Easiest is to just make it one RTL insn.  There are other ways, but
those do not help anything here afaics.

> Do you mean to implement this in “targetm.gen_return” and “targetm.gen_simple_return”?

That is the easiest way, yes.

> > Anything else is extremely fragile.


Segher


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-16 10:35                                                                                         ` Segher Boessenkool
@ 2020-09-16 20:57                                                                                           ` Qing Zhao
  2020-09-17  6:17                                                                                             ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-16 20:57 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Sandiford
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Segher and Richard, 

There are two major concerns from the discussion so far:

1. (From Richard): Inserting the zeroing insns should be done after pass_thread_prologue_and_epilogue, since later passes (for example, pass_regrename) might introduce new uses of caller-saved registers.
     So, we should do this at the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later).

2. (From Segher): The inserted zeroing insns should stay together with the return; no other insns should move in between the zeroing insns and the return insn. Otherwise, a valid gadget could be formed.

I think that both of these concerns are important and should be addressed for a correct implementation.

To address 1, we cannot implement this in “targetm.gen_return()” and “targetm.gen_simple_return()”, since they are called during pass_thread_prologue_and_epilogue, and at that time the register-use information is not yet correct.

To address 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent the zeroing insns from moving around.  More restrictions need to be added to these new zeroing insns.  (I think that marking the zeroed registers as “unspec_volatile” at the RTL level is necessary to prevent the insns from being deleted or moved around.)


So, based on the above, I propose the following approach that will resolve the above 2 concerns:

1. Add 2 new target hooks:
   A. targetm.pro_epilogue_use (reg)
   This hook should return an UNSPEC_VOLATILE rtx that marks a register as in use,
   to prevent register-setting instructions in the prologue and epilogue from being deleted.

   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
   This hook will generate a sequence of zeroing insns that zero the registers specified in NEED_ZEROED_HARDREGS.

    A default handler for “gen_zero_call_used_regs” could be defined in the middle end, which uses mov insns to zero registers, and then uses “targetm.pro_epilogue_use(reg)” to mark each zeroed register. 


2. Add  a new pass, pass_zero_call_used_regs,  in the beginning of pass_late_compilation. 

    This pass will search for all “return”s, and compute the hard register set for zeroing, “need_zeroed_hardregs”, based on data flow information, the user request, and the function ABI. 
    Then call targetm.gen_zero_call_used_regs(need_zeroed_hardregs).

3. The x86 backend will implement its own versions of “gen_zero_call_used_regs” and “pro_epilogue_use”.
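
As a rough illustration of how step 2 might combine the user request with register information, here is a toy model in plain C. It is not GCC code: the enum, the function name and the bitmask representation are all assumptions made for this sketch, the mode meanings are only inferred from the option names in the subject line, and real ABI details are ignored.

```c
#include <stdio.h>

/* Hypothetical mirror of -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all].  */
enum toy_zero_mode
{
  TOY_ZERO_SKIP,
  TOY_ZERO_USED_GPR,
  TOY_ZERO_ALL_GPR,
  TOY_ZERO_USED,
  TOY_ZERO_ALL
};

/* Compute the set of registers to zero as a bitmask.
   CALL_USED: call-used (caller-saved) registers.
   GPRS: general-purpose registers.
   USED_IN_FUNCTION: registers the data flow information says are used.
   All three masks are assumptions of this toy model.  */
static unsigned
toy_need_zeroed_hardregs (enum toy_zero_mode mode, unsigned call_used,
                          unsigned gprs, unsigned used_in_function)
{
  switch (mode)
    {
    case TOY_ZERO_SKIP:
      return 0;
    case TOY_ZERO_USED_GPR:
      return call_used & gprs & used_in_function;
    case TOY_ZERO_ALL_GPR:
      return call_used & gprs;
    case TOY_ZERO_USED:
      return call_used & used_in_function;
    case TOY_ZERO_ALL:
      return call_used;
    }
  return 0;
}
```

For example, with call_used = 0x0F, gprs = 0x33 and used_in_function = 0x05, the "used-gpr" mode selects only register bit 0x01, while "all" selects 0x0F.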


Let me know if you have any more comments on this approach.

thanks.

Qing




> On Sep 16, 2020, at 5:35 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Sep 15, 2020 at 08:51:57PM -0500, Qing Zhao wrote:
>>> On Sep 15, 2020, at 6:09 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>> If you want the zeroing insns to stay with the return, you have to
>>> express that in RTL.  
>> 
>> What do you mean by “express that in RTL”?
>> Could you please explain this in more details?
> 
> Exactly as I say: you need to tell in the RTL that the insns should stay
> together.
> 
> Easiest is to just make it one RTL insn.  There are other ways, but
> those do not help anything here afaics.
> 
>> Do you mean to implement this in “targetm.gen_return” and “targetm.gen_simple_return”?
> 
> That is the easiest way, yes.
> 
>>> Anything else is extremely fragile.
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-16 20:57                                                                                           ` Qing Zhao
@ 2020-09-17  6:17                                                                                             ` Richard Sandiford
  2020-09-17 14:40                                                                                               ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-17  6:17 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> Segher and Richard, 
>
> Now there are two major concerns from the discussion so far:
>
> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>      So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>
> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>
> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>
> In order to support 1,  we cannot implementing it in “targetm.gen_return()” and “targetm.gen_simple_return()”  since “targetm.gen_return()” and “targetm.gen_simple_return()” are called during pass_thread_prologue_and_epilogue, at that time, the use information still not correct. 
>
> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registgers is NOT enough to prevent all the zero insns from moving around.

Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
being deleted as dead.

> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from deleting from moving around). 
>
>
> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>
> 1. Add 2 new target hooks:
>    A. targetm.pro_epilogue_use (reg)
>    This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>    prevent deleting register setting instructions in prologue and epilogue.
>
>    B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>    This hook will gen a sequence of zeroing insns that zero the registers that specified in NEED_ZEROED_HARDREGS.
>
>     A default handler of “gen_zero_call_used_regs” could be defined in middle end, which use mov insns to zero registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 

This sounds like you're going back to using:

(insn 18 16 19 2 (set (reg:SI 1 dx)
        (const_int 0 [0])) "t10.c":11:1 -1
     (nil))
(insn 19 18 20 2 (unspec_volatile [
            (reg:SI 1 dx)
        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
     (nil))

This also doesn't prevent the zeroing from being moved around.  Like the
EPILOGUE_USES approach, it only prevents the clearing from being removed
as dead.  I still think that the EPILOGUE_USES approach is the better
way of doing that.

In other words: the use insns themselves are volatile and so can't be
moved relative to each other and to other volatile insns.  But the uses
are fake instructions that don't do anything.  The preceding zeroing
instructions are just normal instructions that can be moved around
freely before their respective uses.

I don't think there's a foolproof way of preventing an unknown target
machine_reorg pass from moving the instructions around.  But since we
don't have unknown machine_reorgs (at least not in-tree), I think
instead we should be prepared to patch machine_reorgs where necessary
to ensure that they do the right thing.

If you want to increase the chances that machine_reorgs don't need to be
patched, you could either:

(a) make the zeroing instructions themselves volatile or
(b) insert a volatile reference to the register before (rather than
    after) the zeroing instruction

IMO (b) is the way to go, because it avoids the need to define special
volatile move patterns for each type of register.  (b) would be needed
on top of (rather than instead of) the EPILOGUE_USES thing.

I don't think we need a new target-specific unspec_volatile code to do (b).
We can just use an automatically-generated volatile asm to clobber the
registers first.  See e.g. how expand_asm_memory_blockage handles memory
scheduling barriers.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-17  6:17                                                                                             ` Richard Sandiford
@ 2020-09-17 14:40                                                                                               ` Qing Zhao
  2020-09-17 16:27                                                                                                 ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-17 14:40 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Segher and Richard, 
>> 
>> Now there are two major concerns from the discussion so far:
>> 
>> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>>     So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>> 
>> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>> 
>> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>> 
>> In order to support 1,  we cannot implementing it in “targetm.gen_return()” and “targetm.gen_simple_return()”  since “targetm.gen_return()” and “targetm.gen_simple_return()” are called during pass_thread_prologue_and_epilogue, at that time, the use information still not correct. 
>> 
>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registgers is NOT enough to prevent all the zero insns from moving around.
> 
> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
> being deleted as dead.
> 
>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from deleting from moving around). 
>> 
>> 
>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>> 
>> 1. Add 2 new target hooks:
>>   A. targetm.pro_epilogue_use (reg)
>>   This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>   prevent deleting register setting instructions in prologue and epilogue.
>> 
>>   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>   This hook will gen a sequence of zeroing insns that zero the registers that specified in NEED_ZEROED_HARDREGS.
>> 
>>    A default handler of “gen_zero_call_used_regs” could be defined in middle end, which use mov insns to zero registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 
> 
> This sounds like you're going back to using:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>        (const_int 0 [0])) "t10.c":11:1 -1
>     (nil))
> (insn 19 18 20 2 (unspec_volatile [
>            (reg:SI 1 dx)
>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>     (nil))
> 
> This also doesn't prevent the zeroing from being moved around.  Like the
> EPILOGUE_USES approach, it only prevents the clearing from being removed
> as dead.  I still think that the EPILOGUE_USES approach is the better
> way of doing that.

The following is what I see in i386.md (I haven’t yet looked at how “UNSPEC_VOLATILE” is used in GCC’s data flow analysis):

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

I am not very familiar with how unspec_volatile actually works in GCC’s data flow analysis. My understanding from the above is that an RTL insn marked with UNSPEC_VOLATILE serves as a barrier that no other insn can move across. At the same time, since the marked insn is considered to use and clobber all hard registers and memory, it cannot be deleted either. 

So, I thought that “UNSPEC_VOLATILE” should be stronger than “EPILOGUE_USES”, and that it could serve the purpose of preventing the zeroing insns from being deleted or moved. 



> 
> In other words: the use insns themselves are volatile and so can't be
> moved relative to each other and to other volatile insns.  But the uses
> are fake instructions that don't do anything.  The preceding zeroing
> instructions are just normal instructions that can be moved around
> freely before their respective uses.

But since an UNSPEC_VOLATILE insn is considered a barrier that no other insn can move across, the zero insns cannot be moved around either, right?

> 
> I don't think there's a foolproof way of preventing an unknown target
> machine_reorg pass from moving the instructions around.  But since we
> don't have unknown machine_reorgs (at least not in-tree), I think
> instead we should be prepared to patch machine_reorgs where necessary
> to ensure that they do the right thing.
> 
> If you want to increase the chances that machine_reorgs don't need to be
> patched, you could either:
> 
> (a) to make the zeroing instructions themselves volatile or
> (b) to insert a volatile reference to the register before (rather than
>    after) the zeroing instruction
> 
> IMO (b) is the way to go, because it avoids the need to define special
> volatile move patterns for each type of register.  (b) would be needed
> on top of (rather than instead of) the EPILOGUE_USES thing.
> 
Okay, I will take approach (b). 

But I still don’t quite understand why we still need “EPILOGUE_USES”. What is the additional benefit of EPILOGUE_USES?

> I don't think we need a new target-specific unspec_volatile code to do (b).
> We can just use an automatically-generated volatile asm to clobber the
> registers first.  See e.g. how expand_asm_memory_blockage handles memory
> scheduling barriers.

For reference, this is expand_asm_memory_blockage:

/* Generate asm volatile("" : : : "memory") as the memory blockage.  */

static void
expand_asm_memory_blockage (void)
{
  rtx asm_op, clob;

  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
                                 rtvec_alloc (0), rtvec_alloc (0),
                                 rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  clob = gen_rtx_SCRATCH (VOIDmode);
  clob = gen_rtx_MEM (BLKmode, clob);
  clob = gen_rtx_CLOBBER (VOIDmode, clob);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
}


Something like the following? 

/* Generate asm volatile("" : : : "regno") for REGNO.  */

static void
expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
{
  rtx asm_op, clob;

  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
                                 rtvec_alloc (0), rtvec_alloc (0),
                                 rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  clob = gen_rtx_REG (mode, regno);
  clob = gen_rtx_CLOBBER (VOIDmode, clob);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
}

Is the above correct? 

thanks.

Qing

> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-17 14:40                                                                                               ` Qing Zhao
@ 2020-09-17 16:27                                                                                                 ` Richard Sandiford
  2020-09-17 19:07                                                                                                   ` Qing Zhao
  2020-09-17 22:26                                                                                                   ` Segher Boessenkool
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-17 16:27 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> Segher and Richard, 
>>> 
>>> Now there are two major concerns from the discussion so far:
>>> 
>>> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>>>     So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>>> 
>>> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>>> 
>>> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>>> 
>>> In order to support 1,  we cannot implementing it in “targetm.gen_return()” and “targetm.gen_simple_return()”  since “targetm.gen_return()” and “targetm.gen_simple_return()” are called during pass_thread_prologue_and_epilogue, at that time, the use information still not correct. 
>>> 
>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registgers is NOT enough to prevent all the zero insns from moving around.
>> 
>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>> being deleted as dead.
>> 
>>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from deleting from moving around). 
>>> 
>>> 
>>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>>> 
>>> 1. Add 2 new target hooks:
>>>   A. targetm.pro_epilogue_use (reg)
>>>   This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>   prevent deleting register setting instructions in prologue and epilogue.
>>> 
>>>   B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>   This hook will gen a sequence of zeroing insns that zero the registers that specified in NEED_ZEROED_HARDREGS.
>>> 
>>>    A default handler of “gen_zero_call_used_regs” could be defined in middle end, which use mov insns to zero registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 
>> 
>> This sounds like you're going back to using:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>        (const_int 0 [0])) "t10.c":11:1 -1
>>     (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>            (reg:SI 1 dx)
>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>     (nil))
>> 
>> This also doesn't prevent the zeroing from being moved around.  Like the
>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>> as dead.  I still think that the EPILOGUE_USES approach is the better
>> way of doing that.
>
> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>
> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> ;; all of memory.  This blocks insns from being moved across this point.

Heh, it looks like that comment dates back to 1994. :-)

The comment is no longer correct though.  I wasn't around at the time,
but I assume the comment was only locally true even then.

If what the comment said was true, then something like:

(define_insn "cld"
  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
  ""
  "cld"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])

would invalidate the entire register file and so would require all values
to be spilt to the stack around the CLD.

> I am not very familiar with how the unspec_volatile actually works in gcc’s data flow analysis, my understanding  from the above is, the RTL insns marked with UNSPEC_volatile would be served as a barrier that no other insns can move across this point. At the same time, since the marked RTL insns is considered to use and clobber all hard registers and memory, it cannot be deleted either. 

UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
to other UNSPEC_VOLATILEs.  But the problem with:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

is that the volatile occurs *after* the zeroing instruction.  So at best
it can stop insn 18 moving further down, to be closer to the return
instruction.  There's nothing to stop insn 18 moving further up,
away from the return instruction, which AIUI is what you're trying
to prevent.  E.g. suppose we had:

(insn 17 … pop a register other than dx from the stack …)
(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

There is nothing to stop an rtl pass reordering that to:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 17 … pop a register other than dx from the stack …)
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))

There's also no dataflow reason why this couldn't be reordered to:

(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 18 20 2 (unspec_volatile [
           (reg:SI 1 dx)
       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
    (nil))
(insn 17 … pop a register other than dx from the stack …)

So…

> So, I thought that “UNSPEC_volatile” should be stronger than “EPILOGUE_USES”. And it can serve the purpose of preventing zeroing insns from deleting and moving. 

…both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
the instruction would IMO be counterproductive: it would stop the
zeroing instructions that we want to be close to the return instruction
from moving “closer” to the return instruction, but it wouldn't do the
same for unrelated instructions.  So if anything, the unspec_volatile
could increase the chances that something unrelated to the register
zeroing is moved later than the register zeroing.  E.g. this could
happen when filling delayed branch slots.
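
As a rough illustration of this ordering argument, the toy model below treats each insn as a pair of register bitmasks plus a volatile flag. It is plain C, not GCC code: every name in it is made up for the sketch, and the swap rule (no register dependence, and never two volatile insns) is only an assumption that follows the description above.

```c
#include <stdbool.h>

/* Hypothetical register bits for the toy model.  */
enum { RO_DX = 1u << 0, RO_BX = 1u << 1, RO_SP = 1u << 2 };

/* Toy insn: registers written, registers read, and a volatile flag.  */
struct reorder_insn
{
  unsigned defs;
  unsigned uses;
  bool is_volatile;
};

/* Insn 17: pop a register other than dx (here bx) from the stack.  */
static const struct reorder_insn ro_pop_bx = { RO_BX | RO_SP, RO_SP, false };
/* Insn 18: set dx to zero.  */
static const struct reorder_insn ro_zero_dx = { RO_DX, 0, false };
/* Insn 19: unspec_volatile use of dx.  */
static const struct reorder_insn ro_use_dx = { 0, RO_DX, true };

/* Return true if adjacent insns A and B may be swapped in this model:
   they must not both be volatile, and there must be no read-after-write,
   write-after-read or write-after-write dependence between them.  */
static bool
ro_can_swap (const struct reorder_insn *a, const struct reorder_insn *b)
{
  if (a->is_volatile && b->is_volatile)
    return false;
  if (a->defs & b->uses)
    return false;
  if (a->uses & b->defs)
    return false;
  if (a->defs & b->defs)
    return false;
  return true;
}
```

In this model ro_can_swap (&ro_pop_bx, &ro_zero_dx) is true, so the zeroing can drift above the pop; ro_can_swap (&ro_zero_dx, &ro_use_dx) is false, so the use only pins the zeroing from below; and ro_can_swap (&ro_pop_bx, &ro_use_dx) is true, so the unrelated pop can still end up after both.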

>> I don't think there's a foolproof way of preventing an unknown target
>> machine_reorg pass from moving the instructions around.  But since we
>> don't have unknown machine_reorgs (at least not in-tree), I think
>> instead we should be prepared to patch machine_reorgs where necessary
>> to ensure that they do the right thing.
>> 
>> If you want to increase the chances that machine_reorgs don't need to be
>> patched, you could either:
>> 
>> (a) to make the zeroing instructions themselves volatile or
>> (b) to insert a volatile reference to the register before (rather than
>>    after) the zeroing instruction
>> 
>> IMO (b) is the way to go, because it avoids the need to define special
>> volatile move patterns for each type of register.  (b) would be needed
>> on top of (rather than instead of) the EPILOGUE_USES thing.
>> 
> Okay, will take approach b. 
>
> But I still don’t quite understand why we still need “EPILOUGE_USES”? What’s the additional benefit from EPILOGUE_USES?

The asm for (b) goes before the instruction, so we'd have:

(insn 17 … new asm …)
(insn 18 16 19 2 (set (reg:SI 1 dx)
       (const_int 0 [0])) "t10.c":11:1 -1
    (nil))
(insn 19 … return …)

But something has to tell the df machinery that the value of edx
matters on return from the function, otherwise insn 18 could be
deleted as dead.  Adding edx to EPILOGUE_USES provides that information
and stops the instruction from being deleted.
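
A toy backward dead-code scan can make this point concrete. The sketch below is plain C rather than GCC's df machinery; the structure, the function names and the idea of passing the "live at exit" set as a bitmask are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register bits: dx, and ax as the return value register.  */
enum { LV_DX = 1u << 0, LV_AX = 1u << 1 };

struct live_insn
{
  unsigned defs;      /* registers written */
  unsigned uses;      /* registers read */
  bool is_volatile;   /* volatile insns are never deleted */
  bool deleted;
};

/* Walk INSNS backwards and delete every non-volatile insn whose results
   are not live.  LIVE_AT_EXIT plays the role of EPILOGUE_USES here.
   Returns the number of deleted insns.  */
static size_t
lv_delete_dead (struct live_insn *insns, size_t n, unsigned live_at_exit)
{
  unsigned live = live_at_exit;
  size_t deleted = 0;

  for (size_t i = n; i-- > 0;)
    {
      struct live_insn *insn = &insns[i];
      insn->deleted = false;
      if (!insn->is_volatile && insn->defs != 0 && (insn->defs & live) == 0)
        {
          insn->deleted = true;
          deleted++;
          continue;
        }
      live &= ~insn->defs;
      live |= insn->uses;
    }
  return deleted;
}

/* Model "asm clobbering dx; set dx to 0; return" and count dead insns.  */
static size_t
lv_dead_count (unsigned live_at_exit)
{
  struct live_insn seq[] = {
    { LV_DX, 0, true, false },   /* insn 17: volatile asm clobbering dx */
    { LV_DX, 0, false, false },  /* insn 18: set dx to 0 */
    { 0, 0, true, false },       /* insn 19: return */
  };
  return lv_delete_dead (seq, 3, live_at_exit);
}
```

With only ax live at exit, lv_dead_count (LV_AX) deletes the zeroing insn; once dx is added to the exit set, lv_dead_count (LV_AX | LV_DX) deletes nothing, which is the job EPILOGUE_USES does in the real compiler.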

>> I don't think we need a new target-specific unspec_volatile code to do (b).
>> We can just use an automatically-generated volatile asm to clobber the
>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>> scheduling barriers.
> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>
> static void
> expand_asm_memory_blockage (void)
> {
>   rtx asm_op, clob;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>                                  rtvec_alloc (0), rtvec_alloc (0),
>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   clob = gen_rtx_SCRATCH (VOIDmode);
>   clob = gen_rtx_MEM (BLKmode, clob);
>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
> }
>
>
> As the following? 
>
> /* Generate asm volatile("" : : : “regno") for REGNO.   */
>
> static void
> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
> {
>   rtx asm_op, clob;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>                                  rtvec_alloc (0), rtvec_alloc (0),
>                                  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   clob = gen_rtx_REG (mode, regno);
>   clob = gen_rtx_CLOBBER (VOIDmode, clob);
>
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
> }
>
> Is the above correct? 

Yeah, looks good.  You should be able to clobber all the registers you
want to clear in one asm.  For extra safety, it might be worth including
a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
moved across the asm.
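
At the source level, the closest analogue of such an insn is an empty volatile asm with several clobbers. The sketch below assumes GCC-style inline asm; the function names are invented for the example, and the register names "rdx" and "rcx" are x86-64 specific and chosen only for illustration, so other targets fall back to a plain memory clobber.

```c
#include <stdio.h>

/* One empty volatile asm that clobbers several registers and memory,
   mirroring a single PARALLEL with one ASM_OPERANDS and several CLOBBERs.  */
static inline void
clobber_regs_and_memory (void)
{
#if defined (__x86_64__)
  __asm__ volatile ("" : : : "rdx", "rcx", "memory");
#else
  __asm__ volatile ("" : : : "memory");
#endif
}

/* The barrier emits no instructions; the compiler simply may not keep
   values in the clobbered registers, or cached memory values, across it.  */
int
add_then_barrier (int a, int b)
{
  int sum = a + b;
  clobber_regs_and_memory ();
  return sum;
}
```

The computed result is unaffected: add_then_barrier (2, 3) still returns 5; only the compiler's freedom to move code across the asm changes.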

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-17 16:27                                                                                                 ` Richard Sandiford
@ 2020-09-17 19:07                                                                                                   ` Qing Zhao
  2020-09-22 17:06                                                                                                     ` Richard Sandiford
  2020-09-17 22:26                                                                                                   ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-17 19:07 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 17, 2020, at 11:27 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> Segher and Richard, 
>>>> 
>>>> Now there are two major concerns from the discussion so far:
>>>> 
>>>> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>>>>    So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>>>> 
>>>> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>>>> 
>>>> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>>>> 
>>>> In order to support 1,  we cannot implementing it in “targetm.gen_return()” and “targetm.gen_simple_return()”  since “targetm.gen_return()” and “targetm.gen_simple_return()” are called during pass_thread_prologue_and_epilogue, at that time, the use information still not correct. 
>>>> 
>>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registgers is NOT enough to prevent all the zero insns from moving around.
>>> 
>>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>>> being deleted as dead.
>>> 
>>>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from deleting from moving around). 
>>>> 
>>>> 
>>>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>>>> 
>>>> 1. Add 2 new target hooks:
>>>>  A. targetm.pro_epilogue_use (reg)
>>>>  This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>>  prevent deleting register setting instructions in prologue and epilogue.
>>>> 
>>>>  B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>>  This hook will gen a sequence of zeroing insns that zero the registers that specified in NEED_ZEROED_HARDREGS.
>>>> 
>>>>   A default handler of “gen_zero_call_used_regs” could be defined in middle end, which use mov insns to zero registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 
>>> 
>>> This sounds like you're going back to using:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>    (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>           (reg:SI 1 dx)
>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>    (nil))
>>> 
>>> This also doesn't prevent the zeroing from being moved around.  Like the
>>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>>> as dead.  I still think that the EPILOGUE_USES approach is the better
>>> way of doing that.
>> 
>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>> 
>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>> ;; all of memory.  This blocks insns from being moved across this point.
> 
> Heh, it looks like that comment dates back to 1994. :-)
> 
> The comment is no longer correct though.  I wasn't around at the time,
> but I assume the comment was only locally true even then.
> 
> If what the comment said was true, then something like:
> 
> (define_insn "cld"
>  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>  ""
>  "cld"
>  [(set_attr "length" "1")
>   (set_attr "length_immediate" "0")
>   (set_attr "modrm" "0")])
> 
> would invalidate the entire register file and so would require all values
> to be spilt to the stack around the CLD.

Okay, thanks for the info. 
Then what is the current definition of UNSPEC_VOLATILE? 


> 
>> I am not very familiar with how the unspec_volatile actually works in gcc’s data flow analysis, my understanding  from the above is, the RTL insns marked with UNSPEC_volatile would be served as a barrier that no other insns can move across this point. At the same time, since the marked RTL insns is considered to use and clobber all hard registers and memory, it cannot be deleted either. 
> 
> UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
> to other UNSPEC_VOLATILEs.  But the problem with:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> 
> is that the volatile occurs *after* the zeroing instruction.  So at best
> it can stop insn 18 moving further down, to be closer to the return
> instruction.  There's nothing to stop insn 18 moving further up,
> away from the return instruction, which AIUI is what you're trying
> to prevent.  E.g. suppose we had:
> 
> (insn 17 … pop a register other than dx from the stack …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> 
> There is nothing to stop an rtl pass reordering that to:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 17 … pop a register other than dx from the stack …)
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))

Yes, agreed. So the volatile marking insn should be placed BEFORE the zeroing insn. 

> 
> There's also no dataflow reason why this couldn't be reordered to:
> 
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 18 20 2 (unspec_volatile [
>           (reg:SI 1 dx)
>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>    (nil))
> (insn 17 … pop a register other than dx from the stack …)
> 

This is where I don’t quite agree at the moment; maybe I still don’t fully understand “UNSPEC_VOLATILE”.

I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example the routine “can_move_insns_across” in gcc/df-problems.c:

      if (NONDEBUG_INSN_P (insn))
        {
          if (volatile_insn_p (PATTERN (insn)))
            return false;

From my reading of the code, when an insn is an UNSPEC_VOLATILE, other insns cannot be moved across it.

So in the above example, insn 17 should not be moved across insn 19 either.

Let me know if I am missing anything important.


> So…
> 
>> So, I thought that “UNSPEC_volatile” should be stronger than “EPILOGUE_USES”. And it can serve the purpose of preventing zeroing insns from deleting and moving. 
> 
> …both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
> stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
> the instruction would IMO be counterproductive: it would stop the
> zeroing instructions that we want to be close to the return instruction
> from moving “closer” to the return instruction, but it wouldn't do the
> same for unrelated instructions.  So if anything, the unspec_volatile
> could increase the chances that something unrelated to the register
> zeroing is moved later than the register zeroing.  E.g. this could
> happen when filling delayed branch slots.
> 
>>> I don't think there's a foolproof way of preventing an unknown target
>>> machine_reorg pass from moving the instructions around.  But since we
>>> don't have unknown machine_reorgs (at least not in-tree), I think
>>> instead we should be prepared to patch machine_reorgs where necessary
>>> to ensure that they do the right thing.
>>> 
>>> If you want to increase the chances that machine_reorgs don't need to be
>>> patched, you could either:
>>> 
>>> (a) make the zeroing instructions themselves volatile or
>>> (b) insert a volatile reference to the register before (rather than
>>>   after) the zeroing instruction
>>> 
>>> IMO (b) is the way to go, because it avoids the need to define special
>>> volatile move patterns for each type of register.  (b) would be needed
>>> on top of (rather than instead of) the EPILOGUE_USES thing.
>>> 
>> Okay, will take approach b. 
>> 
>> But I still don’t quite understand why we still need “EPILOGUE_USES”. What is the additional benefit of EPILOGUE_USES?
> 
> The asm for (b) goes before the instruction, so we'd have:
> 
> (insn 17 … new asm …)
> (insn 18 16 19 2 (set (reg:SI 1 dx)
>       (const_int 0 [0])) "t10.c":11:1 -1
>    (nil))
> (insn 19 … return …)
> 
> But something has to tell the df machinery that the value of edx
> matters on return from the function, otherwise insn 18 could be
> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
> and stops the instruction from being deleted.
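
The liveness argument above can be illustrated with a toy model.  This is
only a sketch under stated assumptions: the Insn struct, eliminate_dead and
example_sequence are hypothetical names invented for illustration and are not
GCC's df code.  The live_at_exit set plays the role that EPILOGUE_USES plays
for the real dataflow machinery.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: at most one register written, some registers read.
struct Insn {
  std::string name;
  std::string def;             // register written ("" if none)
  std::set<std::string> uses;  // registers read
  bool is_volatile;            // volatile insns are never deleted
};

// Backward liveness walk over a straight-line sequence.  LIVE_AT_EXIT stands
// in for the registers named by EPILOGUE_USES.  Returns the names of the
// insns that survive dead-code elimination, in original order.
std::vector<std::string>
eliminate_dead (const std::vector<Insn> &insns,
                const std::set<std::string> &live_at_exit)
{
  std::set<std::string> live = live_at_exit;
  std::vector<std::string> kept;
  for (auto it = insns.rbegin (); it != insns.rend (); ++it)
    {
      bool needed = it->is_volatile || it->def.empty ()
                    || live.count (it->def) != 0;
      if (!needed)
        continue;  // Dead: the value written is never read afterwards.
      if (!it->def.empty ())
        live.erase (it->def);
      live.insert (it->uses.begin (), it->uses.end ());
      kept.insert (kept.begin (), it->name);
    }
  return kept;
}

// The shape discussed above: a volatile reference to dx placed before the
// zeroing of dx, followed by the return.
std::vector<Insn>
example_sequence ()
{
  return {
    {"volatile_ref", "", {"dx"}, true},
    {"zero_dx", "dx", {}, false},
    {"return", "", {}, true},
  };
}
```

In this model, with an empty live_at_exit set the zero_dx insn is dropped even
though a volatile reference to dx sits in front of it, because that reference
reads dx before the zeroing writes it.  Adding dx to live_at_exit keeps all
three insns.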


In the above, insn 17 will be something like:

(insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
    ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
(nil))

So the register edx is already referenced by an UNSPEC_VOLATILE insn, which I would expect to mean that the value of edx matters on return from the function. My understanding is that df should automatically pick up the UNSPEC_VOLATILE insn and its operands, so the UNSPEC_VOLATILE insn should serve the same purpose as adding edx to EPILOGUE_USES.

Am I missing anything here?

> 
>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>> We can just use an automatically-generated volatile asm to clobber the
>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>> scheduling barriers.
>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>> 
>> static void
>> expand_asm_memory_blockage (void)
>> {
>>  rtx asm_op, clob;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  clob = gen_rtx_SCRATCH (VOIDmode);
>>  clob = gen_rtx_MEM (BLKmode, clob);
>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>> 
>> 
>> As the following? 
>> 
>> /* Generate asm volatile("" : : : "regno") for REGNO.  */
>> 
>> static void
>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>> {
>>  rtx asm_op, clob;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  clob = gen_rtx_REG (mode, regno);
>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>> }
>> 
>> Is the above correct? 
> 
> Yeah, looks good.  You should be able to clobber all the registers you
> want to clear in one asm.

How would I do this?

thanks.

Qing
>  For extra safety, it might be worth including
> a (mem:BLK (scratch)) clobber too, so that memory instructions don't get
> moved across the asm.
> 
> Thanks,
> Richard
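
For reference, the source-level analogue of a single asm that clobbers
several registers is an empty volatile asm with more than one entry in its
clobber list.  In RTL that corresponds to one PARALLEL holding the
ASM_OPERANDS plus one CLOBBER per register, optionally with the
(mem:BLK (scratch)) clobber as well.  The sketch below is an illustration
only: the helper names are hypothetical, the register list is an arbitrary
x86 choice, and other targets fall back to a plain memory clobber.

```cpp
#include <cassert>

// One volatile asm with an empty template.  On x86 it clobbers three
// call-used registers and memory in a single statement; elsewhere it only
// clobbers memory.  (GNU extended asm; register names are x86-specific.)
static inline void
clobber_barrier ()
{
#if defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__ ("" : : : "eax", "edx", "ecx", "memory");
#else
  __asm__ __volatile__ ("" : : : "memory");
#endif
}

// The compiler must assume the clobbered registers and memory change at the
// barrier, so Y has to survive it in a non-clobbered location.
int
compute_across_barrier (int x)
{
  int y = x * 2;
  clobber_barrier ();
  return y + 1;
}
```

The asm emits no instructions; it only constrains what the compiler may keep
in the listed registers and how it may move memory accesses around that point.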


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-17 16:27                                                                                                 ` Richard Sandiford
  2020-09-17 19:07                                                                                                   ` Qing Zhao
@ 2020-09-17 22:26                                                                                                   ` Segher Boessenkool
  1 sibling, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-17 22:26 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

On Thu, Sep 17, 2020 at 05:27:59PM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> > The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
> >
> > ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> > ;; all of memory.  This blocks insns from being moved across this point.
> 
> Heh, it looks like that comment dates back to 1994. :-)
> 
> The comment is no longer correct though.  I wasn't around at the time,
> but I assume the comment was only locally true even then.

I think it was never true at all, even.

An unspec_volatile is just an unspec that is volatile, i.e. it needs to
be executed in the real machine exactly like in the abstract C machine
(wrt sequence points).  It typically does something the compiler does
not model (say, to resources it does not know about), but you can use it
for anything you want executed approximately as written.

> UNSPEC_VOLATILEs can't be deleted.

(If they are executed at all, anyway ;-) )


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-11 21:44                                                               ` Richard Sandiford
  2020-09-11 22:24                                                                 ` Qing Zhao
@ 2020-09-18 20:31                                                                 ` Qing Zhao
  2020-09-18 22:51                                                                   ` Segher Boessenkool
  2020-09-21  7:23                                                                   ` Richard Sandiford
  1 sibling, 2 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-18 20:31 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Hi, Richard,

While implementing the new version of the patch, I still feel that it’s not practical to add a default definition in the middle end that just uses move patterns to zero each selected register.

The major issues are:

Some target-specific information is needed to define the “general register” set and the “all register” set, so we would have to add a new target hook to get that information and pass it to the middle end.


For example, on X86, for CALL_USED_REGISTERS, we have:

#define CALL_USED_REGISTERS                                     \
/*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/      \
{  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,       \
/*arg,flags,fpsr,frame*/                                        \
    1,   1,    1,    1,                                         \
/*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/                     \
     1,   1,   1,   1,   1,   1,   6,   6,                      \
/* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   


From the above, we can see that st0 to st7 are call-used registers on x86; however, we should not zero these registers on x86.

Such details are known only to the x86 backend.

I guess that other platforms might have similar issues.

If we still want a default definition in the middle end to generate the zeroing insns for the selected registers, I have to add another target hook, say “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, to check whether a register should be zeroed based on gpr_only (general registers only) and a target-specific decision. I will provide an x86 implementation of this target hook in this patch.

Other targets have to implement this new target hook to utilize the default handler. 

Let me know your opinion:

A.  Do not provide a default definition in the middle end to generate the zeroing insns for the selected registers.  Move all of the generation work to the target; an x86 implementation will be provided;

OR:

B.  Provide a default definition in the middle end to generate the zeroing insns for the selected registers.  This needs a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”; as with A, an x86 implementation will be provided in my patch.


thanks.

Qing


> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Having a target hook sounds good, but I think it should have a
> default definition that just uses the move patterns to zero each
> selected register.  I expect the default will be good enough for
> most targets.
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-18 20:31                                                                 ` Qing Zhao
@ 2020-09-18 22:51                                                                   ` Segher Boessenkool
  2020-09-21 14:13                                                                     ` Qing Zhao
  2020-09-21  7:23                                                                   ` Richard Sandiford
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-18 22:51 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Hi!

On Fri, Sep 18, 2020 at 03:31:12PM -0500, Qing Zhao wrote:
> Let me know your opinion:
> 
> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
> 
> OR:
> 
> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 

Is this just to make the xor thing work?  i386 has a peephole to
transform the mov to a xor for this (and the backend could just handle
it in its mov<M> patterns, maybe a peephole was easier for i386, no
idea).


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-18 20:31                                                                 ` Qing Zhao
  2020-09-18 22:51                                                                   ` Segher Boessenkool
@ 2020-09-21  7:23                                                                   ` Richard Sandiford
  2020-09-21 14:29                                                                     ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-21  7:23 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> Hi, Richard,
>
> During my implementation of the new version of the patch. I still feel that it’s not practical to add a default definition in the middle end to just use move patterns to zero each selected register. 
>
> The major issues are:
>
> There are some target specific information on how to define “general register” set and “all register” set,  we have to add a new specific target hook to get such target specific information and pass to middle-end. 

GENERAL_REGS and ALL_REGS are already concepts that target-independent
code knows about though.  I think the non-fixed subsets of those would
make good starting sets, which the target could whittle down if it wanted
or needed to.

> For example, on X86, for CALL_USED_REGISTERS, we have:
>
> #define CALL_USED_REGISTERS                                     \
> /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/      \
> {  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,       \
> /*arg,flags,fpsr,frame*/                                        \
>     1,   1,    1,    1,                                         \
> /*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/                     \
>      1,   1,   1,   1,   1,   1,   6,   6,                      \
> /* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   
>
>
> From the above, we can see “st0 to st7” are call_used_registers for x86, however, we should not zero these registers on x86. 
>
> Such details is only known by x86 backend. 
>
> I guess that other platforms might have similar issue. 

They might, but that doesn't disprove that there's a sensible default
choice that works for most targets.

FWIW, stack registers themselves are already exposed outside targets
(see reg-stack.c, although since x86 is the only port that uses it,
the main part of it is effectively target-dependent at the moment).
Similarly for register windows.

> If we still want  a default definition in middle end to generate the zeroing insn for selected registers, I have to add another target hook, say, “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)” to check whether a register should be zeroed based on gpr_only (general register only)  and target specific decision.   I will provide a x86 implementation for this target hook in this patch. 
>
> Other targets have to implement this new target hook to utilize the default handler. 
>
> Let me know your opinion:
>
> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
>
> OR:
>
> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 

The kind of target hook interface I was thinking of was:

  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)

which:

- emits zeroing instructions for some target-specific subset of REGS

- returns the set of registers that were actually cleared

The default implementation would clear all registers in REGS,
using reg_raw_mode[R] as the mode for register R.  Targets could
then override the hook and:

- drop registers that shouldn't be cleared

- handle some or all of the remaining registers in a more optimal,
  target-specific way

The targets could then use the default implementation of the hook
to handle any residue.  E.g. the default implementation would be
able to handle general registers on x86.
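
A minimal stand-alone sketch of that interface shape, for illustration only:
std::bitset stands in for HARD_REG_SET, the Emitter type merely records which
registers would get a zeroing move, and the choice of registers 8..15 as
"must not be cleared" is an arbitrary assumption, not a real register layout.
None of these names exist in GCC.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NUM_REGS = 16;   // hypothetical register file size
using RegSet = std::bitset<NUM_REGS>;  // stand-in for HARD_REG_SET

// Records which registers had a zeroing "instruction" emitted, in order.
struct Emitter {
  std::vector<std::size_t> zeroed;
  void emit_move_zero (std::size_t regno)
  {
    assert (regno < NUM_REGS);
    zeroed.push_back (regno);
  }
};

// Default hook: zero every requested register with a plain move and report
// the whole request as cleared.
RegSet
default_emit_move_zeros (Emitter &out, const RegSet &regs)
{
  for (std::size_t r = 0; r < NUM_REGS; ++r)
    if (regs.test (r))
      out.emit_move_zero (r);
  return regs;
}

// Hypothetical target override: registers 8..15 stand in for registers that
// the target never wants cleared.  Drop them, then let the default
// implementation handle the residue.
RegSet
target_emit_move_zeros (Emitter &out, const RegSet &regs)
{
  RegSet allowed = regs;
  for (std::size_t r = 8; r < NUM_REGS; ++r)
    allowed.reset (r);
  return default_emit_move_zeros (out, allowed);
}
```

The return value lets the caller see which registers were really cleared, so
the caller can record uses only for that subset rather than for everything it
asked for.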

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-18 22:51                                                                   ` Segher Boessenkool
@ 2020-09-21 14:13                                                                     ` Qing Zhao
  2020-09-21 20:34                                                                       ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-21 14:13 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 18, 2020, at 5:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Fri, Sep 18, 2020 at 03:31:12PM -0500, Qing Zhao wrote:
>> Let me know your opinion:
>> 
>> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
>> 
>> OR:
>> 
>> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
> 
> Is this just to make the xor thing work?  i386 has a peephole to
> transform the mov to a xor for this (and the backend could just handle
> it in its mov<M> patterns, maybe a peephole was easier for i386, no
> idea).

You mean, what is the purpose of the new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”?

The purpose of this new target hook is to let the target remove call-used registers that should not be zeroed, for example the stack registers on x86 (st0-st7).
On other platforms, there might be other call-used registers that should not be zeroed.

Qing

> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21  7:23                                                                   ` Richard Sandiford
@ 2020-09-21 14:29                                                                     ` Qing Zhao
  2020-09-21 15:35                                                                       ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-21 14:29 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 21, 2020, at 2:23 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Hi, Richard,
>> 
>> During my implementation of the new version of the patch. I still feel that it’s not practical to add a default definition in the middle end to just use move patterns to zero each selected register. 
>> 
>> The major issues are:
>> 
>> There are some target specific information on how to define “general register” set and “all register” set,  we have to add a new specific target hook to get such target specific information and pass to middle-end. 
> 
> GENERAL_REGS and ALL_REGS are already concepts that target-independent
> code knows about though.  I think the non-fixed subsets of those would
> make good starting sets, which the target could whittle down if it wanted
> or needed to.

Yes, this is what I am currently doing:  

First, the middle end computes the initial need_zeroed_hardregs based on the user request, data flow, and the function ABI, and then passes this “need_zeroed_hardregs” to the target hook.
Then the target hook removes from “need_zeroed_hardregs” the registers that should not be zeroed on that specific target, for example stack_regs on x86.
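
A rough stand-alone model of these two steps, as a sketch only.  The mapping
from option value to register set is an assumption read off the option names
in the subject line, and every identifier here (ZeroMode, initial_zero_set,
target_filter) is hypothetical rather than taken from the patch:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t N_REGS = 8;  // hypothetical register file size
using Regs = std::bitset<N_REGS>;  // stand-in for HARD_REG_SET

enum class ZeroMode { Skip, UsedGpr, AllGpr, Used, All };

// Middle-end step.  CALL_USED would come from the function ABI, USED_IN_FN
// from data flow, and GPRS from the register classes.
Regs
initial_zero_set (ZeroMode mode, const Regs &call_used,
                  const Regs &used_in_fn, const Regs &gprs)
{
  switch (mode)
    {
    case ZeroMode::Skip:    return Regs ();
    case ZeroMode::UsedGpr: return call_used & used_in_fn & gprs;
    case ZeroMode::AllGpr:  return call_used & gprs;
    case ZeroMode::Used:    return call_used & used_in_fn;
    case ZeroMode::All:     return call_used;
    }
  return Regs ();
}

// Target step: remove registers that this target never wants zeroed.
Regs
target_filter (const Regs &need_zeroed, const Regs &never_zero)
{
  return need_zeroed & ~never_zero;
}
```

For instance, with call_used = 0xE7 and never_zero = 0xC0, the "all" request
is filtered from 0xE7 down to 0x27 before any zeroing insns are generated.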

> 
>> For example, on X86, for CALL_USED_REGISTERS, we have:
>> 
>> #define CALL_USED_REGISTERS                                     \
>> /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/      \
>> {  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,       \
>> /*arg,flags,fpsr,frame*/                                        \
>>    1,   1,    1,    1,                                         \
>> /*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/                     \
>>     1,   1,   1,   1,   1,   1,   6,   6,                      \
>> /* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   
>> 
>> 
>> From the above, we can see “st0 to st7” are call_used_registers for x86, however, we should not zero these registers on x86. 
>> 
>> Such details is only known by x86 backend. 
>> 
>> I guess that other platforms might have similar issue. 
> 
> They might, but that doesn't disprove that there's a sensible default
> choice that works for most targets.
> 
> FWIW, stack registers themselves are already exposed outside targets
> (see reg-stack.c, although since x86 is the only port that uses it,
> the main part of it is effectively target-dependent at the moment).
> Similarly for register windows.

Yes, stack registers can currently be identified with STACK_REG_P in the middle end. So for x86, we might be able to handle this in the middle end.

However, my major concern is other platforms that we are not very familiar with: there might be special registers on such a platform that should not be zeroed, and currently there is no way to identify them in the middle end.

For such platforms, the default handler will not be correct.
> 
>> If we still want  a default definition in middle end to generate the zeroing insn for selected registers, I have to add another target hook, say, “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)” to check whether a register should be zeroed based on gpr_only (general register only)  and target specific decision.   I will provide a x86 implementation for this target hook in this patch. 
>> 
>> Other targets have to implement this new target hook to utilize the default handler. 
>> 
>> Let me know your opinion:
>> 
>> A.  Will not provide default definition in middle end to generate the zeroing insn for selected registers.  Move the generation work all to target; X86 implementation will be provided;
>> 
>> OR:
>> 
>> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
> 
> The kind of target hook interface I was thinking of was:
> 
>  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
> 
> which:
> 
> - emits zeroing instructions for some target-specific subset of REGS
> 
> - returns the set of registers that were actually cleared
> 
> The default implementation would clear all registers in REGS,
> using reg_raw_mode[R] as the mode for register R.  Targets could
> then override the hook and:
> 
> - drop registers that shouldn't be cleared
> 
> - handle some or all of the remaining registers in a more optimal,
>  target-specific way
> 
> The targets could then use the default implementation of the hook
> to handle any residue.  E.g. the default implementation would be
> able to handle general registers on x86.

Even for the general registers on x86, we need some special handling for optimal code generation; for example, we might want to turn a “mov” into an xor on x86.

My major concern with the default implementation of the hook is:

If a target has some special registers that should not be zeroed, and we do not provide an overridden implementation for this target, then the default implementation will generate incorrect code for this target.

How should we resolve this issue?

thanks.

Qing

> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 14:29                                                                     ` Qing Zhao
@ 2020-09-21 15:35                                                                       ` Richard Sandiford
  2020-09-21 16:34                                                                         ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-21 15:35 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> My major concern with the default implementation of the hook is:
>
> If a target has some special registers that should not be zeroed, and we do not provide an overridden implementation for this target, then the default implementation will generate incorrect code for this target. 

That's OK.  The default behaviour of hooks and macros often needs
to be corrected by target code.  For example, going back to some
of the macros and hooks we talked about earlier:

- EPILOGUE_USES by default returns false for all registers.
  This would be the wrong behaviour for any target that currently
  defines EPILOGUE_USES to something else.

- TARGET_HARD_REGNO_SCRATCH_OK by default returns true for all registers.
  This would be the wrong behaviour for any target that currently defines
  the hook to do something else.

And in general, if there's a target-specific reason that something
has to be different from normal, it's better where possible to expose
the underlying concept that makes that different behaviour necessary,
rather than expose the downstream effects of that concept.  For example,
IMO it's a historical mistake that targets that support interrupt
handlers need to change all of:

- TARGET_HARD_REGNO_SCRATCH_OK
- HARD_REGNO_RENAME_OK
- EPILOGUE_USES

to expose what is essentially one concept.  IMO we should instead
just expose the fact that certain functions have extra call-saved
registers.  (This is now possible with the function_abi stuff,
but most interrupt handler support predates that.)

So if there is some concept that prevents your new target hook being
correct for x86, I think we should try if possible to expose that
concept to target-independent code.  And in the case of stack registers,
that has already been done.

The same would apply to any other target for which the default turns out
not to be correct.

But in cases where there is no underlying concept that can sensibly
be extracted out, it's OK if targets need to override the default
to get correct behaviour.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 15:35                                                                       ` Richard Sandiford
@ 2020-09-21 16:34                                                                         ` Qing Zhao
  2020-09-21 19:11                                                                           ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-21 16:34 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 21, 2020, at 10:35 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> My major concern with the default implementation of the hook is:
>> 
>> If a target has some special registers that should not be zeroed, and we do not provide an overridden implementation for this target, then the default implementation will generate incorrect code for this target. 
> 
> That's OK.  The default behaviour of hooks and macros often needs
> to be corrected by target code.  For example, going back to some
> of the macros and hooks we talked about earlier:
> 
> - EPILOGUE_USES by default returns false for all registers.
>  This would be the wrong behaviour for any target that currently
>  defines EPILOGUE_USES to something else.
> 
> - TARGET_HARD_REGNO_SCRATCH_OK by default returns true for all registers.
>  This would be the wrong behaviour for any target that currently defines
>  the hook to do something else.
> 
> And in general, if there's a target-specific reason that something
> has to be different from normal, it's better where possible to expose
> the underlying concept that makes that different behaviour necessary,
> rather than expose the downstream effects of that concept.  For example,
> IMO it's a historical mistake that targets that support interrupt
> handlers need to change all of:
> 
> - TARGET_HARD_REGNO_SCRATCH_OK
> - HARD_REGNO_RENAME_OK
> - EPILOGUE_USES
> 
> to expose what is essentially one concept.  IMO we should instead
> just expose the fact that certain functions have extra call-saved
> registers.  (This is now possible with the function_abi stuff,
> but most interrupt handler support predates that.)
> 
> So if there is some concept that prevents your new target hook being
> correct for x86, I think we should try if possible to expose that
> concept to target-independent code.  And in the case of stack registers,
> that has already been done.

I will exclude “stack registers” in the middle end to see whether this can resolve the issue with X86. 
> 
> The same would apply to any other target for which the default turns out
> not to be correct.
> 
> But in cases where there is no underlying concept that can sensibly
> be extracted out, it's OK if targets need to override the default
> to get correct behaviour.

Then, on a target where the default code is not right and for which we haven’t provided an overridden implementation, what should we tell the end user?
The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that specific target, where the default implementation is not correct. How do we deal with this?

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 16:34                                                                         ` Qing Zhao
@ 2020-09-21 19:11                                                                           ` Richard Sandiford
  2020-09-21 19:22                                                                             ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-21 19:11 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> But in cases where there is no underlying concept that can sensibly
>> be extracted out, it's OK if targets need to override the default
>> to get correct behaviour.
>
> Then, on a target where the default code is not right and for which we haven’t provided an overridden implementation, what should we tell the end user?
> The user might see the documentation for -fzero-call-used-regs in the GCC manual and try it on that specific target, where the default implementation is not correct. How do we deal with this?

The point is that we're trying to implement this in a target-independent
way, like for most compiler features.  If the option doesn't work for a
particular target, then that's a bug like any other.  The most we can
reasonably do is:

(a) try to implement the feature in a way that uses all the appropriate
    pieces of compiler infrastructure (what we've been discussing)

(b) add tests for the feature that run on all targets

It's possible that bugs could slip through even then.  But that's true
of anything.

Targets like x86 support many subtargets, many different compilation
modes, and many different compiler features (register asms, various
fancy function attributes, etc.).  So even after the option is
committed and is supposedly supported on x86, it's possible that
we'll find a bug in the feature on x86 itself.

I don't think anyone would suggest that we should warn the user that the
option might be buggy on x86 (its initial target).  But I also don't
see any reason for believing that a bug on x86 is less likely than
a bug on other targets.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 19:11                                                                           ` Richard Sandiford
@ 2020-09-21 19:22                                                                             ` Qing Zhao
  2020-09-21 20:05                                                                               ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-21 19:22 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> But in cases where there is no underlying concept that can sensibly
>>> be extracted out, it's OK if targets need to override the default
>>> to get correct behaviour.
>> 
>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
> 
> The point is that we're trying to implement this in a target-independent
> way, like for most compiler features.  If the option doesn't work for a
> particular target, then that's a bug like any other.  The most we can
> reasonably do is:
> 
> (a) try to implement the feature in a way that uses all the appropriate
>    pieces of compiler infrastructure (what we've been discussing)
> 
> (b) add tests for the feature that run on all targets
> 
> It's possible that bugs could slip through even then.  But that's true
> of anything.
> 
> Targets like x86 support many subtargets, many different compilation
> modes, and many different compiler features (register asms, various
> fancy function attributes, etc.).  So even after the option is
> committed and is supposedly supported on x86, it's possible that
> we'll find a bug in the feature on x86 itself.
> 
> I don't think anyone would suggest that we should warn the user that the
> option might be buggy on x86 (its initial target).  But I also don't
> see any reason for believing that a bug on x86 is less likely than
> a bug on other targets.

Okay, then I will add the default implementation as you suggested, and also provide the overridden, optimized implementation on X86.

Let me know if you have any further suggestions.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 19:22                                                                             ` Qing Zhao
@ 2020-09-21 20:05                                                                               ` Qing Zhao
  2020-09-22 16:31                                                                                 ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-21 20:05 UTC (permalink / raw)
  To: Qing Zhao, Richard Sandiford
  Cc: Jakub Jelinek, Kees Cook, Segher Boessenkool, Uros Bizjak,
	Rodriguez Bahena, Victor, Kees Cook via Gcc-patches



> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> But in cases where there is no underlying concept that can sensibly
>>>> be extracted out, it's OK if targets need to override the default
>>>> to get correct behaviour.
>>> 
>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
>> 
>> The point is that we're trying to implement this in a target-independent
>> way, like for most compiler features.  If the option doesn't work for a
>> particular target, then that's a bug like any other.  The most we can
>> reasonably do is:
>> 
>> (a) try to implement the feature in a way that uses all the appropriate
>>   pieces of compiler infrastructure (what we've been discussing)
>> 
>> (b) add tests for the feature that run on all targets
>> 
>> It's possible that bugs could slip through even then.  But that's true
>> of anything.
>> 
>> Targets like x86 support many subtargets, many different compilation
>> modes, and many different compiler features (register asms, various
>> fancy function attributes, etc.).  So even after the option is
>> committed and is supposedly supported on x86, it's possible that
>> we'll find a bug in the feature on x86 itself.
>> 
>> I don't think anyone would suggest that we should warn the user that the
>> option might be buggy on x86 (its initial target).  But I also don't
>> see any reason for believing that a bug on x86 is less likely than
>> a bug on other targets.
> 
> Okay, then I will add the default implementation as you suggested. And also provide the overridden optimized implementation on X86.

For X86, it looks like, in addition to the stack registers (st0 to st7), the mask registers (k0 to k7) also do not need to be zeroed, and the MMX registers (mm0 to mm7) should not be zeroed either.

As far as I can tell, MASK_REG_P and MMX_REG_P are x86-specific macros. Can I use them in the middle end in the same way as STACK_REG_P?

Qing
> 
> Let me know if you have further suggestion.
> 
> Qing
>> 
>> Thanks,
>> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 14:13                                                                     ` Qing Zhao
@ 2020-09-21 20:34                                                                       ` Segher Boessenkool
  2020-09-21 20:58                                                                         ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-21 20:34 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Mon, Sep 21, 2020 at 09:13:58AM -0500, Qing Zhao wrote:
> > On Sep 18, 2020, at 5:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
> > 
> > Is this just to make the xor thing work?  i386 has a peephole to
> > transform the mov to a xor for this (and the backend could just handle
> > it in its mov<M> patterns, maybe a peephole was easier for i386, no
> > idea).
> 
> You mean what’s the purpose of the new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)?
> 
> The purpose of this new target hook is for the target to delete some of the call_used registers that should not be zeroed, for example, the stack registers in X86. (St0-st7). 

Oh, I didn't see the _P.  Maybe give it a better name?  Also, a better
interface altogether, a call per hard register is a bit much (and easily
avoidable).

> For other platforms, there might be other call_used registers that should not be zeroed. 

But you cannot *add* anything with this interface, and it cannot return
different results depending on which return insn this is.  It is not a
good abstraction IMO.
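
For illustration only, here is one set-based shape such an interface could take, sketched in plain C.  Nothing below comes from the patch or from GCC: `regset_t`, `toy_zero_regs_hook`, `toy_return_kind`, the register numbering and the masks are all hypothetical stand-ins (a real version would presumably use HARD_REG_SET).  The sketch only shows the idea of a single call that receives the whole candidate set, knows which kind of return it is for, and can both remove and add registers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register set: bit N stands for hard register N.  */
typedef uint32_t regset_t;

/* Hypothetical: which kind of return the zeroing is emitted for.  */
enum toy_return_kind { TOY_RETURN, TOY_SIMPLE_RETURN };

/* Hypothetical register layout, for this sketch only.  */
#define TOY_GPRS      ((regset_t) 0x000000ffu)  /* regs 0-7 */
#define TOY_STACK_FP  ((regset_t) 0x0000ff00u)  /* regs 8-15, never zeroed */
#define TOY_SCRATCH   ((regset_t) 0x00010000u)  /* reg 16 */

/* One call per function exit: take the candidate set and return the
   set that should actually be zeroed.  The hook drops the stack-like
   bank and, for a full return only, adds the scratch register.  */
regset_t
toy_zero_regs_hook (regset_t candidates, enum toy_return_kind kind)
{
  /* Only registers known to this toy target may be passed in.  */
  assert ((candidates & ~(TOY_GPRS | TOY_STACK_FP | TOY_SCRATCH)) == 0);

  regset_t result = candidates & ~TOY_STACK_FP;
  if (kind == TOY_RETURN)
    result |= TOY_SCRATCH;
  return result;
}
```

With candidates 0x10f (regs 0-3 plus reg 8), this toy hook drops reg 8 in both cases and additionally adds reg 16 for TOY_RETURN, which a per-register yes/no predicate could not express.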


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 20:34                                                                       ` Segher Boessenkool
@ 2020-09-21 20:58                                                                         ` Qing Zhao
  2020-09-22  0:25                                                                           ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-21 20:58 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 21, 2020, at 3:34 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Mon, Sep 21, 2020 at 09:13:58AM -0500, Qing Zhao wrote:
>>> On Sep 18, 2020, at 5:51 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> B.  Will provide a default definition in middle end to generate the zeroing insn for selected registers. Then need to add a new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will be provided in my patch. 
>>> 
>>> Is this just to make the xor thing work?  i386 has a peephole to
>>> transform the mov to a xor for this (and the backend could just handle
>>> it in its mov<M> patterns, maybe a peephole was easier for i386, no
>>> idea).
>> 
>> You mean what’s the purpose of the new target hook “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)?
>> 
>> The purpose of this new target hook is for the target to delete some of the call_used registers that should not be zeroed, for example, the stack registers in X86. (St0-st7). 
> 
> Oh, I didn't see the _P.  Maybe give it a better name?  Also, a better
> interface altogether, a call per hard register is a bit much (and easily
> avoidable).
> 
>> For other platforms, there might be other call_used registers that should not be zeroed. 
> 
> But you cannot *add* anything with this interface, and it cannot return
> different results depending on which return insn this is.  It is not a
> good abstraction IMO.

This hook will not depend on which return insn is used.  It just checks whether the specified register can be zeroed on this target. For example, on X86 it will exclude the stack registers (st0 to st7), the MMX registers (mm0 to mm7) and the mask registers (k0 to k7) from zeroing.

The information that depends on which return insn is used should be reflected in the data flow information, which we can easily get from the middle end’s data flow analysis.

I added such a target hook in a previous patch:

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550018.html

However, at the very beginning of the discussion I got several comments that it exposed too many target-specific details unnecessarily.

Still, if we want to add a default implementation in the middle end as Richard suggested, this target hook might be necessary.

Qing

> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 20:58                                                                         ` Qing Zhao
@ 2020-09-22  0:25                                                                           ` Segher Boessenkool
  0 siblings, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-22  0:25 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

On Mon, Sep 21, 2020 at 03:58:25PM -0500, Qing Zhao wrote:
> > On Sep 21, 2020, at 3:34 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > But you cannot *add* anything with this interface, and it cannot return
> > different results depending on which return insn this is.  It is not a
> > good abstraction IMO.
> 
> This hook will not depend on which return insn.

But good code generation very much *does*.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-21 20:05                                                                               ` Qing Zhao
@ 2020-09-22 16:31                                                                                 ` Richard Sandiford
  2020-09-22 18:25                                                                                   ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-22 16:31 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Jakub Jelinek, Kees Cook, Segher Boessenkool, Uros Bizjak,
	Rodriguez Bahena, Victor, Kees Cook via Gcc-patches

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> 
>> 
>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> But in cases where there is no underlying concept that can sensibly
>>>>> be extracted out, it's OK if targets need to override the default
>>>>> to get correct behaviour.
>>>> 
>>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
>>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
>>> 
>>> The point is that we're trying to implement this in a target-independent
>>> way, like for most compiler features.  If the option doesn't work for a
>>> particular target, then that's a bug like any other.  The most we can
>>> reasonably do is:
>>> 
>>> (a) try to implement the feature in a way that uses all the appropriate
>>>   pieces of compiler infrastructure (what we've been discussing)
>>> 
>>> (b) add tests for the feature that run on all targets
>>> 
>>> It's possible that bugs could slip through even then.  But that's true
>>> of anything.
>>> 
>>> Targets like x86 support many subtargets, many different compilation
>>> modes, and many different compiler features (register asms, various
>>> fancy function attributes, etc.).  So even after the option is
>>> committed and is supposedly supported on x86, it's possible that
>>> we'll find a bug in the feature on x86 itself.
>>> 
>>> I don't think anyone would suggest that we should warn the user that the
>>> option might be buggy on x86 (its initial target).  But I also don't
>>> see any reason for believing that a bug on x86 is less likely than
>>> a bug on other targets.
>> 
>> Okay, then I will add the default implementation as you suggested. And also provide the overridden optimized implementation on X86.
>
> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
>
> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?

No, those are x86-specific like you say.

Taking each in turn: what is the reason for not clearing mask registers?
And what is the reason for not clearing mm0-7?  In each case, is it a
performance or a correctness issue?

Although the registers themselves are target-specific, the reason
for excluding them might be something that could be exposed to
target-independent code.

As a general comment, with at least three sets of excluded registers,
the “all” in one of the suggested option values is beginning to feel
like a misnomer.  (Maybe that has already been dropped though.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-17 19:07                                                                                                   ` Qing Zhao
@ 2020-09-22 17:06                                                                                                     ` Richard Sandiford
  2020-09-22 21:32                                                                                                       ` Qing Zhao
  2020-09-22 22:37                                                                                                       ` Segher Boessenkool
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-22 17:06 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 17, 2020, at 11:27 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 17, 2020, at 1:17 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> Segher and Richard, 
>>>>> 
>>>>> Now there are two major concerns from the discussion so far:
>>>>> 
>>>>> 1. (From Richard):  Inserting zero insns should be done after pass_thread_prologue_and_epilogue since later passes (for example, pass_regrename) might introduce new used caller-saved registers. 
>>>>>    So, we should do this in the beginning of pass_late_compilation (some targets wouldn’t cope with doing it later). 
>>>>> 
>>>>> 2. (From Segher): The inserted zero insns should stay together with the return, no other insns should move in-between zero insns and return insns. Otherwise, a valid gadget could be formed. 
>>>>> 
>>>>> I think that both of the above 2 concerns are important and should be addressed for the correct implementation. 
>>>>> 
>>>>> In order to support 1, we cannot implement it in “targetm.gen_return()” and “targetm.gen_simple_return()”, since both are called during pass_thread_prologue_and_epilogue, and at that time the use information is still not correct.
>>>>> 
>>>>> In order to support 2, enhancing EPILOGUE_USES to include the zeroed registers is NOT enough to prevent all the zero insns from moving around.
>>>> 
>>>> Right.  The purpose of EPILOGUE_USES was instead to stop the moves from
>>>> being deleted as dead.
>>>> 
>>>>> More restrictions need to be added to these new zero insns.  (I think that marking these new zeroed registers as “unspec_volatile” at RTL level is necessary to prevent them from being deleted or moved around).
>>>>> 
>>>>> 
>>>>> So, based on the above, I propose the following approach that will resolve the above 2 concerns:
>>>>> 
>>>>> 1. Add 2 new target hooks:
>>>>>  A. targetm.pro_epilogue_use (reg)
>>>>>  This hook should return a UNSPEC_VOLATILE rtx to mark a register in use to
>>>>>  prevent deleting register setting instructions in prologue and epilogue.
>>>>> 
>>>>>  B. targetm.gen_zero_call_used_regs(need_zeroed_hardregs)
>>>>>  This hook will gen a sequence of zeroing insns that zero the registers that specified in NEED_ZEROED_HARDREGS.
>>>>> 
>>>>>   A default handler of “gen_zero_call_used_regs” could be defined in middle end, which use mov insns to zero registers, and then use “targetm.pro_epilogue_use(reg)” to mark each zeroed registers. 
>>>> 
>>>> This sounds like you're going back to using:
>>>> 
>>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>>    (nil))
>>>> (insn 19 18 20 2 (unspec_volatile [
>>>>           (reg:SI 1 dx)
>>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>>    (nil))
>>>> 
>>>> This also doesn't prevent the zeroing from being moved around.  Like the
>>>> EPILOGUE_USES approach, it only prevents the clearing from being removed
>>>> as dead.  I still think that the EPILOGUE_USES approach is the better
>>>> way of doing that.
>>> 
>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>> 
>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>> ;; all of memory.  This blocks insns from being moved across this point.
>> 
>> Heh, it looks like that comment dates back to 1994. :-)
>> 
>> The comment is no longer correct though.  I wasn't around at the time,
>> but I assume the comment was only locally true even then.
>> 
>> If what the comment said was true, then something like:
>> 
>> (define_insn "cld"
>>  [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>  ""
>>  "cld"
>>  [(set_attr "length" "1")
>>   (set_attr "length_immediate" "0")
>>   (set_attr "modrm" "0")])
>> 
>> would invalidate the entire register file and so would require all values
>> to be spilt to the stack around the CLD.
>
> Okay, thanks for the info. 
> then, what’s the current definition of UNSPEC_VOLATILE? 

I'm not sure it's written down anywhere TBH.  rtl.texi just says:

  @code{unspec_volatile} is used for volatile operations and operations
  that may trap; @code{unspec} is used for other operations.

which seems like a cyclic definition: volatile expressions are defined
to be expressions that are volatile.

But IMO the semantics are that unspec_volatile patterns with a given
set of inputs and outputs act for dataflow purposes like volatile asms
with the same inputs and outputs.  The semantics of asm volatile are
at least slightly more well-defined (if only by example); see extend.texi
for details.  In particular:

  Note that the compiler can move even @code{volatile asm} instructions relative
  to other code, including across jump instructions. For example, on many 
  targets there is a system register that controls the rounding mode of 
  floating-point operations. Setting it with a @code{volatile asm} statement,
  as in the following PowerPC example, does not work reliably.

  @example
  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
  sum = x + y;
  @end example

  The compiler may move the addition back before the @code{volatile asm}
  statement. To make it work as expected, add an artificial dependency to
  the @code{asm} by referencing a variable in the subsequent code, for
  example:

  @example
  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
  sum = x + y;
  @end example

which is very similar to the unspec_volatile case we're talking about.

To take an x86 example:

  void
  f (char *x)
  {
    asm volatile ("");
    x[0] = 0;
    asm volatile ("");
    x[1] = 0;
    asm volatile ("");
  }

gets optimised to:

        xorl    %eax, %eax
        movw    %ax, (%rdi)

with the two stores being merged.  The same thing is IMO valid for
unspec_volatile.  In both cases, you would need some kind of memory
clobber to prevent the move and merge from happening.
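
As a minimal sketch of that last point (the name `f_with_barriers` is made up for this example), the same function with a "memory" clobber on each empty asm looks like this.  With the clobber, the compiler has to assume each asm may read or write memory, so it should keep the two byte stores separate and in order relative to the asms:

```c
/* Variant of f above in which every empty asm carries a "memory"
   clobber.  __asm__ __volatile__ is the same as asm volatile, but is
   also accepted in strict ISO modes such as -std=c99.  */
void
f_with_barriers (char *x)
{
  __asm__ __volatile__ ("" : : : "memory");
  x[0] = 0;   /* Cannot be merged with the store below...  */
  __asm__ __volatile__ ("" : : : "memory");
  x[1] = 0;   /* ...because the asm in between may touch memory.  */
  __asm__ __volatile__ ("" : : : "memory");
}
```

Note that this only constrains memory accesses.  It says nothing about register-only instructions such as the zeroing moves under discussion, which is why a register clobber is the relevant tool for those.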

>>> I am not very familiar with how the unspec_volatile actually works in gcc’s data flow analysis, my understanding  from the above is, the RTL insns marked with UNSPEC_volatile would be served as a barrier that no other insns can move across this point. At the same time, since the marked RTL insns is considered to use and clobber all hard registers and memory, it cannot be deleted either. 
>> 
>> UNSPEC_VOLATILEs can't be deleted.  And they can't be reordered relative
>> to other UNSPEC_VOLATILEs.  But the problem with:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>> 
>> is that the volatile occurs *after* the zeroing instruction.  So at best
>> it can stop insn 18 moving further down, to be closer to the return
>> instruction.  There's nothing to stop insn 18 moving further up,
>> away from the return instruction, which AIUI is what you're trying
>> to prevent.  E.g. suppose we had:
>> 
>> (insn 17 … pop a register other than dx from the stack …)
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>> 
>> There is nothing to stop an rtl pass reordering that to:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 17 … pop a register other than dx from the stack …)
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>
> Yes, agreed. And then the volatile marking insn should be put BEFORE the zeroing insn. 
>
>> 
>> There's also no dataflow reason why this couldn't be reordered to:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>           (reg:SI 1 dx)
>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>    (nil))
>> (insn 17 … pop a register other than dx from the stack …)
>> 
>
> This is the place I don’t quite agree at this moment, maybe I still not quite understand the “UNSPEC_volatile”.
>
> I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example, the routine “can_move_insns_across” in gcc/df-problems.c:
>
>       if (NONDEBUG_INSN_P (insn))
>         {
>           if (volatile_insn_p (PATTERN (insn)))
>             return false;
>
> From my understanding of reading the code, when an insn is UNSPEC_VOLATILE, another insn will NOT be able to move across it. 
>
> Then for the above example, the insn 17 should Not be moved across insn 19 either.
>
> Let me know if I miss anything important. 

The above is conservatively correct.  But not all passes do it.
E.g. combine does have a similar approach:

  /* If INSN contains volatile references (specifically volatile MEMs),
     we cannot combine across any other volatile references.
     Even if INSN doesn't contain volatile references, any intervening
     volatile insn might affect machine state.  */

  is_volatile_p = volatile_refs_p (PATTERN (insn))
    ? volatile_refs_p
    : volatile_insn_p;

And like you say, the passes that use can_move_insns_across will be
conservative too.  But not many passes use that function.

Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
about volatile asms or unspec_volatiles, and can move code across them.
And that's kind-of inevitable.  Having an “everything barrier” makes
life very hard for global optimisation.

>> So…
>> 
>>> So, I thought that “UNSPEC_volatile” should be stronger than “EPILOGUE_USES”. And it can serve the purpose of preventing zeroing insns from deleting and moving. 
>> 
>> …both EPILOGUE_USES and UNSPEC_VOLATILE would be effective ways of
>> stopping insn 18 from being deleted.  But an UNSPEC_VOLATILE after
>> the instruction would IMO be counterproductive: it would stop the
>> zeroing instructions that we want to be close to the return instruction
>> from moving “closer” to the return instruction, but it wouldn't do the
>> same for unrelated instructions.  So if anything, the unspec_volatile
>> could increase the chances that something unrelated to the register
>> zeroing is moved later than the register zeroing.  E.g. this could
>> happen when filling delayed branch slots.
>> 
>>>> I don't think there's a foolproof way of preventing an unknown target
>>>> machine_reorg pass from moving the instructions around.  But since we
>>>> don't have unknown machine_reorgs (at least not in-tree), I think
>>>> instead we should be prepared to patch machine_reorgs where necessary
>>>> to ensure that they do the right thing.
>>>> 
>>>> If you want to increase the chances that machine_reorgs don't need to be
>>>> patched, you could either:
>>>> 
>>>> (a) make the zeroing instructions themselves volatile or
>>>> (b) insert a volatile reference to the register before (rather than
>>>>   after) the zeroing instruction
>>>> 
>>>> IMO (b) is the way to go, because it avoids the need to define special
>>>> volatile move patterns for each type of register.  (b) would be needed
>>>> on top of (rather than instead of) the EPILOGUE_USES thing.
>>>> 
>>> Okay, will take approach b. 
>>> 
>>> But I still don’t quite understand why we still need “EPILOGUE_USES”? What’s the additional benefit from EPILOGUE_USES?
>> 
>> The asm for (b) goes before the instruction, so we'd have:
>> 
>> (insn 17 … new asm …)
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>       (const_int 0 [0])) "t10.c":11:1 -1
>>    (nil))
>> (insn 19 … return …)
>> 
>> But something has to tell the df machinery that the value of edx
>> matters on return from the function, otherwise insn 18 could be
>> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
>> and stops the instruction from being deleted.
>
>
> In the above, insn 17 will be something like:
>
> (insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
>     ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
> (nil))

In the example above, insn 17 would be an asm that clobbers dx
(instead of using dx).

> So, the reg edx is already marked as “UNSPEC_volatile”, which should mean that the value of edx matters on return from the function. My understanding is that df should automatically pick up the “UNSPEC_VOLATILE” insn and its operands, so the “UNSPEC_VOLATILE” insn should serve the same purpose as putting “edx” in EPILOGUE_USES.
>
> Do I miss anything here?

The point is that any use of dx at insn 17 comes before the definition
in insn 18.  So a use in insn 17 would keep alive any store to dx that
happened before insn 17.  But it would not keep the store in insn 18 live,
since insn 18 executes later.
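
To make the ordering argument concrete, here is a self-contained toy model in C.  It is not GCC's df code: `toy_insn`, `toy_def_is_live` and the `live_out` mask are hypothetical names, and the model only handles straight-line code with at most one use and one definition per instruction.  `live_out` plays the role that EPILOGUE_USES plays in the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy straight-line instruction: it optionally uses one register and
   optionally defines one register.  -1 means "none".  */
struct toy_insn
{
  int use;
  int def;
};

/* Return true if the definition made by insns[idx] is still needed:
   either a later insn reads the register before it is redefined, or
   the register is in LIVE_OUT at the end of the sequence.  Insns
   before IDX are never looked at, so a use there cannot help.  */
bool
toy_def_is_live (const struct toy_insn *insns, size_t n, size_t idx,
                 unsigned live_out)
{
  assert (idx < n && insns[idx].def >= 0);
  int reg = insns[idx].def;

  /* Scan forward from the definition.  */
  for (size_t i = idx + 1; i < n; i++)
    {
      if (insns[i].use == reg)
        return true;            /* Read before being overwritten.  */
      if (insns[i].def == reg)
        return false;           /* Overwritten without being read.  */
    }
  return (live_out >> reg) & 1u;
}
```

For the three-insn sequence "use dx; set dx = 0; return" (with dx as register 1), the zeroing store is reported dead when `live_out` is 0 and live when bit 1 of `live_out` is set, matching the point that only something like EPILOGUE_USES keeps insn 18 alive.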

>>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>>> We can just use an automatically-generated volatile asm to clobber the
>>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>>> scheduling barriers.
>>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>>> 
>>> static void
>>> expand_asm_memory_blockage (void)
>>> {
>>>  rtx asm_op, clob;
>>> 
>>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>>  MEM_VOLATILE_P (asm_op) = 1;
>>> 
>>>  clob = gen_rtx_SCRATCH (VOIDmode);
>>>  clob = gen_rtx_MEM (BLKmode, clob);
>>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>> 
>>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>> }
>>> 
>>> 
>>> As the following? 
>>> 
>>> /* Generate asm volatile("" : : : “regno") for REGNO.   */
>>> 
>>> static void
>>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>>> {
>>>  rtx asm_op, clob;
>>> 
>>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>                                 rtvec_alloc (0), rtvec_alloc (0),
>>>                                 rtvec_alloc (0), UNKNOWN_LOCATION);
>>>  MEM_VOLATILE_P (asm_op) = 1;
>>> 
>>>  clob = gen_rtx_REG (mode, regno);
>>>  clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>> 
>>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>> }
>>> 
>>> Is the above correct? 
>> 
>> Yeah, looks good.  You should be able to clobber all the registers you
>> want to clear in one asm.
>
> How to do this?

Rather than create:

  gen_rtvec (2, asm_op, clob)

with just the asm and one clobber, you can create:

  gen_rtvec (N + 1, asm_op, clob1, …, clobN)

with N clobbers side-by-side.  When N is variable (as it probably would
be in your case), it's easier to use rtvec_alloc and fill in the fields
using RTVEC_ELT.  E.g.:

  rtvec v = rtvec_alloc (N + 1);
  RTVEC_ELT (v, 0) = asm_op;
  RTVEC_ELT (v, 1) = clob1;
  …
  RTVEC_ELT (v, N) = clobN;
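
At the source level, the closest analogue is a single empty volatile asm
that lists several registers in its clobber list.  The following is only
an illustrative sketch: add_after_clobbers is a made-up name, the register
names are x86-64 specific and chosen arbitrarily, and other targets fall
back to a plain "memory" clobber so that the code still compiles.

```c
/* Source-level analogue only; the register names are x86-64 specific
   and chosen for illustration.  */
static int
add_after_clobbers (int a, int b)
{
#if defined (__x86_64__)
  /* One empty volatile asm with three register clobbers.  */
  __asm__ volatile ("" : : : "rcx", "rdx", "rsi");
#else
  /* Fallback so that the sketch still compiles on other targets.  */
  __asm__ volatile ("" : : : "memory");
#endif
  return a + b;
}
```

The compiler has to keep a and b out of the clobbered registers across
the asm, so the function still returns a + b.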

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 16:31                                                                                 ` Richard Sandiford
@ 2020-09-22 18:25                                                                                   ` Qing Zhao
  2020-09-22 18:35                                                                                     ` H.J. Lu
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-22 18:25 UTC (permalink / raw)
  To: Richard Sandiford, H.J. Lu
  Cc: Jakub Jelinek, Kees Cook, Segher Boessenkool, Uros Bizjak,
	Rodriguez Bahena, Victor, Kees Cook via Gcc-patches

Hi, Hongjiu, 


> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> 
>>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>> But in cases where there is no underlying concept that can sensibly
>>>>>> be extracted out, it's OK if targets need to override the default
>>>>>> to get correct behaviour.
>>>>> 
>>>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
>>>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
>>>> 
>>>> The point is that we're trying to implement this in a target-independent
>>>> way, like for most compiler features.  If the option doesn't work for a
>>>> particular target, then that's a bug like any other.  The most we can
>>>> reasonably do is:
>>>> 
>>>> (a) try to implement the feature in a way that uses all the appropriate
>>>>  pieces of compiler infrastructure (what we've been discussing)
>>>> 
>>>> (b) add tests for the feature that run on all targets
>>>> 
>>>> It's possible that bugs could slip through even then.  But that's true
>>>> of anything.
>>>> 
>>>> Targets like x86 support many subtargets, many different compilation
>>>> modes, and many different compiler features (register asms, various
>>>> fancy function attributes, etc.).  So even after the option is
>>>> committed and is supposedly supported on x86, it's possible that
>>>> we'll find a bug in the feature on x86 itself.
>>>> 
>>>> I don't think anyone would suggest that we should warn the user that the
>>>> option might be buggy on x86 (its initial target).  But I also don't
>>>> see any reason for believing that a bug on x86 is less likely than
>>>> a bug on other targets.
>>> 
>>> Okay, then I will add the default implementation as you suggested. And also provide the overriden optimized implementation on X86. 
>> 
>> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
>> 
>> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?
> 
> No, those are x86-specific like you say.
> 
> Taking each in turn: what is the reason for not clearing mask registers?
> And what is the reason for not clearing mm0-7?  In each case, is it a
> performance or a correctness issue?

Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)

thanks.

Qing

> 
> Although the registers themselves are target-specific, the reason
> for excluding them might be something that could be exposed to
> target-independent code.
> 
> As a general comment, with at least three sets of excluded registers,
> the “all” in one of the suggested option values is beginning to feel
> like a misnomer.  (Maybe that has already been dropped though.)
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 18:25                                                                                   ` Qing Zhao
@ 2020-09-22 18:35                                                                                     ` H.J. Lu
  2020-09-22 19:34                                                                                       ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: H.J. Lu @ 2020-09-22 18:35 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Sandiford, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches

On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Hongjiu,
>
>
> > On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> >
> > Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> >>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>
> >>>
> >>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> >>>>
> >>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> >>>>>> But in cases where there is no underlying concept that can sensibly
> >>>>>> be extracted out, it's OK if targets need to override the default
> >>>>>> to get correct behaviour.
> >>>>>
> >>>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
> >>>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
> >>>>
> >>>> The point is that we're trying to implement this in a target-independent
> >>>> way, like for most compiler features.  If the option doesn't work for a
> >>>> particular target, then that's a bug like any other.  The most we can
> >>>> reasonably do is:
> >>>>
> >>>> (a) try to implement the feature in a way that uses all the appropriate
> >>>>  pieces of compiler infrastructure (what we've been discussing)
> >>>>
> >>>> (b) add tests for the feature that run on all targets
> >>>>
> >>>> It's possible that bugs could slip through even then.  But that's true
> >>>> of anything.
> >>>>
> >>>> Targets like x86 support many subtargets, many different compilation
> >>>> modes, and many different compiler features (register asms, various
> >>>> fancy function attributes, etc.).  So even after the option is
> >>>> committed and is supposedly supported on x86, it's possible that
> >>>> we'll find a bug in the feature on x86 itself.
> >>>>
> >>>> I don't think anyone would suggest that we should warn the user that the
> >>>> option might be buggy on x86 (its initial target).  But I also don't
> >>>> see any reason for believing that a bug on x86 is less likely than
> >>>> a bug on other targets.
> >>>
> >>> Okay, then I will add the default implementation as you suggested. And also provide the overriden optimized implementation on X86.
> >>
> >> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
> >>
> >> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?
> >
> > No, those are x86-specific like you say.
> >
> > Taking each in turn: what is the reason for not clearing mask registers?
> > And what is the reason for not clearing mm0-7?  In each case, is it a
> > performance or a correctness issue?
>
> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>

No particular reason.  You can add them.

H.J.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 18:35                                                                                     ` H.J. Lu
@ 2020-09-22 19:34                                                                                       ` Qing Zhao
  2020-09-23 10:43                                                                                         ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-22 19:34 UTC (permalink / raw)
  To: H.J. Lu, Richard Sandiford
  Cc: Jakub Jelinek, Kees Cook, Segher Boessenkool, Uros Bizjak,
	Rodriguez Bahena, Victor, Kees Cook via Gcc-patches



> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> Hi, Hongjiu,
>> 
>> 
>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 21, 2020, at 2:22 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Sep 21, 2020, at 2:11 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> 
>>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>>>> But in cases where there is no underlying concept that can sensibly
>>>>>>>> be extracted out, it's OK if targets need to override the default
>>>>>>>> to get correct behaviour.
>>>>>>> 
>>>>>>> Then, on the target that the default code is not right, and we haven’t provide overridden implementation, what should we inform the end user about this?
>>>>>>> The user might see the documentation about -fzero-call-used-regs in gcc manual, and might try it on that specific target, but the default implementation is not correct, how to deal this?
>>>>>> 
>>>>>> The point is that we're trying to implement this in a target-independent
>>>>>> way, like for most compiler features.  If the option doesn't work for a
>>>>>> particular target, then that's a bug like any other.  The most we can
>>>>>> reasonably do is:
>>>>>> 
>>>>>> (a) try to implement the feature in a way that uses all the appropriate
>>>>>> pieces of compiler infrastructure (what we've been discussing)
>>>>>> 
>>>>>> (b) add tests for the feature that run on all targets
>>>>>> 
>>>>>> It's possible that bugs could slip through even then.  But that's true
>>>>>> of anything.
>>>>>> 
>>>>>> Targets like x86 support many subtargets, many different compilation
>>>>>> modes, and many different compiler features (register asms, various
>>>>>> fancy function attributes, etc.).  So even after the option is
>>>>>> committed and is supposedly supported on x86, it's possible that
>>>>>> we'll find a bug in the feature on x86 itself.
>>>>>> 
>>>>>> I don't think anyone would suggest that we should warn the user that the
>>>>>> option might be buggy on x86 (its initial target).  But I also don't
>>>>>> see any reason for believing that a bug on x86 is less likely than
>>>>>> a bug on other targets.
>>>>> 
>>>>> Okay, then I will add the default implementation as you suggested. And also provide the overriden optimized implementation on X86.
>>>> 
>>>> For X86, looks like that in addition to stack registers (st0 to st7), mask registers (k0 to k7) also do not need to be zeroed, and also “mm0 to mm7”  should Not be zeroed too.
>>>> 
>>>> As I checked, MASK_REG_P and MMX_REG_P are x86 specific macros, can I use them in middle end similar as “STACK_REG_P”?
>>> 
>>> No, those are x86-specific like you say.
>>> 
>>> Taking each in turn: what is the reason for not clearing mask registers?
>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>> performance or a correctness issue?
>> 
>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>> 
> 
> No particular reason.  You can add them.

Okay, thanks.

Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86 is mainly performance.
Zeroing these additional registers might not bring much benefit for mitigating ROP attacks, but it would add considerably more performance overhead.

What’s your opinion, Richard?

Qing




> 
> H.J.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 17:06                                                                                                     ` Richard Sandiford
@ 2020-09-22 21:32                                                                                                       ` Qing Zhao
  2020-09-23 11:05                                                                                                         ` Richard Sandiford
  2020-09-22 22:37                                                                                                       ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-22 21:32 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>> 
>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>> 
>>> Heh, it looks like that comment dates back to 1994. :-)
>>> 
>>> The comment is no longer correct though.  I wasn't around at the time,
>>> but I assume the comment was only locally true even then.
>>> 
>>> If what the comment said was true, then something like:
>>> 
>>> (define_insn "cld"
>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>> ""
>>> "cld"
>>> [(set_attr "length" "1")
>>>  (set_attr "length_immediate" "0")
>>>  (set_attr "modrm" "0")])
>>> 
>>> would invalidate the entire register file and so would require all values
>>> to be spilt to the stack around the CLD.
>> 
>> Okay, thanks for the info. 
>> then, what’s the current definition of UNSPEC_VOLATILE? 
> 
> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
> 
>  @code{unspec_volatile} is used for volatile operations and operations
>  that may trap; @code{unspec} is used for other operations.
> 
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.
> 
> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs.  The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details.  In particular:
> 
>  Note that the compiler can move even @code{volatile asm} instructions relative
>  to other code, including across jump instructions. For example, on many 
>  targets there is a system register that controls the rounding mode of 
>  floating-point operations. Setting it with a @code{volatile asm} statement,
>  as in the following PowerPC example, does not work reliably.
> 
>  @example
>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>  sum = x + y;
>  @end example
> 
>  The compiler may move the addition back before the @code{volatile asm}
>  statement. To make it work as expected, add an artificial dependency to
>  the @code{asm} by referencing a variable in the subsequent code, for
>  example:
> 
>  @example
>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>  sum = x + y;
>  @end example
> 
> which is very similar to the unspec_volatile case we're talking about.
> 
> To take an x86 example:
> 
>  void
>  f (char *x)
>  {
>    asm volatile ("");
>    x[0] = 0;
>    asm volatile ("");
>    x[1] = 0;
>    asm volatile ("");
>  }

If we change the above to the following (although the asm syntax might not be correct):

void
f (char *x)
{
  asm volatile ("x[0]");
  x[0] = 0;
  asm volatile ("x[1]");
  x[1] = 0;
  asm volatile ("");
}

Will the moving and merging be blocked?


I found the following code in df-scan.c:

static void
df_uses_record (class df_collection_rec *collection_rec,
                rtx *loc, enum df_ref_type ref_type,
                basic_block bb, struct df_insn_info *insn_info,
                int flags)
{
…

    case ASM_OPERANDS:
    case UNSPEC_VOLATILE:
    case TRAP_IF:
    case ASM_INPUT:
…
        if (code == ASM_OPERANDS)
          {
            int j;

            for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
              df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
                              DF_REF_REG_USE, bb, insn_info, flags);
            return;
          }
        break;
…
}


It looks like ONLY the operands of “ASM_OPERANDS” are recorded as USES in df analysis; the operands of “UNSPEC_VOLATILE” are NOT.

If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS” and fix the data flow, right?


> 
> gets optimised to:
> 
>        xorl    %eax, %eax
>        movw    %ax, (%rdi)
> 
> with the two stores being merged.  The same thing is IMO valid for
> unspec_volatile.  In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.
> 
>>> 
>>> There's also no dataflow reason why this couldn't be reordered to:
>>> 
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>      (const_int 0 [0])) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 19 18 20 2 (unspec_volatile [
>>>          (reg:SI 1 dx)
>>>      ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 17 … pop a register other than dx from the stack …)
>>> 
>> 
>> This is the place I don’t quite agree at this moment, maybe I still not quite understand the “UNSPEC_volatile”.
>> 
>> I checked several places in GCC that handle “UNSPEC_VOLATILE”, for example,  for the routine “can_move_insns_across” in gcc/df-problem.c:
>> 
>>      if (NONDEBUG_INSN_P (insn))
>>        {
>>          if (volatile_insn_p (PATTERN (insn)))
>>            return false;
>> 
>> From my understanding of reading the code, when an insn is UNSPEC_VOLATILE, another insn will NOT be able to move across it. 
>> 
>> Then for the above example, the insn 17 should Not be moved across insn 19 either.
>> 
>> Let me know if I miss anything important. 
> 
> The above is conservatively correct.  But not all passes do it.
> E.g. combine does have a similar approach:
> 
>  /* If INSN contains volatile references (specifically volatile MEMs),
>     we cannot combine across any other volatile references.
>     Even if INSN doesn't contain volatile references, any intervening
>     volatile insn might affect machine state.  */
> 
>  is_volatile_p = volatile_refs_p (PATTERN (insn))
>    ? volatile_refs_p
>    : volatile_insn_p;
> 
> And like you say, the passes that use can_move_insns_across will be
> conservative too.  But not many passes use that function.

Okay, I see. 
> 
> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
> about volatile asms or unspec_volatiles, and can move code across them.
> And that's kind-of inevitable.  Having an “everything barrier” makes
> life very hard for global optimisation.

Okay, so UNSPEC_VOLATILE is intentionally not an “everything barrier”?

(But I do feel that the design of UNSPEC_VOLATILE is not clean.)

> 
>>> The asm for (b) goes before the instruction, so we'd have:
>>> 
>>> (insn 17 … new asm …)
>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>      (const_int 0 [0])) "t10.c":11:1 -1
>>>   (nil))
>>> (insn 19 … return …)
>>> 
>>> But something has to tell the df machinery that the value of edx
>>> matters on return from the function, otherwise insn 18 could be
>>> deleted as dead.  Adding edx to EPILOGUE_USES provides that information
>>> and stops the instruction from being deleted.
>> 
>> 
>> In the above, insn 17 will be something like:
>> 
>> (insn 17 ...(unspec_volatile [  (reg:SI 1 dx)
>>    ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1 
>> (nil))
> 
> In the example above, insn 17 would be an asm that clobbers dx
> (instead of using dx).
> 
>> So, the reg edx is already marked as “UNSPEC_VOLATILE”, which should mean that the value of edx matters on return from the function. My understanding is that df should automatically pick up the “UNSPEC_VOLATILE” insn and its operands, so the “UNSPEC_VOLATILE” insn should serve the same purpose as adding “edx” to EPILOGUE_USES.
>> 
>> Do I miss anything here?
> 
> The point is that any use of dx at insn 17 comes before the definition
> in insn 18.  So a use in insn 17 would keep alive any store to dx that
> happened before insn 17.  But it would not keep the store in insn 18 live,
> since insn 18 executes later.

Okay, I see. 
> 
>>>>> I don't think we need a new target-specific unspec_volatile code to do (b).
>>>>> We can just use an automatically-generated volatile asm to clobber the
>>>>> registers first.  See e.g. how expand_asm_memory_blockage handles memory
>>>>> scheduling barriers.
>>>> /* Generate asm volatile("" : : : "memory") as the memory blockage.  */
>>>> 
>>>> static void
>>>> expand_asm_memory_blockage (void)
>>>> {
>>>> rtx asm_op, clob;
>>>> 
>>>> asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                rtvec_alloc (0), rtvec_alloc (0),
>>>>                                rtvec_alloc (0), UNKNOWN_LOCATION);
>>>> MEM_VOLATILE_P (asm_op) = 1;
>>>> 
>>>> clob = gen_rtx_SCRATCH (VOIDmode);
>>>> clob = gen_rtx_MEM (BLKmode, clob);
>>>> clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>> 
>>>> emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>> 
>>>> 
>>>> As the following? 
>>>> 
>>>> /* Generate asm volatile("" : : : "regno") for REGNO.   */
>>>> 
>>>> static void
>>>> expand_asm_reg_volatile (machine_mode mode, unsigned int regno)
>>>> {
>>>> rtx asm_op, clob;
>>>> 
>>>> asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>>>>                                rtvec_alloc (0), rtvec_alloc (0),
>>>>                                rtvec_alloc (0), UNKNOWN_LOCATION);
>>>> MEM_VOLATILE_P (asm_op) = 1;
>>>> 
>>>> clob = gen_rtx_REG (mode, regno);
>>>> clob = gen_rtx_CLOBBER (VOIDmode, clob);
>>>> 
>>>> emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, asm_op, clob)));
>>>> }
>>>> 
>>>> Is the above correct? 
>>> 
>>> Yeah, looks good.  You should be able to clobber all the registers you
>>> want to clear in one asm.
>> 
>> How to do this?
> 
> Rather than create:
> 
>  gen_rtvec (2, asm_op, clob)
> 
> with just the asm and one clobber, you can create:
> 
>  gen_rtvec (N + 1, asm_op, clob1, …, clobN)
> 
> with N clobbers side-by-side.  When N is variable (as it probably would
> be in your case), it's easier to use rtvec_alloc and fill in the fields
> using RTVEC_ELT.  E.g.:
> 
>  rtvec v = rtvec_alloc (N + 1);
>  RTVEC_ELT (v, 0) = asm_op;
>  RTVEC_ELT (v, 1) = clob1;
>  …
>  RTVEC_ELT (v, N) = clobN;

Thanks.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 17:06                                                                                                     ` Richard Sandiford
  2020-09-22 21:32                                                                                                       ` Qing Zhao
@ 2020-09-22 22:37                                                                                                       ` Segher Boessenkool
  2020-09-23 14:28                                                                                                         ` Qing Zhao
  1 sibling, 1 reply; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-22 22:37 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

Hi!

On Tue, Sep 22, 2020 at 06:06:30PM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> > Okay, thanks for the info. 
> > then, what’s the current definition of UNSPEC_VOLATILE? 
> 
> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
> 
>   @code{unspec_volatile} is used for volatile operations and operations
>   that may trap; @code{unspec} is used for other operations.
> 
> which seems like a cyclic definition: volatile expressions are defined
> to be expressions that are volatile.

volatile_insn_p returns true for unspec_volatile (and all other volatile
things).  Unfortunately the comment on this function is just as confused
as pretty much everything else :-/

> But IMO the semantics are that unspec_volatile patterns with a given
> set of inputs and outputs act for dataflow purposes like volatile asms
> with the same inputs and outputs.  The semantics of asm volatile are
> at least slightly more well-defined (if only by example); see extend.texi
> for details.  In particular:
> 
>   Note that the compiler can move even @code{volatile asm} instructions relative
>   to other code, including across jump instructions. For example, on many 
>   targets there is a system register that controls the rounding mode of 
>   floating-point operations. Setting it with a @code{volatile asm} statement,
>   as in the following PowerPC example, does not work reliably.
> 
>   @example
>   asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>   sum = x + y;
>   @end example
> 
>   The compiler may move the addition back before the @code{volatile asm}
>   statement. To make it work as expected, add an artificial dependency to
>   the @code{asm} by referencing a variable in the subsequent code, for
>   example:
> 
>   @example
>   asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>   sum = x + y;
>   @end example
> 
> which is very similar to the unspec_volatile case we're talking about.

So just like volatile memory accesses, they have an (unknown) side
effect, which means they have to execute on the real machine as on the
abstract machine (wrt sequence points).  All side effects have to happen
exactly as often as prescribed, and in the same order.  Just like
volatile asm, too.

And there is no magic to it, there are no other effects.

> To take an x86 example:
> 
>   void
>   f (char *x)
>   {
>     asm volatile ("");
>     x[0] = 0;
>     asm volatile ("");
>     x[1] = 0;
>     asm volatile ("");
>   }
> 
> gets optimised to:
> 
>         xorl    %eax, %eax
>         movw    %ax, (%rdi)

(If you use "#" or "#smth" you can see those in the generated asm --
completely empty asm is helpfully (uh...) not printed.)

> with the two stores being merged.  The same thing is IMO valid for
> unspec_volatile.  In both cases, you would need some kind of memory
> clobber to prevent the move and merge from happening.

Even then, x[] could be optimised away completely (with whole program
optimisation, or something).  The only way to really prevent the
compiler from optimising memory accesses is to make it not see the
details (with an asm or an unspec, for example).
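
For reference, a minimal sketch of the memory-clobber variant of the
quoted example (f_with_barriers is a made-up name).  Each empty asm
carries a "memory" clobber, so the compiler has to assume that the asm
may read or write memory and is not expected to merge the two byte stores
across it.  This says nothing about whether x[] itself can be optimised
away, which is the separate point made above.

```c
/* Same stores as in the quoted example, but each empty asm now carries
   a "memory" clobber, so the compiler must assume memory may be read
   or written by the asm.  */
void
f_with_barriers (char *x)
{
  __asm__ volatile ("" : : : "memory");
  x[0] = 0;
  __asm__ volatile ("" : : : "memory");
  x[1] = 0;
  __asm__ volatile ("" : : : "memory");
}
```

Comparing the -O2 output of this function with that of the original f
shows whether the stores are still combined on a given target.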

> The above is conservatively correct.  But not all passes do it.
> E.g. combine does have a similar approach:
> 
>   /* If INSN contains volatile references (specifically volatile MEMs),
>      we cannot combine across any other volatile references.

And this is correct, and the *minimum* to do even (this could change the
order of the side effects, depending how combine places the resulting
insns in I2 and I3).

>      Even if INSN doesn't contain volatile references, any intervening
>      volatile insn might affect machine state.  */

Confusingly stated, but essentially correct (it is possible we place
the volatile at I2, and everything would still be sequenced correctly,
but combine does not guarantee that).

>   is_volatile_p = volatile_refs_p (PATTERN (insn))
>     ? volatile_refs_p
>     : volatile_insn_p;

Too much subtlety in there, heh.


Segher

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 19:34                                                                                       ` Qing Zhao
@ 2020-09-23 10:43                                                                                         ` Richard Sandiford
  2020-09-23 13:54                                                                                           ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-23 10:43 UTC (permalink / raw)
  To: Qing Zhao
  Cc: H.J. Lu, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>> performance or a correctness issue?
>>> 
>>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>>> 
>> 
>> No particular reason.  You can add them.
>
> Okay, thanks.
>
> Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86 is mainly performance.
> Zeroing these additional registers might not bring much benefit for mitigating ROP attacks, but it would add considerably more performance overhead.
>
> What’s your opinion, Richard?

Dropping them is fine with me FWIW.  That seems like a natural use
for the new hook: drop zeroing that isn't actively wrong, but isn't
likely to be useful either.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 21:32                                                                                                       ` Qing Zhao
@ 2020-09-23 11:05                                                                                                         ` Richard Sandiford
  2020-09-23 14:14                                                                                                           ` Qing Zhao
  2020-09-23 23:46                                                                                                           ` Segher Boessenkool
  0 siblings, 2 replies; 188+ messages in thread
From: Richard Sandiford @ 2020-09-23 11:05 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>> 
>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>> 
>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>> 
>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>> but I assume the comment was only locally true even then.
>>>> 
>>>> If what the comment said was true, then something like:
>>>> 
>>>> (define_insn "cld"
>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>> ""
>>>> "cld"
>>>> [(set_attr "length" "1")
>>>>  (set_attr "length_immediate" "0")
>>>>  (set_attr "modrm" "0")])
>>>> 
>>>> would invalidate the entire register file and so would require all values
>>>> to be spilt to the stack around the CLD.
>>> 
>>> Okay, thanks for the info. 
>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>> 
>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>> 
>>  @code{unspec_volatile} is used for volatile operations and operations
>>  that may trap; @code{unspec} is used for other operations.
>> 
>> which seems like a cyclic definition: volatile expressions are defined
>> to be expressions that are volatile.
>> 
>> But IMO the semantics are that unspec_volatile patterns with a given
>> set of inputs and outputs act for dataflow purposes like volatile asms
>> with the same inputs and outputs.  The semantics of asm volatile are
>> at least slightly more well-defined (if only by example); see extend.texi
>> for details.  In particular:
>> 
>>  Note that the compiler can move even @code{volatile asm} instructions relative
>>  to other code, including across jump instructions. For example, on many 
>>  targets there is a system register that controls the rounding mode of 
>>  floating-point operations. Setting it with a @code{volatile asm} statement,
>>  as in the following PowerPC example, does not work reliably.
>> 
>>  @example
>>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>>  The compiler may move the addition back before the @code{volatile asm}
>>  statement. To make it work as expected, add an artificial dependency to
>>  the @code{asm} by referencing a variable in the subsequent code, for
>>  example:
>> 
>>  @example
>>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>> which is very similar to the unspec_volatile case we're talking about.
>> 
>> To take an x86 example:
>> 
>>  void
>>  f (char *x)
>>  {
>>    asm volatile ("");
>>    x[0] = 0;
>>    asm volatile ("");
>>    x[1] = 0;
>>    asm volatile ("");
>>  }
>
> If we change the above to the following (the asm syntax might not be correct):
>
> void
> f (char *x)
> {
>   asm volatile ("x[0]");
>   x[0] = 0;
>   asm volatile ("x[1]");
>   x[1] = 0;
>   asm volatile ("");
> }
>
> Will the moving and merging be blocked?

That would stop assignments moving up, but it wouldn't stop x[0] moving
down across the x[1] asm.  Using:

  asm volatile ("" ::: "memory");

would prevent moves in both directions, which was what I meant in my
later comment about memory clobbers.

In each case, the same would be true for unspec_volatile.

> I found the following code in df-scan.c:
>
> static void
> df_uses_record (class df_collection_rec *collection_rec,
>                 rtx *loc, enum df_ref_type ref_type,
>                 basic_block bb, struct df_insn_info *insn_info,
>                 int flags)
> {
> …
>
>     case ASM_OPERANDS:
>     case UNSPEC_VOLATILE:
>     case TRAP_IF:
>     case ASM_INPUT:
> …
>         if (code == ASM_OPERANDS)
>           {
>             int j;
>
>             for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
>               df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
>                               DF_REF_REG_USE, bb, insn_info, flags);
>             return;
>           }
>         break;
> …
> }
>
>
> Looks like ONLY the operands of  “ASM_OPERANDS” are recorded as USES in df analysis,  the operands of “UNSPEC_VOLATILE” are NOT. 

The recursion code after the switch statement handles the operands of
unspec_volatile.
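
Roughly, the control flow has the shape below.  This is a toy model with
made-up names (toy_rtx, toy_count_uses), not the real df-scan.c code; it
only shows how a "break" out of the switch still reaches a generic walk
over the operands:

```c
#include <stddef.h>

/* Toy model only: these codes and this node layout are hypothetical
   stand-ins, not GCC's real rtx representation.  */
enum toy_code { TOY_REG, TOY_ASM_OPERANDS, TOY_UNSPEC_VOLATILE, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  size_t n_ops;
  const struct toy_rtx *ops[4];
};

/* Return the number of register uses found in X.  */
size_t
toy_count_uses (const struct toy_rtx *x)
{
  size_t count = 0;

  switch (x->code)
    {
    case TOY_REG:
      return 1;

    case TOY_ASM_OPERANDS:
      /* Special case: walk the inputs here and return early.  */
      for (size_t i = 0; i < x->n_ops; i++)
        count += toy_count_uses (x->ops[i]);
      return count;

    case TOY_UNSPEC_VOLATILE:
      /* No special operand handling; leave the switch.  */
      break;

    default:
      break;
    }

  /* Generic recursion, reached by TOY_UNSPEC_VOLATILE and by
     ordinary codes such as TOY_PLUS.  */
  for (size_t i = 0; i < x->n_ops; i++)
    count += toy_count_uses (x->ops[i]);
  return count;
}
```

In this model both kinds of node end up recording the same operand uses;
they just get there by different paths.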

> If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?

Using a volatile asm or an unspec_volatile would be equally correct.
The reason for preferring a volatile asm is that it doesn't require
target-specific .md patterns.

Of course, as mentioned before, “correct” in this case is: make a good
but not foolproof attempt at trying to prevent later passes from moving
the zeroing instructions further away from the return instruction
(or, equivalently, moving other instructions closer to the return
instruction).  Remember that we arrived here from a discussion about
whether the volatile insns would be enough to prevent machine_reorg and
other passes from moving instructions around (modulo bugs in those passes).
My position was that the volatile insns would help, but that we might
still find cases where a machine_reorg makes a behaviourally-correct
transformation that we don't want.

>> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
>> about volatile asms or unspec_volatiles, and can move code across them.
>> And that's kind-of inevitable.  Having an “everything barrier” makes
>> life very hard for global optimisation.
>
> Okay, so UNSPEC_VOLATILE is intentionally not an “everything barrier”?

Yeah.

> (But I do feel that the design for UNSPEC_volatile is not clean)

Agreed.  But I think that's partly because what it's trying to achieve
isn't clean either.  It's a catch-all for “something is happening,
but we're not saying what”.  And not saying what is itself unclean. ;-)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 10:43                                                                                         ` Richard Sandiford
@ 2020-09-23 13:54                                                                                           ` Qing Zhao
  2020-09-23 14:22                                                                                             ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 13:54 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: H.J. Lu, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches



> On Sep 23, 2020, at 5:43 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>>> performance or a correctness issue?
>>>> 
>>>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>>>> 
>>> 
>>> No particular reason.  You can add them.
>> 
>> Okay, thanks.
>> 
>> Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86 is mainly performance.
>> Zeroing these additional registers might not add much benefit for mitigating ROP attacks, but it would add much more performance overhead.
>>
>> What’s your opinion, Richard?
> 
> Dropping them is fine with me FWIW.  That seems like a natural use
> for the new hook: drop zeroing that isn't actively wrong, but isn't
> likely to be useful either.

Okay, I will add a new hook for this purpose.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 11:05                                                                                                         ` Richard Sandiford
@ 2020-09-23 14:14                                                                                                           ` Qing Zhao
  2020-09-23 14:32                                                                                                             ` Richard Sandiford
  2020-09-23 23:46                                                                                                           ` Segher Boessenkool
  1 sibling, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 14:14 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> 
>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>> 
>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>> 
>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>> 
>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>> but I assume the comment was only locally true even then.
>>>>> 
>>>>> If what the comment said was true, then something like:
>>>>> 
>>>>> (define_insn "cld"
>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>> ""
>>>>> "cld"
>>>>> [(set_attr "length" "1")
>>>>> (set_attr "length_immediate" "0")
>>>>> (set_attr "modrm" "0")])
>>>>> 
>>>>> would invalidate the entire register file and so would require all values
>>>>> to be spilt to the stack around the CLD.
>>>> 
>>>> Okay, thanks for the info. 
>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>> 
>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>> 
>>> @code{unspec_volatile} is used for volatile operations and operations
>>> that may trap; @code{unspec} is used for other operations.
>>> 
>>> which seems like a cyclic definition: volatile expressions are defined
>>> to be expressions that are volatile.
>>> 
>>> But IMO the semantics are that unspec_volatile patterns with a given
>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>> with the same inputs and outputs.  The semantics of asm volatile are
>>> at least slightly more well-defined (if only by example); see extend.texi
>>> for details.  In particular:
>>> 
>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>> to other code, including across jump instructions. For example, on many 
>>> targets there is a system register that controls the rounding mode of 
>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>> as in the following PowerPC example, does not work reliably.
>>> 
>>> @example
>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>> sum = x + y;
>>> @end example
>>> 
>>> The compiler may move the addition back before the @code{volatile asm}
>>> statement. To make it work as expected, add an artificial dependency to
>>> the @code{asm} by referencing a variable in the subsequent code, for
>>> example:
>>> 
>>> @example
>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>> sum = x + y;
>>> @end example
>>> 
>>> which is very similar to the unspec_volatile case we're talking about.
>>> 
>>> To take an x86 example:
>>> 
>>> void
>>> f (char *x)
>>> {
>>>   asm volatile ("");
>>>   x[0] = 0;
>>>   asm volatile ("");
>>>   x[1] = 0;
>>>   asm volatile ("");
>>> }
>> 
>> If we change the above to the following (the asm syntax might not be correct):
>>
>> void
>> f (char *x)
>> {
>>   asm volatile ("x[0]");
>>   x[0] = 0;
>>   asm volatile ("x[1]");
>>   x[1] = 0;
>>   asm volatile ("");
>> }
>> 
>> Will the moving and merging be blocked?
> 
> That would stop assignments moving up, but it wouldn't stop x[0] moving
> down across the x[1] asm.  Using:
> 
>  asm volatile ("" ::: "memory");
> 
> would prevent moves in both directions, which was what I meant in my
> later comment about memory clobbers.
> 
> In each case, the same would be true for unspec_volatile.

So, is the following good enough:

asm volatile (reg1, reg2, … regN, memory)
mov reg1, 0
mov reg2, 0
...
mov regN,0
asm volatile (reg1, reg2,… regN, memory)
return


I.e., add just one “asm volatile” insn, whose operands include all registers and memory, BEFORE and AFTER the whole zeroing sequence.

Or do we have to add one “asm volatile” insn before and after each “mov” insn?


> 
>> I found the following code in df-scan.c:
>> 
>> static void
>> df_uses_record (class df_collection_rec *collection_rec,
>>                rtx *loc, enum df_ref_type ref_type,
>>                basic_block bb, struct df_insn_info *insn_info,
>>                int flags)
>> {
>> …
>> 
>>    case ASM_OPERANDS:
>>    case UNSPEC_VOLATILE:
>>    case TRAP_IF:
>>    case ASM_INPUT:
>> …
>>        if (code == ASM_OPERANDS)
>>          {
>>            int j;
>> 
>>            for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++)
>>              df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j),
>>                              DF_REF_REG_USE, bb, insn_info, flags);
>>            return;
>>          }
>>        break;
>> …
>> }
>> 
>> 
>> Looks like ONLY the operands of  “ASM_OPERANDS” are recorded as USES in df analysis,  the operands of “UNSPEC_VOLATILE” are NOT. 
> 
> The recursion code after the switch statement handles the operands of
> unspec_volatile.

Okay, I see.
So these two are actually equivalent.

> 
>> If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?
> 
> Using a volatile asm or an unspec_volatile would be equally correct.
> The reason for preferring a volatile asm is that it doesn't require
> target-specific .md patterns.
Okay.

Then is there any benefit to using “UNSPEC_volatile” over “volatile asm”?
> 
> Of course, as mentioned before, “correct” in this case is: make a good
> but not foolproof attempt at trying to prevent later passes from moving
> the zeroing instructions further away from the return instruction
> (or, equivalently, moving other instructions closer to the return
> instruction).  Remember that we arrived here from a discussion about
> whether the volatile insns would be enough to prevent machine_reorg and
> other passes from moving instructions around (modulo bugs in those passes).
> My position was that the volatile insns would help, but that we might
> still find cases where a machine_reorg makes a behaviourally-correct
> transformation that we don't want.
So, you mean that after adding “volatile asm” or “UNSPEC_volatile”, although
most of the insn movement can be prevented, there is still a small possibility
that some unwanted transformation might happen?

> 
>>> Passes like fwprop.c, postreload-gcse.c and ree.c do not (AFAIK) worry
>>> about volatile asms or unspec_volatiles, and can move code across them.
>>> And that's kind-of inevitable.  Having an “everything barrier” makes
>>> life very hard for global optimisation.
>> 
>> Okay, so UNSPEC_VOLATILE is intentionally not an “everything barrier”?
> 
> Yeah.
> 
>> (But I do feel that the design for UNSPEC_volatile is not clean)
> 
> Agreed.  But I think that's partly because what it's trying to achieve
> isn't clean either.  It's a catch-all for “something is happening,
> but we're not saying what”.  And not saying what is itself unclean. ;-)

thanks.

Qing
> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 13:54                                                                                           ` Qing Zhao
@ 2020-09-23 14:22                                                                                             ` Richard Sandiford
  2020-09-23 14:35                                                                                               ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-23 14:22 UTC (permalink / raw)
  To: Qing Zhao
  Cc: H.J. Lu, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 23, 2020, at 5:43 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>>>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>>>> performance or a correctness issue?
>>>>> 
>>>>> Could you please provide more information on the above questions? (Why do we exclude mask registers and mm0-7 registers from ALL on x86?)
>>>>> 
>>>> 
>>>> No particular reason.  You can add them.
>>> 
>>> Okay, thanks.
>>> 
>>> Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86 is mainly performance.
>>> Zeroing these additional registers might not add much benefit for mitigating ROP attacks, but it would add much more performance overhead.
>>>
>>> What’s your opinion, Richard?
>> 
>> Dropping them is fine with me FWIW.  That seems like a natural use
>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>> likely to be useful either.
>
> Okay, I will add a new hook for this purpose.

It doesn't need to be a new hook.  The one I mentioned before
would be enough:

> The kind of target hook interface I was thinking of was:
>
>   HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>
> which:
>
> - emits zeroing instructions for some target-specific subset of REGS
>
> - returns the set of registers that were actually cleared

Not clearing mm0-7 and k0-7 would come under the first bullet point.
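
To make the intended contract concrete, here is a toy model of it in plain
C.  Everything in it is an assumption for illustration: a 64-bit mask
stands in for HARD_REG_SET, the register numbering is invented, and
toy_emit_move_zeros is a hypothetical name rather than a proposed
implementation:

```c
#include <stdint.h>

/* Toy model only.  A 64-bit mask stands in for HARD_REG_SET, and the
   register numbering below is invented for illustration; it is not
   the real x86 numbering.  */
typedef uint64_t toy_reg_set;

#define TOY_GPR_MASK  UINT64_C (0x000000000000ffff)  /* 16 GPRs */
#define TOY_MMX_MASK  UINT64_C (0x0000000000ff0000)  /* mm0-7 */
#define TOY_MASK_MASK UINT64_C (0x00000000ff000000)  /* k0-7 */

/* Hypothetical target implementation: "zero" only the registers the
   target considers worthwhile and report which ones those were.  A
   real hook would emit move insns here instead of just computing
   the set.  */
toy_reg_set
toy_emit_move_zeros (toy_reg_set regs)
{
  toy_reg_set skipped = TOY_MMX_MASK | TOY_MASK_MASK;
  return regs & ~skipped;
}
```

The caller would then use the returned set, rather than the requested one,
when deciding which registers to treat as zeroed.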

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-22 22:37                                                                                                       ` Segher Boessenkool
@ 2020-09-23 14:28                                                                                                         ` Qing Zhao
  2020-09-23 23:40                                                                                                           ` Segher Boessenkool
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 14:28 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor, richard.sandiford



> On Sep 22, 2020, at 5:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Tue, Sep 22, 2020 at 06:06:30PM +0100, Richard Sandiford wrote:
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> Okay, thanks for the info. 
>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>> 
>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>> 
>>  @code{unspec_volatile} is used for volatile operations and operations
>>  that may trap; @code{unspec} is used for other operations.
>> 
>> which seems like a cyclic definition: volatile expressions are defined
>> to be expressions that are volatile.
> 
> volatile_insn_p returns true for unspec_volatile (and all other volatile
> things).  Unfortunately the comment on this function is just as confused
> as pretty much everything else :-/
> 
>> But IMO the semantics are that unspec_volatile patterns with a given
>> set of inputs and outputs act for dataflow purposes like volatile asms
>> with the same inputs and outputs.  The semantics of asm volatile are
>> at least slightly more well-defined (if only by example); see extend.texi
>> for details.  In particular:
>> 
>>  Note that the compiler can move even @code{volatile asm} instructions relative
>>  to other code, including across jump instructions. For example, on many 
>>  targets there is a system register that controls the rounding mode of 
>>  floating-point operations. Setting it with a @code{volatile asm} statement,
>>  as in the following PowerPC example, does not work reliably.
>> 
>>  @example
>>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>>  The compiler may move the addition back before the @code{volatile asm}
>>  statement. To make it work as expected, add an artificial dependency to
>>  the @code{asm} by referencing a variable in the subsequent code, for
>>  example:
>> 
>>  @example
>>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>  sum = x + y;
>>  @end example
>> 
>> which is very similar to the unspec_volatile case we're talking about.
> 
> So just like volatile memory accesses, they have an (unknown) side
> effect, which means they have to execute on the real machine as on the
> abstract machine (wrt sequence points).  All side effects have to happen
> exactly as often as prescribed, and in the same order.  Just like
> volatile asm, too.
I don’t quite understand the above.  What do you mean by “they have to
execute on the real machine as on the abstract machine”?

> 
> And there is no magic to it, there are no other effects.
> 
>> To take an x86 example:
>> 
>>  void
>>  f (char *x)
>>  {
>>    asm volatile ("");
>>    x[0] = 0;
>>    asm volatile ("");
>>    x[1] = 0;
>>    asm volatile ("");
>>  }
>> 
>> gets optimised to:
>> 
>>        xorl    %eax, %eax
>>        movw    %ax, (%rdi)
> 
> (If you use "#" or "#smth" you can see those in the generated asm --
> completely empty asm is helpfully (uh...) not printed.)

Can you explain this in more detail?

> 
>> with the two stores being merged.  The same thing is IMO valid for
>> unspec_volatile.  In both cases, you would need some kind of memory
>> clobber to prevent the move and merge from happening.
> 
> Even then, x[] could be optimised away completely (with whole program
> optimisation, or something).  The only way to really prevent the
> compiler from optimising memory accesses is to make it not see the
> details (with an asm or an unspec, for example).
Do you mean with an asm volatile ("" ::: "memory")?

> 
>> The above is conservatively correct.  But not all passes do it.
>> E.g. combine does have a similar approach:
>> 
>>  /* If INSN contains volatile references (specifically volatile MEMs),
>>     we cannot combine across any other volatile references.
> 
> And this is correct, and the *minimum* to do even (this could change the
> order of the side effects, depending how combine places the resulting
> insns in I2 and I3).

Please clarify what “I2 and I3” are.
> 
>>     Even if INSN doesn't contain volatile references, any intervening
>>     volatile insn might affect machine state.  */
> 
> Confusingly stated, but essentially correct (it is possible we place
> the volatile at I2, and everything would still be sequenced correctly,
> but combine does not guarantee that).

thanks.

Qing
> 
>>  is_volatile_p = volatile_refs_p (PATTERN (insn))
>>    ? volatile_refs_p
>>    : volatile_insn_p;
> 
> Too much subtlety in there, heh.
> 
> 
> Segher


^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:14                                                                                                           ` Qing Zhao
@ 2020-09-23 14:32                                                                                                             ` Richard Sandiford
  2020-09-23 14:48                                                                                                               ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-23 14:32 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>> 
>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>> 
>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>> 
>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>> 
>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>> but I assume the comment was only locally true even then.
>>>>>> 
>>>>>> If what the comment said was true, then something like:
>>>>>> 
>>>>>> (define_insn "cld"
>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>> ""
>>>>>> "cld"
>>>>>> [(set_attr "length" "1")
>>>>>> (set_attr "length_immediate" "0")
>>>>>> (set_attr "modrm" "0")])
>>>>>> 
>>>>>> would invalidate the entire register file and so would require all values
>>>>>> to be spilt to the stack around the CLD.
>>>>> 
>>>>> Okay, thanks for the info. 
>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>> 
>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>> 
>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>> that may trap; @code{unspec} is used for other operations.
>>>> 
>>>> which seems like a cyclic definition: volatile expressions are defined
>>>> to be expressions that are volatile.
>>>> 
>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>> for details.  In particular:
>>>> 
>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>> to other code, including across jump instructions. For example, on many 
>>>> targets there is a system register that controls the rounding mode of 
>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>> as in the following PowerPC example, does not work reliably.
>>>> 
>>>> @example
>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>> sum = x + y;
>>>> @end example
>>>> 
>>>> The compiler may move the addition back before the @code{volatile asm}
>>>> statement. To make it work as expected, add an artificial dependency to
>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>> example:
>>>> 
>>>> @example
>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>> sum = x + y;
>>>> @end example
>>>> 
>>>> which is very similar to the unspec_volatile case we're talking about.
>>>> 
>>>> To take an x86 example:
>>>> 
>>>> void
>>>> f (char *x)
>>>> {
>>>>   asm volatile ("");
>>>>   x[0] = 0;
>>>>   asm volatile ("");
>>>>   x[1] = 0;
>>>>   asm volatile ("");
>>>> }
>>> 
>>> If we change the above to the following (the asm syntax might not be correct):
>>>
>>> void
>>> f (char *x)
>>> {
>>>   asm volatile ("x[0]");
>>>   x[0] = 0;
>>>   asm volatile ("x[1]");
>>>   x[1] = 0;
>>>   asm volatile ("");
>>> }
>>> 
>>> Will the moving and merging be blocked?
>> 
>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>> down across the x[1] asm.  Using:
>> 
>>  asm volatile ("" ::: "memory");
>> 
>> would prevent moves in both directions, which was what I meant in my
>> later comment about memory clobbers.
>> 
>> In each case, the same would be true for unspec_volatile.
>
> So, is the following good enough:
>
> asm volatile (reg1, reg2, … regN, memory)
> mov reg1, 0
> mov reg2, 0
> ...
> mov regN,0
> asm volatile (reg1, reg2,… regN, memory)
> return
>
>
> I.e., add just one “asm volatile” insn, whose operands include all registers and memory, BEFORE and AFTER the whole zeroing sequence.

It isn't clear from your syntax whether the asm volatile arguments
are uses or clobbers.  The idea was:

- There would be an asm volatile before the moves that clobbers (but does
  not use) (mem:BLK (scratch)) and the zeroed registers.

- EPILOGUE_USES would make the zeroed registers live after the return.

> Or do we have to add one “asm volatile” insn before and after each “mov” insn?

No, the idea with the multiple clobber thing was to have a single asm.

>>> If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?
>> 
>> Using a volatile asm or an unspec_volatile would be equally correct.
>> The reason for preferring a volatile asm is that it doesn't require
>> target-specific .md patterns.
> Okay.
>
> Then is there any benefit to using “UNSPEC_volatile” over “volatile asm”?

In general, yes: you can use the full .md functionality with
unspec_volatiles, such as splitting insns, adding match_scratches
with different clobber requirements, writing custom output code,
setting attributes, etc.

But there isn't an advantage to using unspec_volatile in this case,
where the instruction doesn't actually do anything.

>> Of course, as mentioned before, “correct” in this case is: make a good
>> but not foolproof attempt at trying to prevent later passes from moving
>> the zeroing instructions further away from the return instruction
>> (or, equivalently, moving other instructions closer to the return
>> instruction).  Remember that we arrived here from a discussion about
>> whether the volatile insns would be enough to prevent machine_reorg and
>> other passes from moving instructions around (modulo bugs in those passes).
>> My position was that the volatile insns would help, but that we might
>> still find cases where a machine_reorg makes a behaviourally-correct
>> transformation that we don't want.
> So, you mean that after adding “volatile asm” or “UNSPEC_volatile”, although
> most of the insn movement can be prevented, there is still a small possibility
> that some unwanted transformation might happen?

I wouldn't want to quantify the possibility.  The point is just that the
possibility exists.  The unspec_volatile does not prevent movement of
unrelated non-volatile operations.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:22                                                                                             ` Richard Sandiford
@ 2020-09-23 14:35                                                                                               ` Qing Zhao
  2020-09-23 14:40                                                                                                 ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 14:35 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: H.J. Lu, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches



> On Sep 23, 2020, at 9:22 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 23, 2020, at 5:43 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 22, 2020, at 1:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, Sep 22, 2020 at 11:25 AM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>>>>> On Sep 22, 2020, at 11:31 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>> Taking each in turn: what is the reason for not clearing mask registers?
>>>>>>> And what is the reason for not clearing mm0-7?  In each case, is it a
>>>>>>> performance or a correctness issue?
>>>>>> 
>>>>>> Could you please provide more information on the above questions? (Why we exclude mask registers and mm0-7 registers from ALL on x86?)
>>>>>> 
>>>>> 
>>>>> No particular reason.  You can add them.
>>>> 
>>>> Okay, thanks.
>>>> 
>>>> Then I guess that the reason we didn’t zero mask registers and mm0-7 registers on x86 is mainly performance.
>>>> There might not be much benefit for mitigating ROP attacks if we zero these additional registers, but we would get much more performance overhead.
>>>> 
>>>> What’s your opinion, Richard?
>>> 
>>> Dropping them is fine with me FWIW.  That seems like a natural use
>>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>>> likely to be useful either.
>> 
>> Okay, I will add a  new hook for this purpose.
> 
> It doesn't need to be a new hook.  The one I mentioned before
> would be enough:
> 
>> The kind of target hook interface I was thinking of was:
>> 
>>  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>> 
>> which:
>> 
>> - emits zeroing instructions for some target-specific subset of REGS
>> 
>> - returns the set of registers that were actually cleared
> 
> Not clearing mm0-7 and k0-7 would come under the first bullet point.
This makes sense.

However, how about the second bullet point:

- returns the set of registers that were actually cleared

Should mm0-7 and k0-7 be left out of the returned set, since they are not zeroed by the target?

Qing


> 
> Thanks,
> Richard



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:35                                                                                               ` Qing Zhao
@ 2020-09-23 14:40                                                                                                 ` Richard Sandiford
  2020-09-23 14:49                                                                                                   ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-23 14:40 UTC (permalink / raw)
  To: Qing Zhao
  Cc: H.J. Lu, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> Dropping them is fine with me FWIW.  That seems like a natural use
>>>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>>>> likely to be useful either.
>>> 
>>> Okay, I will add a  new hook for this purpose.
>> 
>> It doesn't need to be a new hook.  The one I mentioned before
>> would be enough:
>> 
>>> The kind of target hook interface I was thinking of was:
>>> 
>>>  HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>>> 
>>> which:
>>> 
>>> - emits zeroing instructions for some target-specific subset of REGS
>>> 
>>> - returns the set of registers that were actually cleared
>> 
>> Not clearing mm0-7 and k0-7 would come under the first bullet point.
> This makes sense.
>
> However, how about the second bullet point:
>
> - returns the set of registers that were actually cleared
>
> Should mm0-7 and k0-7 be left out of the returned set, since they are not zeroed by the target?

Yes, the point of the return value is to tell the caller what the
hook actually did.  If the hook didn't clear mm0-7 then the returned
set shouldn't include mm0-7.
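
To make the contract concrete, here is a toy C model of it.  This is
only a sketch, not GCC code: toy_reg_set stands in for HARD_REG_SET,
and the register numbering, toy_range and toy_emit_move_zeros are
names invented for the illustration.  The hook zeroes the subset it
chooses to handle and returns exactly that subset, so the skipped
registers never appear in the result.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for HARD_REG_SET: one bit per hard register.  */
typedef uint64_t toy_reg_set;

/* Hypothetical register numbering, for this sketch only.  */
enum
{
  TOY_NUM_REGS = 32,
  TOY_GPR_FIRST = 0, TOY_GPR_LAST = 15,
  TOY_MMX_FIRST = 16, TOY_MMX_LAST = 23,
  TOY_MASK_FIRST = 24, TOY_MASK_LAST = 31
};

/* Return the set containing registers FIRST..LAST inclusive.  */
static toy_reg_set
toy_range (int first, int last)
{
  toy_reg_set s = 0;
  for (int r = first; r <= last; r++)
    s |= (toy_reg_set) 1 << r;
  return s;
}

/* Model of the hook: zero the target-chosen subset of REGS in
   REG_FILE and return the set of registers actually cleared.  */
static toy_reg_set
toy_emit_move_zeros (toy_reg_set regs, long reg_file[TOY_NUM_REGS])
{
  /* This toy target declines to zero the MMX and mask registers.  */
  toy_reg_set skipped = (toy_range (TOY_MMX_FIRST, TOY_MMX_LAST)
                         | toy_range (TOY_MASK_FIRST, TOY_MASK_LAST));
  toy_reg_set cleared = regs & ~skipped;

  for (int r = 0; r < TOY_NUM_REGS; r++)
    if (cleared & ((toy_reg_set) 1 << r))
      reg_file[r] = 0;

  /* The result is always a subset of what was requested.  */
  assert ((cleared & ~regs) == 0);
  return cleared;
}
```

A caller that asked for the GPRs plus mm0-7 would get back only the
GPR bits, which tells it that mm0-7 still hold their old values.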

Thanks,
Richard


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:32                                                                                                             ` Richard Sandiford
@ 2020-09-23 14:48                                                                                                               ` Qing Zhao
  2020-09-23 15:21                                                                                                                 ` Richard Sandiford
  0 siblings, 1 reply; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 14:48 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>>> 
>>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>> 
>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>> 
>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>> 
>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>> but I assume the comment was only locally true even then.
>>>>>>> 
>>>>>>> If what the comment said was true, then something like:
>>>>>>> 
>>>>>>> (define_insn "cld"
>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>> ""
>>>>>>> "cld"
>>>>>>> [(set_attr "length" "1")
>>>>>>> (set_attr "length_immediate" "0")
>>>>>>> (set_attr "modrm" "0")])
>>>>>>> 
>>>>>>> would invalidate the entire register file and so would require all values
>>>>>>> to be spilt to the stack around the CLD.
>>>>>> 
>>>>>> Okay, thanks for the info. 
>>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>>> 
>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>> 
>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>> that may trap; @code{unspec} is used for other operations.
>>>>> 
>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>> to be expressions that are volatile.
>>>>> 
>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>> for details.  In particular:
>>>>> 
>>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>>> to other code, including across jump instructions. For example, on many 
>>>>> targets there is a system register that controls the rounding mode of 
>>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>>> as in the following PowerPC example, does not work reliably.
>>>>> 
>>>>> @example
>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>> sum = x + y;
>>>>> @end example
>>>>> 
>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>> example:
>>>>> 
>>>>> @example
>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>> sum = x + y;
>>>>> @end example
>>>>> 
>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>> 
>>>>> To take an x86 example:
>>>>> 
>>>>> void
>>>>> f (char *x)
>>>>> {
>>>>>  asm volatile ("");
>>>>>  x[0] = 0;
>>>>>  asm volatile ("");
>>>>>  x[1] = 0;
>>>>>  asm volatile ("");
>>>>> }
>>>> 
>>>> If we change the above to the following (though the asm format might not be correct):
>>>> 
>>>> Void
>>>> F (char *x)
>>>> {
>>>> asm volatile (“x[0]”);
>>>> x[0] = 0;
>>>> asm volatile (“x[1]"); 
>>>> x[1] = 0;
>>>> asm volatile ("”);
>>>> }
>>>> 
>>>> Will the moving and merging be blocked?
>>> 
>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>> down across the x[1] asm.  Using:
>>> 
>>> asm volatile ("" ::: "memory");
>>> 
>>> would prevent moves in both directions, which was what I meant in my
>>> later comment about memory clobbers.
>>> 
>>> In each case, the same would be true for unspec_volatile.
>> 
>> So, is the following good enough:
>> 
>> asm volatile (reg1, reg2, … regN, memory)
>> mov reg1, 0
>> mov reg2, 0
>> ...
>> mov regN,0
>> asm volatile (reg1, reg2,… regN, memory)
>> return
>> 
>> 
>> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.
> 
> It isn't clear from your syntax whether the asm volatile arguments
> are uses or clobbers.

How can the syntax of asm volatile distinguish “Uses” and “Clobbers”? 

>  The idea was:
> 
> - There would be an asm volatile before the moves that clobbers (but does
>  not use) (mem:BLK (scratch)) and the zeroed registers.
> 
> - EPILOGUE_USES would make the zeroed registers live after the return.

Is EPILOGUE_USES the only way to do this? Would adding another “asm volatile” immediately before the return serve the same purpose?


> 
>> Or, we have to add one “asm volatile” insn before and after each “mov” insn? 
> 
> No, the idea with the multiple clobber thing was to have a single asm.
Okay.
> 
>>>> If we use “ASM_OPERANDS” instead of “UNSPEC_VOLATILE” as you suggested, the data flow analysis should automatically pick up the operands of “ASM_OPERANDS”, and fix the data flow, right?
>>> 
>>> Using a volatile asm or an unspec_volatile would be equally correct.
>>> The reason for preferring a volatile asm is that it doesn't require
>>> target-specific .md patterns.
>> Okay.
>> 
>> Then is there any benefit to use “UNSPEC_volatile” over “volatile asm”?
> 
> In general, yes: you can use the full .md functionality with
> unspec_volatiles, such as splitting insns, adding match_scratches
> with different clobber requirements, writing custom output code,
> setting attributes, etc.
> 
> But there isn't an advantage to using unspec_volatile in this case,
> where the instruction doesn't actually do anything.

Okay, I see. 

> 
>>> Of course, as mentioned before, “correct” in this case is: make a good
>>> but not foolproof attempt at trying to prevent later passes from moving
>>> the zeroing instructions further away from the return instruction
>>> (or, equivalently, moving other instructions closer to the return
>>> instruction).  Remember that we arrived here from a discussion about
>>> whether the volatile insns would be enough to prevent machine_reorg and
>>> other passes from moving instructions around (modulo bugs in those passes).
>>> My position was that the volatile insns would help, but that we might
>>> still find cases where a machine_reorg makes a behaviourally-correct
>>> transformation that we don't want.
>> So, you mean that after adding “volatile asm” or “UNSPEC_volatile”, although
>> most of the insn movement can be prevented, there is still a small possibility
>> that some unwanted transformation might happen?
> 
> I wouldn't want to quantify the possibility.  The point is just that the
> possibility exists.  The unspec_volatile does not prevent movement of
> unrelated non-volatile operations.

Okay. 

thanks.

Qing
> 
> Thanks,
> Richard



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:40                                                                                                 ` Richard Sandiford
@ 2020-09-23 14:49                                                                                                   ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 14:49 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: H.J. Lu, Jakub Jelinek, Kees Cook, Segher Boessenkool,
	Uros Bizjak, Rodriguez Bahena, Victor, Kees Cook via Gcc-patches



> On Sep 23, 2020, at 9:40 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> Dropping them is fine with me FWIW.  That seems like a natural use
>>>>> for the new hook: drop zeroing that isn't actively wrong, but isn't
>>>>> likely to be useful either.
>>>> 
>>>> Okay, I will add a  new hook for this purpose.
>>> 
>>> It doesn't need to be a new hook.  The one I mentioned before
>>> would be enough:
>>> 
>>>> The kind of target hook interface I was thinking of was:
>>>> 
>>>> HARD_REG_SET TARGET_EMIT_MOVE_ZEROS (const HARD_REG_SET &regs)
>>>> 
>>>> which:
>>>> 
>>>> - emits zeroing instructions for some target-specific subset of REGS
>>>> 
>>>> - returns the set of registers that were actually cleared
>>> 
>>> Not clearing mm0-7 and k0-7 would come under the first bullet point.
>> This makes sense.
>> 
>> However, how about the second bullet point:
>> 
>> - returns the set of registers that were actually cleared
>> 
>> Should mm0-7 and k0-7 be left out of the returned set, since they are not zeroed by the target?
> 
> Yes, the point of the return value is to tell the caller what the
> hook actually did.  If the hook didn't clear mm0-7 then the returned
> set shouldn't include mm0-7.

Okay.

Qing
> 
> Thanks,
> Richard



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:48                                                                                                               ` Qing Zhao
@ 2020-09-23 15:21                                                                                                                 ` Richard Sandiford
  2020-09-23 16:08                                                                                                                   ` Qing Zhao
  0 siblings, 1 reply; 188+ messages in thread
From: Richard Sandiford @ 2020-09-23 15:21 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>>>> 
>>>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>>> 
>>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>>> 
>>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>>> 
>>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>>> but I assume the comment was only locally true even then.
>>>>>>>> 
>>>>>>>> If what the comment said was true, then something like:
>>>>>>>> 
>>>>>>>> (define_insn "cld"
>>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>>> ""
>>>>>>>> "cld"
>>>>>>>> [(set_attr "length" "1")
>>>>>>>> (set_attr "length_immediate" "0")
>>>>>>>> (set_attr "modrm" "0")])
>>>>>>>> 
>>>>>>>> would invalidate the entire register file and so would require all values
>>>>>>>> to be spilt to the stack around the CLD.
>>>>>>> 
>>>>>>> Okay, thanks for the info. 
>>>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>>>> 
>>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>>> 
>>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>>> that may trap; @code{unspec} is used for other operations.
>>>>>> 
>>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>>> to be expressions that are volatile.
>>>>>> 
>>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>>> for details.  In particular:
>>>>>> 
>>>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>>>> to other code, including across jump instructions. For example, on many 
>>>>>> targets there is a system register that controls the rounding mode of 
>>>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>>>> as in the following PowerPC example, does not work reliably.
>>>>>> 
>>>>>> @example
>>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>>> sum = x + y;
>>>>>> @end example
>>>>>> 
>>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>>> example:
>>>>>> 
>>>>>> @example
>>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>>> sum = x + y;
>>>>>> @end example
>>>>>> 
>>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>>> 
>>>>>> To take an x86 example:
>>>>>> 
>>>>>> void
>>>>>> f (char *x)
>>>>>> {
>>>>>>  asm volatile ("");
>>>>>>  x[0] = 0;
>>>>>>  asm volatile ("");
>>>>>>  x[1] = 0;
>>>>>>  asm volatile ("");
>>>>>> }
>>>>> 
>>>>> If we change the above to the following (though the asm format might not be correct):
>>>>> 
>>>>> Void
>>>>> F (char *x)
>>>>> {
>>>>> asm volatile (“x[0]”);
>>>>> x[0] = 0;
>>>>> asm volatile (“x[1]"); 
>>>>> x[1] = 0;
>>>>> asm volatile ("”);
>>>>> }
>>>>> 
>>>>> Will the moving and merging be blocked?
>>>> 
>>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>>> down across the x[1] asm.  Using:
>>>> 
>>>> asm volatile ("" ::: "memory");
>>>> 
>>>> would prevent moves in both directions, which was what I meant in my
>>>> later comment about memory clobbers.
>>>> 
>>>> In each case, the same would be true for unspec_volatile.
>>> 
>>> So, is the following good enough:
>>> 
>>> asm volatile (reg1, reg2, … regN, memory)
>>> mov reg1, 0
>>> mov reg2, 0
>>> ...
>>> mov regN,0
>>> asm volatile (reg1, reg2,… regN, memory)
>>> return
>>> 
>>> 
>>> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.
>> 
>> It isn't clear from your syntax whether the asm volatile arguments
>> are uses or clobbers.
>
> How can the syntax of asm volatile distinguish “Uses” and “Clobbers”? 

Well, I wasn't trying to discuss correct syntax, I just wasn't sure what
you meant.

As mentioned in the quote below, I was expecting the asm volatile
before the zeroing to include clobbers generated as discussed in
the earlier message:

  rtx asm_op = gen_rtx_ASM_OPERANDS (…);
  MEM_VOLATILE_P (asm_op) = 1;

  rtvec v = rtvec_alloc (N + 1);
  RTVEC_ELT (v, 0) = asm_op;
  RTVEC_ELT (v, 1) = gen_rtx_CLOBBER (VOIDmode, …);
  …
  RTVEC_ELT (v, N) = gen_rtx_CLOBBER (VOIDmode, …);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));

But doing this after the zeroing would give:

  …clobber reg1 in an asm…
  …set reg1 to zero…
  …clobber reg1 in an asm…

Dataflow-wise, the second clobber overwrites the effect of the zeroing.
Since nothing uses reg1 between the zeroing and the clobber, the zeroing
could be removed as dead.

>>  The idea was:
>> 
>> - There would be an asm volatile before the moves that clobbers (but does
>>  not use) (mem:BLK (scratch)) and the zeroed registers.
>> 
>> - EPILOGUE_USES would make the zeroed registers live after the return.
>
> Is EPILOGUE_USES the only way to do this? Would adding another “asm volatile” immediately before the return serve the same purpose?

Why do you want to use an asm to keep the instructions live though?

As I think I mentioned before (but sorry if I'm misremembering),
using an asm would be counterproductive on delayed-branch targets.
The delayed branch scheduler looks backwards for something that could
fill the delay slot.  If we have an asm after the zeroing instructions
that uses the zeroed registers, that would prevent any zeroing
instruction from filling the delay slot.  The delayed branch scheduler
would therefore try to fill the delay slot with something from before
the zeroing sequence, which is exactly what we'd like to avoid.

Also, using an asm after the sequence would allow a machine_reorg
pass to reuse the zeroed registers for something else between the
second asm and the return.

IMO, marking the zeroed registers as being live out of the function
is the simplest, most direct way of representing the fact that the
zeroing effect has to survive to the function return.  It's how we
make sure that the function return value remains live and how we make
sure that the restored call-preserved registers remain live.
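
The dataflow argument can be tried out on a toy model.  The C sketch
below is not GCC's df code; toy_insn, toy_mark_dead and the register
bitmask are invented for the illustration.  It scans a sequence
backwards, tracking which registers are live, and marks a zeroing as
dead when none of the registers it sets is live after it.  LIVE_OUT
plays the role of the registers that are live out of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_reg_set;	/* one bit per toy register */

enum toy_kind { TOY_SET_ZERO, TOY_CLOBBER, TOY_USE };

struct toy_insn
{
  enum toy_kind kind;
  toy_reg_set regs;	/* registers set, clobbered or used */
  bool dead;		/* filled in by toy_mark_dead */
};

/* Scan INSNS backwards from a state in which LIVE_OUT is live.
   Mark each TOY_SET_ZERO whose result is never read as dead and
   return the number of dead insns found.  */
static size_t
toy_mark_dead (struct toy_insn *insns, size_t n, toy_reg_set live_out)
{
  toy_reg_set live = live_out;
  size_t ndead = 0;

  for (size_t i = n; i-- > 0;)
    {
      struct toy_insn *insn = &insns[i];
      insn->dead = false;
      switch (insn->kind)
	{
	case TOY_USE:
	  /* A use makes the registers live before this point.  */
	  live |= insn->regs;
	  break;

	case TOY_CLOBBER:
	  /* Models the volatile asm: never deleted itself, but it
	     overwrites the registers, so earlier values are dead.  */
	  live &= ~insn->regs;
	  break;

	case TOY_SET_ZERO:
	  if ((insn->regs & live) == 0)
	    {
	      insn->dead = true;
	      ndead++;
	    }
	  live &= ~insn->regs;
	  break;
	}
    }
  return ndead;
}
```

With the sequence clobber, zero, clobber on one register, the zeroing
is reported dead whatever LIVE_OUT is, because the trailing clobber
kills the value first.  With clobber, zero and the register in
LIVE_OUT, the zeroing survives.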

Thanks,
Richard


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 15:21                                                                                                                 ` Richard Sandiford
@ 2020-09-23 16:08                                                                                                                   ` Qing Zhao
  0 siblings, 0 replies; 188+ messages in thread
From: Qing Zhao @ 2020-09-23 16:08 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Segher Boessenkool, Kees Cook, Kees Cook via Gcc-patches,
	Jakub Jelinek, Uros Bizjak, Rodriguez Bahena, Victor



> On Sep 23, 2020, at 10:21 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Sep 23, 2020, at 9:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> On Sep 23, 2020, at 6:05 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>>> On Sep 22, 2020, at 12:06 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> The following is what I see from i386.md: (I didn’t look at how “UNSPEC_volatile” is used in data flow analysis in GCC yet)
>>>>>>>>>> 
>>>>>>>>>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>>>>>>>>>> ;; all of memory.  This blocks insns from being moved across this point.
>>>>>>>>> 
>>>>>>>>> Heh, it looks like that comment dates back to 1994. :-)
>>>>>>>>> 
>>>>>>>>> The comment is no longer correct though.  I wasn't around at the time,
>>>>>>>>> but I assume the comment was only locally true even then.
>>>>>>>>> 
>>>>>>>>> If what the comment said was true, then something like:
>>>>>>>>> 
>>>>>>>>> (define_insn "cld"
>>>>>>>>> [(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
>>>>>>>>> ""
>>>>>>>>> "cld"
>>>>>>>>> [(set_attr "length" "1")
>>>>>>>>> (set_attr "length_immediate" "0")
>>>>>>>>> (set_attr "modrm" "0")])
>>>>>>>>> 
>>>>>>>>> would invalidate the entire register file and so would require all values
>>>>>>>>> to be spilt to the stack around the CLD.
>>>>>>>> 
>>>>>>>> Okay, thanks for the info. 
>>>>>>>> then, what’s the current definition of UNSPEC_VOLATILE? 
>>>>>>> 
>>>>>>> I'm not sure it's written down anywhere TBH.  rtl.texi just says:
>>>>>>> 
>>>>>>> @code{unspec_volatile} is used for volatile operations and operations
>>>>>>> that may trap; @code{unspec} is used for other operations.
>>>>>>> 
>>>>>>> which seems like a cyclic definition: volatile expressions are defined
>>>>>>> to be expressions that are volatile.
>>>>>>> 
>>>>>>> But IMO the semantics are that unspec_volatile patterns with a given
>>>>>>> set of inputs and outputs act for dataflow purposes like volatile asms
>>>>>>> with the same inputs and outputs.  The semantics of asm volatile are
>>>>>>> at least slightly more well-defined (if only by example); see extend.texi
>>>>>>> for details.  In particular:
>>>>>>> 
>>>>>>> Note that the compiler can move even @code{volatile asm} instructions relative
>>>>>>> to other code, including across jump instructions. For example, on many 
>>>>>>> targets there is a system register that controls the rounding mode of 
>>>>>>> floating-point operations. Setting it with a @code{volatile asm} statement,
>>>>>>> as in the following PowerPC example, does not work reliably.
>>>>>>> 
>>>>>>> @example
>>>>>>> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>>>>>>> sum = x + y;
>>>>>>> @end example
>>>>>>> 
>>>>>>> The compiler may move the addition back before the @code{volatile asm}
>>>>>>> statement. To make it work as expected, add an artificial dependency to
>>>>>>> the @code{asm} by referencing a variable in the subsequent code, for
>>>>>>> example:
>>>>>>> 
>>>>>>> @example
>>>>>>> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>>>>>>> sum = x + y;
>>>>>>> @end example
>>>>>>> 
>>>>>>> which is very similar to the unspec_volatile case we're talking about.
>>>>>>> 
>>>>>>> To take an x86 example:
>>>>>>> 
>>>>>>> void
>>>>>>> f (char *x)
>>>>>>> {
>>>>>>> asm volatile ("");
>>>>>>> x[0] = 0;
>>>>>>> asm volatile ("");
>>>>>>> x[1] = 0;
>>>>>>> asm volatile ("");
>>>>>>> }
>>>>>> 
>>>>>> If we change the above to the following (though the asm format might not be correct):
>>>>>> 
>>>>>> Void
>>>>>> F (char *x)
>>>>>> {
>>>>>> asm volatile (“x[0]”);
>>>>>> x[0] = 0;
>>>>>> asm volatile (“x[1]"); 
>>>>>> x[1] = 0;
>>>>>> asm volatile ("”);
>>>>>> }
>>>>>> 
>>>>>> Will the moving and merging be blocked?
>>>>> 
>>>>> That would stop assignments moving up, but it wouldn't stop x[0] moving
>>>>> down across the x[1] asm.  Using:
>>>>> 
>>>>> asm volatile ("" ::: "memory");
>>>>> 
>>>>> would prevent moves in both directions, which was what I meant in my
>>>>> later comment about memory clobbers.
>>>>> 
>>>>> In each case, the same would be true for unspec_volatile.
>>>> 
>>>> So, is the following good enough:
>>>> 
>>>> asm volatile (reg1, reg2, … regN, memory)
>>>> mov reg1, 0
>>>> mov reg2, 0
>>>> ...
>>>> mov regN,0
>>>> asm volatile (reg1, reg2,… regN, memory)
>>>> return
>>>> 
>>>> 
>>>> I.e, just add one “asm volatile” insn whose operands include all registers and memory BEFORE and AFTER the whole zeroing sequence.
>>> 
>>> It isn't clear from your syntax whether the asm volatile arguments
>>> are uses or clobbers.
>> 
>> How can the syntax of asm volatile distinguish “Uses” and “Clobbers”? 
> 
> Well, I wasn't trying to discuss correct syntax, I just wasn't sure what
> you meant.
> 
> As mentioned in the quote below, I was expecting the asm volatile
> before the zeroing to include clobbers generated as discussed in
> the earlier message:
> 
>  rtx asm_op = gen_rtx_ASM_OPERANDS (…);
>  MEM_VOLATILE_P (asm_op) = 1;
> 
>  rtvec v = rtvec_alloc (N + 1);
>  RTVEC_ELT (v, 0) = asm_op;
>  RTVEC_ELT (v, 1) = gen_rtx_CLOBBER (VOIDmode, …);
>  …
>  RTVEC_ELT (v, N) = gen_rtx_CLOBBER (VOIDmode, …);
> 
>  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
> 
> But doing this after the zeroing would give:
> 
>  …clobber reg1 in an asm…
>  …set reg1 to zero…
>  …clobber reg1 in an asm…
> 
> Dataflow-wise, the second clobber overwrites the effect of the zeroing.
> Since nothing uses reg1 between the zeroing and the clobber, the zeroing
> could be removed as dead.

Okay, I see.
Thanks for the explanation.

> 
>>> The idea was:
>>> 
>>> - There would be an asm volatile before the moves that clobbers (but does
>>> not use) (mem:BLK (scratch)) and the zeroed registers.
>>> 
>>> - EPILOGUE_USES would make the zeroed registers live after the return.
>> 
>> Is EPILOGUE_USES the only way to do this? Would adding another “asm volatile” immediately before the return serve the same purpose?
> 
> Why do you want to use an asm to keep the instructions live though?

I just wanted to avoid changing “EPILOGUE_USES” and keep the implementation simpler… :-)
But I might be wrong here.

> 
> As I think I mentioned before (but sorry if I'm misremembering),
> using an asm would be counterproductive on delayed-branch targets.
> The delayed branch scheduler looks backwards for something that could
> fill the delay slot.  If we have an asm after the zeroing instructions
> that uses the zeroed registers, that would prevent any zeroing
> instruction from filling the delay slot.  The delayed branch scheduler
> would therefore try to fill the delay slot with something from before
> the zeroing sequence, which is exactly what we'd like to avoid.
> 
> Also, using an asm after the sequence would allow a machine_reorg
> pass to reuse the zeroed registers for something else between the
> second asm and the return.
> 
> IMO, marking the zeroed registers as being live out of the function
> is the simplest, most direct way of representing the fact that the
> zeroing effect has to survive to the function return.  It's how we
> make sure that the function return value remains live and how we make
> sure that the restored call-preserved registers remain live.

Okay, now I understand.

Thanks a lot for your patience. 

Qing
> 
> Thanks,
> Richard



* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 14:28                                                                                                         ` Qing Zhao
@ 2020-09-23 23:40                                                                                                           ` Segher Boessenkool
  0 siblings, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-23 23:40 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek, Uros Bizjak,
	Rodriguez Bahena, Victor, richard.sandiford

On Wed, Sep 23, 2020 at 09:28:33AM -0500, Qing Zhao wrote:
> > On Sep 22, 2020, at 5:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> which is very similar to the unspec_volatile case we're talking about.
> > 
> > So just like volatile memory accesses, they have an (unknown) side
> > effect, which means they have to execute on the real machine as on the
> > abstract machine (wrt sequence points).  All side effects have to happen
> > exactly as often as prescribed, and in the same order.  Just like
> > volatile asm, too.
> Don’t quite understand the above, what do you mean by “they have to 
> execute on the real machine as on the abstract machine”?

Exactly as described in the C standard.

> > (If you use "#" or "#smth" you can see those in the generated asm --
> > completely empty asm is helpfully (uh...) not printed.)
> 
> Can you explain this in more details?

final.c...  see
            /* Output the insn using them.  */
            if (string[0])
              {
(it doesn't output anything if an asm template is the empty string!)
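
As a small GNU C sketch of that trick (clear_two is a name made up for
the example, and it assumes an assembler such as GNU as on x86 or
PowerPC where "#" starts a comment): each asm below has a non-empty
template, so it shows up in the -S output and you can see where the
stores ended up relative to it.  The "memory" clobbers are the
variant Richard described for keeping the stores from moving across
the asms in either direction.

```c
#include <assert.h>

/* Clear the first two bytes of X.  The asm statements emit only
   assembler comments; their "memory" clobbers act as compiler-level
   barriers between the two stores.  */
static void
clear_two (char *x)
{
  assert (x != 0);
  __asm__ volatile ("# before x[0]" ::: "memory");
  x[0] = 0;
  __asm__ volatile ("# between x[0] and x[1]" ::: "memory");
  x[1] = 0;
  __asm__ volatile ("# after x[1]" ::: "memory");
}
```

Compiling with -O2 -S and searching the output for the comment
strings shows where each marker landed.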

> > Even then, x[] could be optimised away completely (with whole program
> > optimisation, or something).  The only way to really prevent the
> > compiler from optimising memory accesses is to make it not see the
> > details (with an asm or an unspec, for example).
> You mean with an asm volatile ("" : : : "memory")?

No, I meant doing the memory access from asm.  The only way to get
exactly the machine instructions you want is to write it in assembler
(inline assembler usually can work, too).
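
For example, a rough sketch of a store done from inline asm on x86 (the
helper names are invented here; the non-x86 branch is only a fallback so
the file still builds, and it does not show the technique):

```c
#include <stdint.h>

/* Hypothetical helper: perform a 32-bit store with one explicit machine
   instruction.  The compiler only sees "this asm writes *p"; it does not
   see how, so it cannot split, merge or rewrite the access itself. */
static inline void store_u32_from_asm(uint32_t *p, uint32_t v)
{
#if defined(__x86_64__) || defined(__i386__)
  /* AT&T syntax: movl source-register, destination-memory. */
  __asm__ volatile ("movl %1, %0" : "=m" (*p) : "r" (v));
#else
  /* Fallback for other targets: a plain volatile store, which the
     compiler is still free to implement as it sees fit. */
  *(volatile uint32_t *) p = v;
#endif
}

/* Illustrative check: store through the asm, then read back normally. */
static uint32_t store_and_reload(uint32_t v)
{
  uint32_t slot = 0;
  store_u32_from_asm(&slot, v);
  return slot;
}
```

The "=m" output operand is what tells the compiler that the memory is
written, so later ordinary reads of it are not treated as stale.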

> >> The above is conservatively correct.  But not all passes do it.
> >> E.g. combine does have a similar approach:
> >> 
> >>  /* If INSN contains volatile references (specifically volatile MEMs),
> >>     we cannot combine across any other volatile references.
> > 
> > And this is correct, and the *minimum* to do even (this could change the
> > order of the side effects, depending how combine places the resulting
> > insns in I2 and I3).
> 
> Please clarify what “L2 and L3” are?

I2 and I3.  Combine names the insns it combines I0, I1, I2, and I3, and
writes the new insns it generates to the places of I2 and I3.  (In both
cases all of the lower numbered insns can be omitted, e.g. combine I2,
I3 into a new I3.  That is the general gist; there is some other stuff,
like, erm, "other_insn" :-) .)


Segher


* Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
  2020-09-23 11:05                                                                                                         ` Richard Sandiford
  2020-09-23 14:14                                                                                                           ` Qing Zhao
@ 2020-09-23 23:46                                                                                                           ` Segher Boessenkool
  1 sibling, 0 replies; 188+ messages in thread
From: Segher Boessenkool @ 2020-09-23 23:46 UTC (permalink / raw)
  To: Qing Zhao, Kees Cook, Kees Cook via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Rodriguez Bahena, Victor, richard.sandiford

Hi!

On Wed, Sep 23, 2020 at 12:05:22PM +0100, Richard Sandiford wrote:
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> > (But I do feel that the design for UNSPEC_volatile is not clean)
> 
> Agreed.  But I think that's partly because what it's trying to achieve
> isn't clean either.  It's a catch-all for “something is happening,
> but we're not saying what”.  And not saying what is itself unclean. ;-)

It shares all those same issues with plain unspec; there is nothing that
unspec_volatile adds that is weird like this.  But yes, that is a very
good reason to not use unspecs unless you have to: they hinder
optimisation a lot, and if that was your actual *goal*, you will often
find that they do not prevent every optimisation you wanted them to.


Segher


Thread overview: 188+ messages
2020-05-04 19:01 [PATCH 1/4] matcher-1.m: Change return type to int H.J. Lu
2020-05-04 19:01 ` [PATCH 2/4] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all] H.J. Lu
2020-05-04 23:19   ` Rodriguez Bahena, Victor
2020-05-05  8:14   ` Uros Bizjak
2020-05-05  8:20     ` Richard Biener
2020-07-14 14:45       ` [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] Qing Zhao
2020-07-16 13:17         ` Victor Rodriguez
2020-07-28 20:05         ` PING " Qing Zhao
2020-07-31 17:57           ` Uros Bizjak
2020-08-03 15:42             ` Qing Zhao
2020-08-04  7:35               ` Richard Biener
2020-08-04 18:23                 ` H.J. Lu
2020-08-05  7:06                   ` Richard Biener
2020-08-05 12:26                     ` H.J. Lu
2020-08-05 12:30                       ` Richard Biener
2020-08-05 12:34                         ` H.J. Lu
2020-08-05 14:45                           ` H.J. Lu
2020-08-05 15:00                             ` Qing Zhao
2020-08-05 18:53                             ` Richard Biener
2020-08-05 19:08                               ` H.J. Lu
2020-08-05 20:22                               ` Qing Zhao
2020-08-06  8:37                                 ` Richard Biener
2020-08-06 15:45                                   ` Qing Zhao
2020-08-06 20:45                                   ` Kees Cook
2020-08-07  6:21                                     ` Richard Biener
2020-08-07 16:15                                       ` Qing Zhao
2020-08-05 21:35                 ` Qing Zhao
2020-08-06  8:31                   ` Richard Biener
2020-08-06  8:41                     ` Jakub Jelinek
2020-08-06  9:31                       ` Uros Bizjak
2020-08-06 14:56                     ` Qing Zhao
2020-08-06 23:37                     ` Segher Boessenkool
2020-08-07 16:06                       ` Qing Zhao
2020-08-07 22:59                         ` Segher Boessenkool
2020-08-10 16:34                           ` Qing Zhao
2020-08-10 19:51                             ` Qing Zhao
2020-08-19 20:05                       ` Qing Zhao
2020-08-19 22:57                         ` Segher Boessenkool
2020-08-19 23:27                           ` Qing Zhao
2020-08-24 14:47                             ` Rodriguez Bahena, Victor
2020-08-24 17:59                               ` Segher Boessenkool
2020-08-24 18:48                                 ` Qing Zhao
2020-08-24 20:26                                   ` Segher Boessenkool
2020-08-24 20:49                                     ` Qing Zhao
2020-09-04 15:18                                       ` Segher Boessenkool
2020-09-04 17:34                                         ` H.J. Lu
2020-09-04 18:09                                           ` Segher Boessenkool
2020-09-04 18:52                                             ` H.J. Lu
2020-09-07 14:06                                               ` Segher Boessenkool
2020-09-07 15:58                                                 ` H.J. Lu
2020-09-08 16:43                                                   ` Qing Zhao
2020-09-10 22:05                                                     ` Segher Boessenkool
2020-09-10 22:50                                                       ` Qing Zhao
2020-09-11 17:18                                                         ` Segher Boessenkool
2020-09-11 19:53                                                           ` Qing Zhao
2020-08-24 17:49                             ` Segher Boessenkool
2020-08-24 18:02                               ` Qing Zhao
2020-08-24 20:20                                 ` Segher Boessenkool
2020-08-24 20:43                                   ` Qing Zhao
2020-08-25  6:41                                     ` Uros Bizjak
2020-08-25 14:05                                       ` Qing Zhao
2020-08-25 22:31                                         ` Qing Zhao
2020-09-04 15:26                                     ` Segher Boessenkool
2020-08-25 21:54                                   ` Qing Zhao
2020-09-03 14:29                                     ` Qing Zhao
2020-09-03 15:08                                       ` Qing Zhao
2020-09-03 16:19                                         ` Qing Zhao
2020-09-03 17:13                                       ` Kees Cook
2020-09-03 17:43                                         ` Qing Zhao
2020-09-04  1:23                                           ` Rodriguez Bahena, Victor
2020-09-04 14:18                                             ` Qing Zhao
2020-09-07 13:06                                               ` Rodriguez Bahena, Victor
2020-09-08 15:00                                                 ` Qing Zhao
2020-09-10 19:07                                                   ` Kees Cook
2020-09-10 22:40                                                     ` Qing Zhao
2020-09-11 10:06                                                     ` Richard Sandiford
2020-09-11 16:14                                                       ` Segher Boessenkool
2020-09-11 16:52                                                         ` Qing Zhao
2020-09-11 17:13                                                           ` Segher Boessenkool
2020-09-11 19:40                                                             ` Qing Zhao
2020-09-11 20:05                                                               ` Segher Boessenkool
2020-09-11 20:17                                                                 ` Qing Zhao
2020-09-11 20:36                                                                   ` Segher Boessenkool
2020-09-11 21:12                                                                     ` Qing Zhao
2020-09-11 17:32                                                           ` Richard Sandiford
2020-09-11 20:01                                                             ` Segher Boessenkool
2020-09-11 20:14                                                             ` Qing Zhao
2020-09-11 21:03                                                               ` Segher Boessenkool
2020-09-11 21:29                                                                 ` Qing Zhao
2020-09-11 21:51                                                                   ` Segher Boessenkool
2020-09-11 22:41                                                                     ` Qing Zhao
2020-09-14 23:09                                                                       ` Segher Boessenkool
2020-09-15  3:07                                                                         ` Qing Zhao
2020-09-15 18:51                                                                           ` Segher Boessenkool
2020-09-11 21:44                                                               ` Richard Sandiford
2020-09-11 22:24                                                                 ` Qing Zhao
2020-09-11 22:56                                                                   ` Richard Sandiford
2020-09-14 14:56                                                                     ` Qing Zhao
2020-09-14 16:33                                                                       ` Richard Sandiford
2020-09-14 18:50                                                                         ` Qing Zhao
2020-09-14 19:20                                                                           ` Richard Sandiford
2020-09-14 20:24                                                                             ` Qing Zhao
2020-09-15  9:11                                                                               ` Richard Sandiford
2020-09-15 15:05                                                                                 ` Qing Zhao
2020-09-15 19:41                                                                                 ` Segher Boessenkool
2020-09-15 22:31                                                                                   ` Qing Zhao
2020-09-15 23:09                                                                                     ` Segher Boessenkool
2020-09-16  1:51                                                                                       ` Qing Zhao
2020-09-16 10:35                                                                                         ` Segher Boessenkool
2020-09-16 20:57                                                                                           ` Qing Zhao
2020-09-17  6:17                                                                                             ` Richard Sandiford
2020-09-17 14:40                                                                                               ` Qing Zhao
2020-09-17 16:27                                                                                                 ` Richard Sandiford
2020-09-17 19:07                                                                                                   ` Qing Zhao
2020-09-22 17:06                                                                                                     ` Richard Sandiford
2020-09-22 21:32                                                                                                       ` Qing Zhao
2020-09-23 11:05                                                                                                         ` Richard Sandiford
2020-09-23 14:14                                                                                                           ` Qing Zhao
2020-09-23 14:32                                                                                                             ` Richard Sandiford
2020-09-23 14:48                                                                                                               ` Qing Zhao
2020-09-23 15:21                                                                                                                 ` Richard Sandiford
2020-09-23 16:08                                                                                                                   ` Qing Zhao
2020-09-23 23:46                                                                                                           ` Segher Boessenkool
2020-09-22 22:37                                                                                                       ` Segher Boessenkool
2020-09-23 14:28                                                                                                         ` Qing Zhao
2020-09-23 23:40                                                                                                           ` Segher Boessenkool
2020-09-17 22:26                                                                                                   ` Segher Boessenkool
2020-09-14 23:35                                                                         ` Segher Boessenkool
2020-09-15 11:46                                                                           ` Richard Sandiford
2020-09-15 19:22                                                                             ` Segher Boessenkool
2020-09-14 23:20                                                                   ` Segher Boessenkool
2020-09-18 20:31                                                                 ` Qing Zhao
2020-09-18 22:51                                                                   ` Segher Boessenkool
2020-09-21 14:13                                                                     ` Qing Zhao
2020-09-21 20:34                                                                       ` Segher Boessenkool
2020-09-21 20:58                                                                         ` Qing Zhao
2020-09-22  0:25                                                                           ` Segher Boessenkool
2020-09-21  7:23                                                                   ` Richard Sandiford
2020-09-21 14:29                                                                     ` Qing Zhao
2020-09-21 15:35                                                                       ` Richard Sandiford
2020-09-21 16:34                                                                         ` Qing Zhao
2020-09-21 19:11                                                                           ` Richard Sandiford
2020-09-21 19:22                                                                             ` Qing Zhao
2020-09-21 20:05                                                                               ` Qing Zhao
2020-09-22 16:31                                                                                 ` Richard Sandiford
2020-09-22 18:25                                                                                   ` Qing Zhao
2020-09-22 18:35                                                                                     ` H.J. Lu
2020-09-22 19:34                                                                                       ` Qing Zhao
2020-09-23 10:43                                                                                         ` Richard Sandiford
2020-09-23 13:54                                                                                           ` Qing Zhao
2020-09-23 14:22                                                                                             ` Richard Sandiford
2020-09-23 14:35                                                                                               ` Qing Zhao
2020-09-23 14:40                                                                                                 ` Richard Sandiford
2020-09-23 14:49                                                                                                   ` Qing Zhao
2020-09-07 14:44                                             ` Segher Boessenkool
2020-09-08 15:05                                               ` Patrick McGehearty
2020-09-10 12:11                                                 ` Richard Sandiford
2020-09-10 14:34                                                   ` Qing Zhao
2020-09-10 14:59                                                     ` Rodriguez Bahena, Victor
2020-09-03 17:48                                         ` Ramana Radhakrishnan
2020-09-03 19:20                                           ` Qing Zhao
2020-09-04 15:43                                         ` Segher Boessenkool
2020-09-04 17:18                                           ` Qing Zhao
2020-09-04 18:04                                             ` Segher Boessenkool
2020-09-04 19:00                                               ` Qing Zhao
2020-09-07 14:36                                                 ` Segher Boessenkool
2020-09-08 14:55                                                   ` Qing Zhao
2020-09-10 21:56                                                     ` Segher Boessenkool
2020-08-24 14:36                           ` Rodriguez Bahena, Victor
2020-08-06 22:32                   ` Qing Zhao
2020-08-07 13:20           ` Alexandre Oliva
2020-08-07 17:04             ` Qing Zhao
2020-08-11  2:39               ` Alexandre Oliva
2020-08-11  5:57                 ` Kees Cook
2020-08-11 17:30                 ` Qing Zhao
2020-08-24 10:50                   ` Richard Biener
2020-08-24 14:48                     ` Qing Zhao
2020-08-25  5:16                     ` Alexandre Oliva
2020-08-25 14:19                       ` Jeff Law
2020-08-26 12:02                         ` Alexandre Oliva
2020-08-26 17:58                           ` Qing Zhao
2020-08-28  7:47                             ` Alexandre Oliva
2020-08-28 15:21                               ` Qing Zhao
2020-08-28 15:33                                 ` H.J. Lu
2020-08-26 18:36                           ` Jeff Law
2020-05-04 19:01 ` [PATCH 3/4] x86: Add ix86_any_return_p H.J. Lu
2020-05-04 19:01 ` [PATCH 4/4] Update gcc.target/i386/ret-thunk-2[234].c H.J. Lu
2020-05-05 16:29 ` [PATCH 1/4] matcher-1.m: Change return type to int Jeff Law
