public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/2] x86: Don't save callee-saved registers if not needed
@ 2024-01-22 15:45 H.J. Lu
  2024-01-22 15:45 ` [PATCH v2 1/2] x86: Add no_callee_saved_registers function attribute H.J. Lu
  2024-01-22 15:45 ` [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions H.J. Lu
  0 siblings, 2 replies; 5+ messages in thread
From: H.J. Lu @ 2024-01-22 15:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, jh

Changes in v2:

1. Rebase against commit f9df00340e3
2. Don't add redundant clobbered_registers check in ix86_expand_call.

In some cases, there is no need to save callee-saved registers:

1. If a noreturn function neither throws nor supports exceptions, it can
skip saving callee-saved registers, since control never returns to its
caller.

2. When an interrupt handler is implemented by an assembly stub which does:

  1. Save all registers.
  2. Call a C function.
  3. Restore all registers.
  4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.
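
As an illustration of the second case, the C side of such a handler might
look like the sketch below.  The names NO_CALLEE_SAVED, tick_count,
timer_tick_body and get_tick_count are hypothetical, and the assembly stub
itself is not shown.  The attribute is guarded with __has_attribute so that
the file still compiles with compilers that lack it.

```c
/* Sketch only: C body meant to be called from an assembly interrupt
   stub that saves and restores all registers itself.  All names here
   are hypothetical.  */
#if defined(__has_attribute)
# if __has_attribute(no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__ ((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED /* Attribute not available: plain function.  */
#endif

static volatile unsigned long tick_count;

/* With the attribute, the compiler may use every register as scratch
   here without saving or restoring it.  */
NO_CALLEE_SAVED void
timer_tick_body (void)
{
  tick_count++;
}

unsigned long
get_tick_count (void)
{
  return tick_count;
}
```

Calling timer_tick_body directly from ordinary C code is also valid; in
that case the caller has to assume that all registers are clobbered.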

This patch set adds the no_callee_saved_registers function attribute,
which is complementary to the no_caller_saved_registers function
attribute, and classifies call-saved register handling in the x86 backend
into three types:

  1. Default call-saved registers.
  2. No caller-saved registers with no_caller_saved_registers attribute.
  3. No callee-saved registers with no_callee_saved_registers attribute.

A function of the no callee-saved registers type doesn't save
callee-saved registers.  A noreturn function that neither throws nor
supports exceptions is classified as this type.
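
The classification can be pictured with the simplified model below.  It is
not the GCC implementation: the enum, the struct and classify are
hypothetical names, and the order in which combinations such as interrupt
plus noreturn are resolved is an assumption made for the sketch.

```c
#include <stdbool.h>

/* Simplified model of the three handling types.  Every name below is
   hypothetical.  */
enum save_kind
{
  SAVE_DEFAULT,     /* Default call-saved registers.  */
  SAVE_NO_CALLER,   /* interrupt or no_caller_saved_registers.  */
  SAVE_NO_CALLEE,   /* no_callee_saved_registers or qualifying noreturn.  */
  SAVE_INVALID      /* Incompatible attribute combination.  */
};

struct fn_traits
{
  bool interrupt;
  bool no_caller_saved;
  bool no_callee_saved;
  bool noreturn;
  bool may_throw;   /* Throws or supports exceptions.  */
};

enum save_kind
classify (const struct fn_traits *f)
{
  /* no_callee_saved_registers cannot be combined with interrupt or
     no_caller_saved_registers.  */
  if (f->no_callee_saved && (f->interrupt || f->no_caller_saved))
    return SAVE_INVALID;
  if (f->interrupt || f->no_caller_saved)
    return SAVE_NO_CALLER;
  /* Assumption: noreturn qualifies only without exception support.  */
  if (f->no_callee_saved || (f->noreturn && !f->may_throw))
    return SAVE_NO_CALLEE;
  return SAVE_DEFAULT;
}
```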

With these changes, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
	endbr64
	push   %r15
	push   %r14
	mov    %rcx,%r14
	push   %r13
	push   %r12
	push   %rbp
	mov    %esi,%ebp
	push   %rbx
	mov    %rdx,%rbx
	sub    $0x28,%rsp
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

to

__libc_start_main:
	endbr64
	sub    $0x28,%rsp
	mov    %esi,%ebp
	mov    %rdx,%rbx
	mov    %rcx,%r14
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call   <do_exit+0x9>
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        # <do_exit+0x39>
        test   $0x2,%ah
        je     <do_exit+0x8d3>

to

do_exit:
        endbr64
        call   <do_exit+0x9>
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        # <do_exit+0x2f>
        test   $0x2,%ah
        je     <do_exit+0x8c9>

I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before                after                  improvement
30043.75user          30013.16user           0%
1274.85system         1243.72system          2.4%

GCC master branch test time in seconds (new tests added):

before                after                  improvement
216035.90user         216547.51user          0%
27365.51system        26658.54system         2.6%
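
For reference, the improvement column can be recomputed from the raw times
with a helper like the one below.  The function name is hypothetical; the
formula is simply (before - after) / before.

```c
/* Percentage improvement from BEFORE to AFTER, both in seconds.
   A positive result means AFTER is faster.  Hypothetical helper for
   checking the tables above.  */
double
improvement_percent (double before, double after)
{
  return (before - after) / before * 100.0;
}
```

With the numbers above this gives roughly 2.4% and 2.6% for system time.
User time changes by about 0.1% for the build and about -0.2% for the
tests, which is within the noise described above.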

Backported to GCC 13 to rebuild system glibc and kernel on Fedora 39.
Systems perform normally.

H.J. Lu (2):
  x86: Add no_callee_saved_registers function attribute
  x86: Don't save callee-saved registers in noreturn functions

 gcc/config/i386/i386-expand.cc                | 58 +++++++++++++--
 gcc/config/i386/i386-options.cc               | 61 ++++++++++++----
 gcc/config/i386/i386.cc                       | 70 +++++++++++++++----
 gcc/config/i386/i386.h                        | 20 +++++-
 gcc/doc/extend.texi                           |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 ++++++
 .../gcc.dg/torture/no-callee-saved-run-1b.c   | 59 ++++++++++++++++
 .../gcc.target/i386/no-callee-saved-1.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-10.c      | 46 ++++++++++++
 .../gcc.target/i386/no-callee-saved-11.c      | 11 +++
 .../gcc.target/i386/no-callee-saved-12.c      | 10 +++
 .../gcc.target/i386/no-callee-saved-13.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-14.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-15.c      | 17 +++++
 .../gcc.target/i386/no-callee-saved-16.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-17.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-18.c      | 51 ++++++++++++++
 .../gcc.target/i386/no-callee-saved-2.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-3.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-4.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-5.c       | 11 +++
 .../gcc.target/i386/no-callee-saved-6.c       | 12 ++++
 .../gcc.target/i386/no-callee-saved-7.c       | 49 +++++++++++++
 .../gcc.target/i386/no-callee-saved-8.c       | 50 +++++++++++++
 .../gcc.target/i386/no-callee-saved-9.c       | 49 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr38534-1.c     | 26 +++++++
 gcc/testsuite/gcc.target/i386/pr38534-2.c     | 18 +++++
 gcc/testsuite/gcc.target/i386/pr38534-3.c     | 19 +++++
 gcc/testsuite/gcc.target/i386/pr38534-4.c     | 18 +++++
 .../gcc.target/i386/stack-check-17.c          | 19 ++---
 30 files changed, 797 insertions(+), 48 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c

-- 
2.43.0



* [PATCH v2 1/2] x86: Add no_callee_saved_registers function attribute
  2024-01-22 15:45 [PATCH v2 0/2] x86: Don't save callee-saved registers if not needed H.J. Lu
@ 2024-01-22 15:45 ` H.J. Lu
  2024-01-22 15:45 ` [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions H.J. Lu
  1 sibling, 0 replies; 5+ messages in thread
From: H.J. Lu @ 2024-01-22 15:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, jh

When an interrupt handler is implemented by an assembly stub which does:

1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers.  Such a function won't save and
restore any registers.  Classify function call-saved register handling
type with:

1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.

Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function.  Otherwise,
callee-saved registers won't be preserved.

After a no_callee_saved_registers function is called, all registers may
be clobbered.  If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
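
The sibcall rule can be stated as a small predicate.  This is only a model
of the rule described above, not the code in ix86_function_ok_for_sibcall;
the function and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model: may the caller tail-call (sibcall) the callee?
   Only the no_callee_saved_registers rule is modelled.  */
bool
sibcall_allowed (bool caller_no_callee_saved, bool callee_no_callee_saved)
{
  /* A callee that may clobber every register would break the
     guarantee an ordinary caller gives to its own caller, because a
     sibcall returns straight to that outer caller.  */
  if (callee_no_callee_saved && !caller_no_callee_saved)
    return false;
  return true;
}
```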

gcc/

	PR target/103503
	PR target/113312
	* config/i386/i386-expand.cc (ix86_expand_call): Set
	call_no_callee_saved_registers to true when calling function
	with no_callee_saved_registers attribute.  Replace
	no_caller_saved_registers check with call_saved_registers check.
	Clobber all registers that are not used by the callee with
	no_callee_saved_registers attribute.
	* config/i386/i386-options.cc (ix86_set_func_type): Set
	call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
	function with no_callee_saved_registers attribute.  Disallow
	no_callee_saved_registers together with interrupt or
	no_caller_saved_registers attribute.
	(ix86_set_current_function): Replace no_caller_saved_registers
	check with call_saved_registers check.
	(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
	(ix86_handle_call_saved_registers_attribute): This.
	(ix86_gnu_attributes): Add
	ix86_handle_call_saved_registers_attribute.
	* config/i386/i386.cc (ix86_conditional_register_usage): Replace
	no_caller_saved_registers check with call_saved_registers check.
	(ix86_function_ok_for_sibcall): Don't allow callee with
	no_callee_saved_registers attribute when the calling function
	has callee-saved registers.
	(ix86_comp_type_attributes): Also check
	no_callee_saved_registers.
	(ix86_epilogue_uses): Replace no_caller_saved_registers check
	with call_saved_registers check.
	(ix86_hard_regno_scratch_ok): Likewise.
	(ix86_save_reg): Replace no_caller_saved_registers check with
	call_saved_registers check.  Don't save any registers for
	TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
	TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
	no_callee_saved_registers attribute is called.
	(find_drap_reg): Replace no_caller_saved_registers check with
	call_saved_registers check.
	* config/i386/i386.h (call_saved_registers_type): New enum.
	(machine_function): Replace no_caller_saved_registers with
	call_saved_registers.  Add call_no_callee_saved_registers.
	* doc/extend.texi: Document no_callee_saved_registers attribute.

gcc/testsuite/

	PR target/103503
	PR target/113312
	* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
	* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
	* gcc.target/i386/no-callee-saved-1.c: Likewise.
	* gcc.target/i386/no-callee-saved-2.c: Likewise.
	* gcc.target/i386/no-callee-saved-3.c: Likewise.
	* gcc.target/i386/no-callee-saved-4.c: Likewise.
	* gcc.target/i386/no-callee-saved-5.c: Likewise.
	* gcc.target/i386/no-callee-saved-6.c: Likewise.
	* gcc.target/i386/no-callee-saved-7.c: Likewise.
	* gcc.target/i386/no-callee-saved-8.c: Likewise.
	* gcc.target/i386/no-callee-saved-9.c: Likewise.
	* gcc.target/i386/no-callee-saved-10.c: Likewise.
	* gcc.target/i386/no-callee-saved-11.c: Likewise.
	* gcc.target/i386/no-callee-saved-12.c: Likewise.
	* gcc.target/i386/no-callee-saved-13.c: Likewise.
	* gcc.target/i386/no-callee-saved-14.c: Likewise.
	* gcc.target/i386/no-callee-saved-15.c: Likewise.
	* gcc.target/i386/no-callee-saved-16.c: Likewise.
	* gcc.target/i386/no-callee-saved-17.c: Likewise.
	* gcc.target/i386/no-callee-saved-18.c: Likewise.
---
 gcc/config/i386/i386-expand.cc                | 58 +++++++++++++--
 gcc/config/i386/i386-options.cc               | 49 +++++++++----
 gcc/config/i386/i386.cc                       | 70 +++++++++++++++----
 gcc/config/i386/i386.h                        | 20 +++++-
 gcc/doc/extend.texi                           |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 ++++++
 .../gcc.dg/torture/no-callee-saved-run-1b.c   | 59 ++++++++++++++++
 .../gcc.target/i386/no-callee-saved-1.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-10.c      | 46 ++++++++++++
 .../gcc.target/i386/no-callee-saved-11.c      | 11 +++
 .../gcc.target/i386/no-callee-saved-12.c      | 10 +++
 .../gcc.target/i386/no-callee-saved-13.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-14.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-15.c      | 17 +++++
 .../gcc.target/i386/no-callee-saved-16.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-17.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-18.c      | 51 ++++++++++++++
 .../gcc.target/i386/no-callee-saved-2.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-3.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-4.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-5.c       | 11 +++
 .../gcc.target/i386/no-callee-saved-6.c       | 12 ++++
 .../gcc.target/i386/no-callee-saved-7.c       | 49 +++++++++++++
 .../gcc.target/i386/no-callee-saved-8.c       | 50 +++++++++++++
 .../gcc.target/i386/no-callee-saved-9.c       | 49 +++++++++++++
 25 files changed, 697 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-9.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 52754e114f4..6c8c473c55b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -9739,17 +9739,41 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   rtx use = NULL, call;
   unsigned int vec_len = 0;
   tree fndecl;
+  bool call_no_callee_saved_registers = false;
 
   if (GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF)
     {
       fndecl = SYMBOL_REF_DECL (XEXP (fnaddr, 0));
-      if (fndecl
-	  && (lookup_attribute ("interrupt",
-				TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))))
-	error ("interrupt service routine cannot be called directly");
+      if (fndecl)
+	{
+	  if (lookup_attribute ("interrupt",
+				TYPE_ATTRIBUTES (TREE_TYPE (fndecl))))
+	    error ("interrupt service routine cannot be called directly");
+	  else if (lookup_attribute ("no_callee_saved_registers",
+				     TYPE_ATTRIBUTES (TREE_TYPE (fndecl))))
+	    {
+	      cfun->machine->call_no_callee_saved_registers = true;
+	      call_no_callee_saved_registers = true;
+	    }
+	}
     }
   else
-    fndecl = NULL_TREE;
+    {
+      if (MEM_P (fnaddr))
+	{
+	  tree mem_expr = MEM_EXPR (fnaddr);
+	  if (mem_expr != nullptr
+	      && TREE_CODE (mem_expr) == MEM_REF
+	      && lookup_attribute ("no_callee_saved_registers",
+				   TYPE_ATTRIBUTES (TREE_TYPE (mem_expr))))
+	    {
+	      cfun->machine->call_no_callee_saved_registers = true;
+	      call_no_callee_saved_registers = true;
+	    }
+	}
+
+      fndecl = NULL_TREE;
+    }
 
   if (pop == const0_rtx)
     pop = NULL;
@@ -9884,13 +9908,15 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
       vec[vec_len++] = pop;
     }
 
-  if (cfun->machine->no_caller_saved_registers
+  static const char ix86_call_used_regs[] = CALL_USED_REGISTERS;
+
+  if ((cfun->machine->call_saved_registers
+       == TYPE_NO_CALLER_SAVED_REGISTERS)
       && (!fndecl
 	  || (!TREE_THIS_VOLATILE (fndecl)
 	      && !lookup_attribute ("no_caller_saved_registers",
 				    TYPE_ATTRIBUTES (TREE_TYPE (fndecl))))))
     {
-      static const char ix86_call_used_regs[] = CALL_USED_REGISTERS;
       bool is_64bit_ms_abi = (TARGET_64BIT
 			      && ix86_function_abi (fndecl) == MS_ABI);
       char c_mask = CALL_USED_REGISTERS_MASK (is_64bit_ms_abi);
@@ -9955,6 +9981,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
       clobber_reg (&use, gen_rtx_REG (DImode, R10_REG));
     }
 
+  if (call_no_callee_saved_registers)
+    {
+      /* After calling a no_callee_saved_registers function, all
+	 registers may be clobbered.  Clobber all registers that are
+	 not used by the callee.  */
+      bool is_64bit_ms_abi = (TARGET_64BIT
+			      && ix86_function_abi (fndecl) == MS_ABI);
+      char c_mask = CALL_USED_REGISTERS_MASK (is_64bit_ms_abi);
+      for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+	if (!fixed_regs[i]
+	    && !(ix86_call_used_regs[i] == 1
+		 || (ix86_call_used_regs[i] & c_mask))
+	    && !STACK_REGNO_P (i)
+	    && !MMX_REGNO_P (i))
+	  clobber_reg (&use,
+		       gen_rtx_REG (GET_MODE (regno_reg_rtx[i]), i));
+    }
+
   if (vec_len > 1)
     call = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (vec_len, vec));
   rtx_insn *call_insn = emit_call_insn (call);
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index b6f634e9a32..0cdea30599e 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3371,6 +3371,10 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
 static void
 ix86_set_func_type (tree fndecl)
 {
+  bool has_no_callee_saved_registers
+    = lookup_attribute ("no_callee_saved_registers",
+			TYPE_ATTRIBUTES (TREE_TYPE (fndecl)));
+
   if (cfun->machine->func_type == TYPE_UNKNOWN)
     {
       if (lookup_attribute ("interrupt",
@@ -3380,12 +3384,18 @@ ix86_set_func_type (tree fndecl)
 	    error_at (DECL_SOURCE_LOCATION (fndecl),
 		      "interrupt and naked attributes are not compatible");
 
+	  if (has_no_callee_saved_registers)
+	    error_at (DECL_SOURCE_LOCATION (fndecl),
+		      "%qs and %qs attributes are not compatible",
+		      "interrupt", "no_callee_saved_registers");
+
 	  int nargs = 0;
 	  for (tree arg = DECL_ARGUMENTS (fndecl);
 	       arg;
 	       arg = TREE_CHAIN (arg))
 	    nargs++;
-	  cfun->machine->no_caller_saved_registers = true;
+	  cfun->machine->call_saved_registers
+	    = TYPE_NO_CALLER_SAVED_REGISTERS;
 	  cfun->machine->func_type
 	    = nargs == 2 ? TYPE_EXCEPTION : TYPE_INTERRUPT;
 
@@ -3401,7 +3411,19 @@ ix86_set_func_type (tree fndecl)
 	  cfun->machine->func_type = TYPE_NORMAL;
 	  if (lookup_attribute ("no_caller_saved_registers",
 				TYPE_ATTRIBUTES (TREE_TYPE (fndecl))))
-	    cfun->machine->no_caller_saved_registers = true;
+	    cfun->machine->call_saved_registers
+	      = TYPE_NO_CALLER_SAVED_REGISTERS;
+	  if (has_no_callee_saved_registers)
+	    {
+	      if (cfun->machine->call_saved_registers
+		  == TYPE_NO_CALLER_SAVED_REGISTERS)
+		error_at (DECL_SOURCE_LOCATION (fndecl),
+			  "%qs and %qs attributes are not compatible",
+			  "no_caller_saved_registers",
+			  "no_callee_saved_registers");
+	      cfun->machine->call_saved_registers
+		= TYPE_NO_CALLEE_SAVED_REGISTERS;
+	    }
 	}
     }
 }
@@ -3571,7 +3593,7 @@ ix86_set_current_function (tree fndecl)
     }
   ix86_previous_fndecl = fndecl;
 
-  static bool prev_no_caller_saved_registers;
+  static call_saved_registers_type prev_call_saved_registers;
 
   /* 64-bit MS and SYSV ABI have different set of call used registers.
      Avoid expensive re-initialization of init_regs each time we switch
@@ -3582,12 +3604,13 @@ ix86_set_current_function (tree fndecl)
     reinit_regs ();
   /* Need to re-initialize init_regs if caller-saved registers are
      changed.  */
-  else if (prev_no_caller_saved_registers
-	   != cfun->machine->no_caller_saved_registers)
+  else if (prev_call_saved_registers
+	   != cfun->machine->call_saved_registers)
     reinit_regs ();
 
   if (cfun->machine->func_type != TYPE_NORMAL
-      || cfun->machine->no_caller_saved_registers)
+      || (cfun->machine->call_saved_registers
+	  == TYPE_NO_CALLER_SAVED_REGISTERS))
     {
       /* Don't allow SSE, MMX nor x87 instructions since they
 	 may change processor state.  */
@@ -3614,12 +3637,12 @@ ix86_set_current_function (tree fndecl)
 		   "the %<no_caller_saved_registers%> attribute", isa);
 	  /* Don't issue the same error twice.  */
 	  cfun->machine->func_type = TYPE_NORMAL;
-	  cfun->machine->no_caller_saved_registers = false;
+	  cfun->machine->call_saved_registers
+	    = TYPE_DEFAULT_CALL_SAVED_REGISTERS;
 	}
     }
 
-  prev_no_caller_saved_registers
-    = cfun->machine->no_caller_saved_registers;
+  prev_call_saved_registers = cfun->machine->call_saved_registers;
 }
 
 /* Implement the TARGET_OFFLOAD_OPTIONS hook.  */
@@ -4018,8 +4041,8 @@ ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,
 }
 
 static tree
-ix86_handle_no_caller_saved_registers_attribute (tree *, tree, tree,
-						 int, bool *)
+ix86_handle_call_saved_registers_attribute (tree *, tree, tree,
+					    int, bool *)
 {
   return NULL_TREE;
 }
@@ -4181,7 +4204,9 @@ static const attribute_spec ix86_gnu_attributes[] =
   { "interrupt", 0, 0, false, true, true, false,
     ix86_handle_interrupt_attribute, NULL },
   { "no_caller_saved_registers", 0, 0, false, true, true, false,
-    ix86_handle_no_caller_saved_registers_attribute, NULL },
+    ix86_handle_call_saved_registers_attribute, NULL },
+  { "no_callee_saved_registers", 0, 0, false, true, true, true,
+    ix86_handle_call_saved_registers_attribute, NULL },
   { "naked", 0, 0, true, false, false, false,
     ix86_handle_fndecl_attribute, NULL },
   { "indirect_branch", 1, 1, true, false, false, false,
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c5eaeedc7e0..f10e745fb40 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -475,7 +475,9 @@ ix86_conditional_register_usage (void)
      except fixed_regs and registers used for function return value
      since aggregate_value_p checks call_used_regs[regno] on return
      value.  */
-  if (cfun && cfun->machine->no_caller_saved_registers)
+  if (cfun
+      && (cfun->machine->call_saved_registers
+	  == TYPE_NO_CALLER_SAVED_REGISTERS))
     for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
       if (!fixed_regs[i] && !ix86_function_value_regno_p (i))
 	call_used_regs[i] = 0;
@@ -944,7 +946,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
 
   /* Sibling call isn't OK if there are no caller-saved registers
      since all registers must be preserved before return.  */
-  if (cfun->machine->no_caller_saved_registers)
+  if (cfun->machine->call_saved_registers
+      == TYPE_NO_CALLER_SAVED_REGISTERS)
     return false;
 
   /* If we are generating position-independent code, we cannot sibcall
@@ -977,6 +980,14 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       decl_or_type = type;
     }
 
+  /* Sibling call isn't OK if callee has no callee-saved registers
+     and the calling function has callee-saved registers.  */
+  if ((cfun->machine->call_saved_registers
+       != TYPE_NO_CALLEE_SAVED_REGISTERS)
+      && lookup_attribute ("no_callee_saved_registers",
+			   TYPE_ATTRIBUTES (type)))
+    return false;
+
   /* If outgoing reg parm stack space changes, we cannot do sibcall.  */
   if ((OUTGOING_REG_PARM_STACK_SPACE (type)
        != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl)))
@@ -1139,6 +1150,12 @@ ix86_comp_type_attributes (const_tree type1, const_tree type2)
       != ix86_function_regparm (type2, NULL))
     return 0;
 
+  if (lookup_attribute ("no_callee_saved_registers",
+			TYPE_ATTRIBUTES (type1))
+      != lookup_attribute ("no_callee_saved_registers",
+			   TYPE_ATTRIBUTES (type2)))
+    return 0;
+
   return 1;
 }
 \f
@@ -6569,7 +6586,8 @@ ix86_epilogue_uses (int regno)
      and restoring registers.  Don't explicitly save SP register since
      it is always preserved.  */
   return (epilogue_completed
-	  && cfun->machine->no_caller_saved_registers
+	  && (cfun->machine->call_saved_registers
+	      == TYPE_NO_CALLER_SAVED_REGISTERS)
 	  && !fixed_regs[regno]
 	  && !STACK_REGNO_P (regno)
 	  && !MMX_REGNO_P (regno));
@@ -6585,7 +6603,8 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
      as a scratch register after epilogue and use REGNO as scratch
      register only if it has been used before to avoid saving and
      restoring it.  */
-  return (!cfun->machine->no_caller_saved_registers
+  return ((cfun->machine->call_saved_registers
+	   != TYPE_NO_CALLER_SAVED_REGISTERS)
 	  || (!epilogue_completed
 	      && df_regs_ever_live_p (regno)));
 }
@@ -6595,14 +6614,32 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 bool
 ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 {
-  /* If there are no caller-saved registers, we preserve all registers,
-     except for MMX and x87 registers which aren't supported when saving
-     and restoring registers.  Don't explicitly save SP register since
-     it is always preserved.  */
-  if (cfun->machine->no_caller_saved_registers)
-    {
-      /* Don't preserve registers used for function return value.  */
-      rtx reg = crtl->return_rtx;
+  rtx reg;
+
+  switch (cfun->machine->call_saved_registers)
+    {
+    case TYPE_DEFAULT_CALL_SAVED_REGISTERS:
+      /* If any no_callee_saved_registers functions are called and this
+	 is not a no_callee_saved_registers function, we preserve all
+	 registers which aren't used by function calls, except for MMX
+	 and x87 registers which aren't supported when saving and
+	 restoring registers.  Don't explicitly save SP register since
+	 it is always preserved.  */
+      if (cfun->machine->call_no_callee_saved_registers)
+	return (!fixed_regs[regno]
+		&& !call_used_regs[regno]
+		&& !STACK_REGNO_P (regno)
+		&& !MMX_REGNO_P (regno));
+      break;
+
+    case TYPE_NO_CALLER_SAVED_REGISTERS:
+      /* If there are no caller-saved registers, we preserve all
+	 registers, except for MMX and x87 registers which aren't
+	 supported when saving and restoring registers.  Don't
+	 explicitly save SP register since it is always preserved.
+
+	 Don't preserve registers used for function return value.  */
+      reg = crtl->return_rtx;
       if (reg)
 	{
 	  unsigned int i = REGNO (reg);
@@ -6618,6 +6655,9 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 	      && !MMX_REGNO_P (regno)
 	      && (regno != HARD_FRAME_POINTER_REGNUM
 		  || !frame_pointer_needed));
+
+    case TYPE_NO_CALLEE_SAVED_REGISTERS:
+      return false;
     }
 
   if (regno == REAL_PIC_OFFSET_TABLE_REGNUM
@@ -7717,7 +7757,8 @@ find_drap_reg (void)
 	 registers in epilogue, DRAP must not use caller-saved
 	 register in such case.  */
       if (DECL_STATIC_CHAIN (decl)
-	  || cfun->machine->no_caller_saved_registers
+	  || (cfun->machine->call_saved_registers
+	      == TYPE_NO_CALLER_SAVED_REGISTERS)
 	  || crtl->tail_call_emit)
 	return R13_REG;
 
@@ -7730,7 +7771,8 @@ find_drap_reg (void)
 	 registers in epilogue, DRAP must not use caller-saved
 	 register in such case.  */
       if (DECL_STATIC_CHAIN (decl)
-	  || cfun->machine->no_caller_saved_registers
+	  || (cfun->machine->call_saved_registers
+	      == TYPE_NO_CALLER_SAVED_REGISTERS)
 	  || crtl->tail_call_emit
 	  || crtl->calls_eh_return)
 	return DI_REG;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index b9c574e62e1..bf1ca6014f5 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2724,6 +2724,17 @@ enum function_type
   TYPE_EXCEPTION
 };
 
+enum call_saved_registers_type
+{
+  TYPE_DEFAULT_CALL_SAVED_REGISTERS = 0,
+  /* The current function is a function specified with the "interrupt"
+     or "no_caller_saved_registers" attribute.  */
+  TYPE_NO_CALLER_SAVED_REGISTERS,
+  /* The current function is a function specified with the "noreturn"
+     or "no_callee_saved_registers" attribute.  */
+  TYPE_NO_CALLEE_SAVED_REGISTERS
+};
+
 enum queued_insn_type
 {
   TYPE_NONE = 0,
@@ -2793,9 +2804,12 @@ struct GTY(()) machine_function {
   /* How to generate function return.  */
   ENUM_BITFIELD(indirect_branch) function_return_type : 3;
 
-  /* If true, the current function is a function specified with
-     the "interrupt" or "no_caller_saved_registers" attribute.  */
-  BOOL_BITFIELD no_caller_saved_registers : 1;
+  /* Call saved registers type.  */
+  ENUM_BITFIELD(call_saved_registers_type) call_saved_registers : 2;
+
+  /* If true, the current function calls no_callee_saved_registers
+     functions.  */
+  BOOL_BITFIELD call_no_callee_saved_registers : 1;
 
   /* If true, there is register available for argument passing.  This
      is used only in ix86_function_ok_for_sibcall by 32-bit to determine
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0bc586d120e..4cafa6d416b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6767,6 +6767,14 @@ On x86-32 targets, the @code{stdcall} attribute causes the compiler to
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@cindex @code{no_callee_saved_registers} function attribute, x86
+@item no_callee_saved_registers
+Use this attribute to indicate that the specified function has no
+callee-saved registers. That is, all registers can be used as scratch
+registers. For example, this attribute can be used for a function
+called from the interrupt handler assembly stub which will preserve
+all registers and return from interrupt.
+
 @cindex @code{no_caller_saved_registers} function attribute, x86
 @item no_caller_saved_registers
 Use this attribute to indicate that the specified function has no
diff --git a/gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c b/gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c
new file mode 100644
index 00000000000..8c48ec0c79a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-additional-sources no-callee-saved-run-1b.c } */
+
+extern void bar0 (int, int, int, int, int, int, int, int, int)
+   __attribute__ ((no_callee_saved_registers));
+
+void
+foo (void)
+{
+  bar0 (0, 1, 2, 3, 4, 5, 6, 7, 8);
+}
+
+int
+bar (int x)
+{
+  return x;
+}
+
+void
+bad (void)
+{
+  __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c b/gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c
new file mode 100644
index 00000000000..b3ce7e72e85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c
@@ -0,0 +1,59 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+extern void foo (void);
+extern void bad (void);
+extern int bar (int);
+
+void
+__attribute__ ((no_callee_saved_registers))
+bar0 (int i0, int i1, int i2, int i3, int i4, int i5, int i6,
+      int i7, int i8)
+{
+  if (i0 != 0)
+     bad ();
+
+  if (i1 != 1)
+     bad ();
+
+  if (i2 != 2)
+     bad ();
+
+  if (i3 != 3)
+     bad ();
+
+  if (i4 != 4)
+     bad ();
+
+  if (i5 != 5)
+     bad ();
+
+  if (i6 != 6)
+     bad ();
+
+  if (i7 != 7)
+     bad ();
+
+  if (i8 != 8)
+     bad ();
+
+  int a,b,c,d,e,f,i;
+  a = bar (5);
+  b = bar (a);
+  c = bar (b);
+  d = bar (c);
+  e = bar (d);
+  f = bar (e);
+  for (i = 1; i < 10; i++)
+  {
+    a += bar (a + i) + bar (b + i) +
+	 bar (c + i) + bar (d + i) +
+	 bar (e + i) + bar (f + i);
+  }
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-1.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-1.c
new file mode 100644
index 00000000000..8fe36eb5198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern int bar (int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (void *frame)
+{
+  int a,b,c,d,e,f,i;
+  a = bar (5);
+  b = bar (a);
+  c = bar (b);
+  d = bar (c);
+  e = bar (d);
+  f = bar (e);
+  for (i = 1; i < 10; i++)
+  {
+    a += bar (a + i) + bar (b + i) +
+	 bar (c + i) + bar (d + i) +
+	 bar (e + i) + bar (f + i);
+  }
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-10.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-10.c
new file mode 100644
index 00000000000..87766c6cd88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-10.c
@@ -0,0 +1,46 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mgeneral-regs-only -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern void bar (void) __attribute__ ((no_callee_saved_registers));
+
+__attribute__ ((no_caller_saved_registers))
+void
+foo (void)
+{
+  bar ();
+}
+
+/* foo must save and restore all registers since bar won't preserve
+   any.  */
+/* { dg-final { scan-assembler-not "jmp\[\\t \]+_?bar" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+_?bar" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)dx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)si" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)di" 1 } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r8" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r9" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r10" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r11" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)dx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)si" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)di" 1 } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r8" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r9" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r10" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r11" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-11.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-11.c
new file mode 100644
index 00000000000..902a764489e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern void foo (void); /* { dg-note "previous declaration" } */
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (void) /* { dg-error "conflicting types" } */
+{
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-12.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-12.c
new file mode 100644
index 00000000000..5524a4af29c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-12.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern void foo (void) __attribute__ ((no_callee_saved_registers)); /* { dg-note "previous declaration" } */
+
+void
+foo (void) /* { dg-error "conflicting types" } */
+{
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
new file mode 100644
index 00000000000..6757e72d848
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern void foo (void);
+
+__attribute__ ((no_callee_saved_registers))
+void
+bar (void)
+{
+  foo ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-not "call\[\\t \]+_?foo" } } */
+/* { dg-final { scan-assembler "jmp\[\\t \]+_?foo" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
new file mode 100644
index 00000000000..2239e286e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern void bar (void) __attribute__ ((no_callee_saved_registers));
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler "jmp\[\\t \]+_?bar" } } */
+/* { dg-final { scan-assembler-not "call\[\\t \]+_?bar" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
new file mode 100644
index 00000000000..10135fec9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+extern fn_t bar;
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler "jmp" } } */
+/* { dg-final { scan-assembler-not "call\[\\t \]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
new file mode 100644
index 00000000000..112d1764f3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (fn_t bar)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler "jmp" } } */
+/* { dg-final { scan-assembler-not "call\[\\t \]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
new file mode 100644
index 00000000000..1fd5daadf08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern void foo (void) __attribute__ ((no_caller_saved_registers));
+
+__attribute__ ((no_callee_saved_registers))
+void
+bar (void)
+{
+  foo ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-not "call\[\\t \]+_?foo" } } */
+/* { dg-final { scan-assembler "jmp\[\\t \]+_?foo" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-18.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-18.c
new file mode 100644
index 00000000000..e7101009be4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-18.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#include <stdint.h>
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+
+void
+foo (uintptr_t p)
+{
+  ((fn_t) p) ();
+}
+
+/* foo must save and restore all callee-saved registers since the
+   function called through p won't preserve any.  */
+/* { dg-final { scan-assembler-not "jmp" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-2.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-2.c
new file mode 100644
index 00000000000..ce4ab3b1799
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern int bar (int) __attribute__ ((no_caller_saved_registers))
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (void *frame)
+{
+  int a,b,c,d,e,f,i;
+  a = bar (5);
+  b = bar (a);
+  c = bar (b);
+  d = bar (c);
+  e = bar (d);
+  f = bar (e);
+  for (i = 1; i < 10; i++)
+  {
+    a += bar (a + i) + bar (b + i) +
+	 bar (c + i) + bar (d + i) +
+	 bar (e + i) + bar (f + i);
+  }
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-3.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-3.c
new file mode 100644
index 00000000000..453272e11c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((no_callee_saved_registers, no_caller_saved_registers))
+void
+foo (void) /* { dg-error "attributes are not compatible" } */
+{
+}
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-4.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-4.c
new file mode 100644
index 00000000000..ec566aaf09f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-4.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mgeneral-regs-only" } */
+
+__attribute__ ((no_callee_saved_registers, interrupt))
+void
+foo (void *frame) /* { dg-error "attributes are not compatible" } */
+{
+}
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-5.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-5.c
new file mode 100644
index 00000000000..b28b211986a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef void (*fn_t) (void *) __attribute__ ((no_callee_saved_registers));
+
+void
+foo (void *frame)
+{
+}
+
+fn_t func = foo; /* { dg-error "incompatible pointer type" } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-6.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-6.c
new file mode 100644
index 00000000000..a7b3bdabf43
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef void (*fn_t) (void *) __attribute__ ((no_callee_saved_registers));
+
+__attribute__ ((no_callee_saved_registers))
+void
+foo (void *frame)
+{
+}
+
+fn_t func = foo;
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-7.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-7.c
new file mode 100644
index 00000000000..a1837fdfd4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-7.c
@@ -0,0 +1,49 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern void bar (void) __attribute__ ((no_callee_saved_registers));
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* foo must save and restore all callee-saved registers since bar won't
+   preserve any.  */
+/* { dg-final { scan-assembler-not "jmp\[\\t \]+_?bar" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+_?bar" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-8.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-8.c
new file mode 100644
index 00000000000..90b98a21aef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-8.c
@@ -0,0 +1,50 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+extern fn_t bar;
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* foo must save and restore all callee-saved registers since bar won't
+   preserve any.  */
+/* { dg-final { scan-assembler-not "jmp" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-9.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-9.c
new file mode 100644
index 00000000000..e261100ac1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-9.c
@@ -0,0 +1,49 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+
+void
+foo (fn_t bar)
+{
+  bar ();
+}
+
+/* foo must save and restore all callee-saved registers since bar won't
+   preserve any.  */
+/* { dg-final { scan-assembler-not "jmp" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)ax" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)cx" } } */
+/* { dg-final { scan-assembler-not "pop(?:l|q)\[\\t \]*%(?:e|r)dx" } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bp" 1 } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rsi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popl\[\\t \]*%edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%rdi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r8" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r9" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r10" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "popq\[\\t \]*%r11" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
-- 
2.43.0


* [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions
  2024-01-22 15:45 [PATCH v2 0/2] x86: Don't save callee-saved registers if not needed H.J. Lu
  2024-01-22 15:45 ` [PATCH v2 1/2] x86: Add no_callee_saved_registers function attribute H.J. Lu
@ 2024-01-22 15:45 ` H.J. Lu
  2024-01-22 16:58   ` Jan Hubicka
  1 sibling, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2024-01-22 15:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, jh

There is no need to save callee-saved registers in noreturn functions
that neither throw nor support exceptions.  We can treat them the same
as functions with the no_callee_saved_registers attribute.
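
For illustration only (this sketch is not part of the patch; sum_ints
and report_and_exit are made-up names), a function of the kind this
change targets might look like:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the patch: sums v[0..n-1] so that the
   noreturn function below has some real work to do.  */
int
sum_ints (const int *v, int n)
{
  assert (n >= 0);
  int total = 0;
  for (int i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* A noreturn function.  Control never goes back to the caller, so
   whatever the caller kept in callee-saved registers is never needed
   again.  */
__attribute__ ((noreturn))
void
report_and_exit (const int *v, int n)
{
  printf ("sum = %d\n", sum_ints (v, n));
  exit (EXIT_SUCCESS);
}
```

Compiled as C, where -fexceptions is off by default, report_and_exit
should fall under the noreturn case described above, so its prologue
would not be expected to push callee-saved registers.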

Adjust stack-check-17.c for the noreturn function, which no longer saves
any registers.

With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
	endbr64
	push   %r15
	push   %r14
	mov    %rcx,%r14
	push   %r13
	push   %r12
	push   %rbp
	mov    %esi,%ebp
	push   %rbx
	mov    %rdx,%rbx
	sub    $0x28,%rsp
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

to

__libc_start_main:
	endbr64
        sub    $0x28,%rsp
        mov    %esi,%ebp
        mov    %rdx,%rbx
        mov    %rcx,%r14
        mov    %rdi,(%rsp)
        mov    %fs:0x28,%rax
        mov    %rax,0x18(%rsp)
        xor    %eax,%eax
        test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call   <do_exit+0x9>
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        # <do_exit+0x39>
        test   $0x2,%ah
        je     <do_exit+0x8d3>

to

do_exit:
        endbr64
        call   <do_exit+0x9>
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        # <do_exit+0x2f>
        test   $0x2,%ah
        je     <do_exit+0x8c9>

I compared GCC master branch bootstrap and test times on a slow machine
running 6.6 Linux kernels compiled with the original GCC 13 and with
GCC 13 plus the backported patch.  The performance data isn't precise
since the measurements were done on different days with different GCC
sources under different 6.6 kernel versions.

GCC master branch build time in seconds:

before                after                  improvement
30043.75user          30013.16user           0%
1274.85system         1243.72system          2.4%

GCC master branch test time in seconds (new tests added):

before                after                  improvement
216035.90user         216547.51user          0%
27365.51system        26658.54system         2.6%
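
The improvement column can be reproduced with a small helper.  This is
only a sketch; improvement_pct is a made-up name, and it assumes
"improvement" means the relative reduction from before to after:

```c
#include <assert.h>

/* Relative improvement in percent.  A positive result means "after"
   took less time than "before"; a negative one means it took more.  */
double
improvement_pct (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before * 100.0;
}
```

Applied to the figures above, this gives about 0.1% (user) and 2.4%
(system) for the build, and about -0.2% (user) and 2.6% (system) for
the test run.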

gcc/

	PR target/38534
	* config/i386/i386-options.cc (ix86_set_func_type): Don't
	save and restore callee saved registers for a noreturn function
	with nothrow or compiled with -fno-exceptions.

gcc/testsuite/

	PR target/38534
	* gcc.target/i386/pr38534-1.c: New file.
	* gcc.target/i386/pr38534-2.c: Likewise.
	* gcc.target/i386/pr38534-3.c: Likewise.
	* gcc.target/i386/pr38534-4.c: Likewise.
	* gcc.target/i386/stack-check-17.c: Updated.
---
 gcc/config/i386/i386-options.cc               | 16 ++++++++++--
 gcc/testsuite/gcc.target/i386/pr38534-1.c     | 26 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr38534-2.c     | 18 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr38534-3.c     | 19 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr38534-4.c     | 18 +++++++++++++
 .../gcc.target/i386/stack-check-17.c          | 19 +++++---------
 6 files changed, 102 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 0cdea30599e..f965568947c 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3371,9 +3371,21 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
 static void
 ix86_set_func_type (tree fndecl)
 {
+  /* No need to save and restore callee-saved registers for a noreturn
+     function with nothrow or compiled with -fno-exceptions.
+
+     NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
+     function.  The local-pure-const pass turns an interrupt function
+     into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
+     the local-pure-const pass is run after ix86_set_func_type is called.
+     When the local-pure-const pass is enabled for LTO, the interrupt
+     function is marked as noreturn in the IR output, which leads to the
+     incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-    = lookup_attribute ("no_callee_saved_registers",
-			TYPE_ATTRIBUTES (TREE_TYPE (fndecl)));
+    = (((TREE_NOTHROW (fndecl) || !flag_exceptions)
+	&& lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
+       || lookup_attribute ("no_callee_saved_registers",
+			    TYPE_ATTRIBUTES (TREE_TYPE (fndecl))));
 
   if (cfun->machine->func_type == TYPE_UNKNOWN)
     {
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-1.c b/gcc/testsuite/gcc.target/i386/pr38534-1.c
new file mode 100644
index 00000000000..9297959e759
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+    for (j = ARRAY_SIZE; j > 0; --j)
+      for (k = ARRAY_SIZE; k > 0; --k)
+	array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-2.c b/gcc/testsuite/gcc.target/i386/pr38534-2.c
new file mode 100644
index 00000000000..1fb01363273
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+extern void bar (void) __attribute__ ((no_callee_saved_registers));
+extern void fn (void) __attribute__ ((noreturn));
+
+__attribute__ ((noreturn))
+void
+foo (void)
+{
+  bar ();
+  fn ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-not "jmp\[\\t \]+_?bar" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+_?bar" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-3.c b/gcc/testsuite/gcc.target/i386/pr38534-3.c
new file mode 100644
index 00000000000..87fc35f3fe9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+extern fn_t bar;
+extern void fn (void) __attribute__ ((noreturn));
+
+__attribute__ ((noreturn))
+void
+foo (void)
+{
+  bar ();
+  fn ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-not "jmp" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-4.c b/gcc/testsuite/gcc.target/i386/pr38534-4.c
new file mode 100644
index 00000000000..561ebeef194
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
+extern void fn (void) __attribute__ ((noreturn));
+
+__attribute__ ((noreturn))
+void
+foo (fn_t bar)
+{
+  bar ();
+  fn ();
+}
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-not "jmp" } } */
+/* { dg-final { scan-assembler "call\[\\t \]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/stack-check-17.c b/gcc/testsuite/gcc.target/i386/stack-check-17.c
index b3e41cb3d25..061484e1319 100644
--- a/gcc/testsuite/gcc.target/i386/stack-check-17.c
+++ b/gcc/testsuite/gcc.target/i386/stack-check-17.c
@@ -23,19 +23,14 @@ f3 (void)
 /* Verify no explicit probes.  */
 /* { dg-final { scan-assembler-not "or\[ql\]" } } */
 
-/* We also want to verify we did not use a push/pop sequence
-   to probe *sp as the callee register saves are sufficient
-   to probe *sp.
-
-   y0/y1 are live across the call and thus must be allocated
+/* y0/y1 are live across the call and thus must be allocated
    into either a stack slot or callee saved register.  The former
    would be rather dumb.  So assume it does not happen.
 
-   So search for two/four pushes for the callee register saves/argument pushes
-   (plus one for the PIC register if needed on ia32) and no pops (since the
-   function has no reachable epilogue).  */
-/* { dg-final { scan-assembler-times "push\[ql\]" 2 { target { ! ia32 } } } }  */
-/* { dg-final { scan-assembler-times "push\[ql\]" 4 { target { ia32 && nonpic } } } }  */
-/* { dg-final { scan-assembler-times "push\[ql\]" 5 { target { ia32 && { ! nonpic } } } } }  */
-/* { dg-final { scan-assembler-not "pop" } } */
+   So search for a push/pop sequence for stack probe and 2 argument
+   pushes on ia32.  There is no need to save and restore the PIC
+   register on ia32 for a noreturn function.  */
+/* { dg-final { scan-assembler-times "push\[ql\]" 1 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "push\[ql\]" 3 { target ia32 } } }  */
+/* { dg-final { scan-assembler-times "pop" 1 } } */
 
-- 
2.43.0



* Re: [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions
  2024-01-22 15:45 ` [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions H.J. Lu
@ 2024-01-22 16:58   ` Jan Hubicka
  2024-01-22 18:37     ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2024-01-22 16:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, ubizjak, hongtao.liu

> I compared GCC master branch bootstrap and test times on a slow machine
> with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
> with the backported patch.  The performance data isn't precise since the
> measurements were done on different days with different GCC sources under
> different 6.6 kernel versions.
> 
> GCC master branch build time in seconds:
> 
> before                after                  improvement
> 30043.75user          30013.16user           0%
> 1274.85system         1243.72system          2.4%
> 
> GCC master branch test time in seconds (new tests added):
> 
> before                after                  improvement
> 216035.90user         216547.51user          0
> 27365.51system        26658.54system         2.6%

That is interesting.  Does the system time difference come from the
smaller binary?  Is the difference significant?
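
For reference, the "improvement" column in the quoted tables is
consistent with (before - after) / before.  A minimal C sketch of that
arithmetic follows; improvement_percent and check_quoted_rows are
hypothetical helper names that appear nowhere in the patch.

```c
#include <assert.h>

/* Hypothetical helper: relative reduction from BEFORE to AFTER, in
   percent.  A positive result means AFTER is smaller (faster).  */
static double
improvement_percent (double before, double after)
{
  return (before - after) / before * 100.0;
}

/* Check the two system-time rows from the quoted tables against the
   rounded percentages reported there.  */
static void
check_quoted_rows (void)
{
  double build_sys = improvement_percent (1274.85, 1243.72);
  double test_sys = improvement_percent (27365.51, 26658.54);

  assert (build_sys > 2.35 && build_sys < 2.45);  /* reported as 2.4% */
  assert (test_sys > 2.55 && test_sys < 2.65);    /* reported as 2.6% */
}
```

As the quoted text says, the runs were done on different days with
different sources and kernels, so these figures are only indicative.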
> 
> gcc/
> 
> 	PR target/38534
> 	* config/i386/i386-options.cc (ix86_set_func_type): Don't
> 	save and restore callee saved registers for a noreturn function
> 	with nothrow or compiled with -fno-exceptions.

In general this looks like a good thing to do.  I wonder whether this is
something the middle-end should understand for all targets.
I also wonder about asynchronous stack unwinding.  If we want to unwind
the stack from an interrupt, then we may need some registers to be saved
(such as the base pointer).

Honza


* Re: [PATCH v2 2/2] x86: Don't save callee-saved registers in noreturn functions
  2024-01-22 16:58   ` Jan Hubicka
@ 2024-01-22 18:37     ` H.J. Lu
  0 siblings, 0 replies; 5+ messages in thread
From: H.J. Lu @ 2024-01-22 18:37 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, ubizjak, hongtao.liu

On Mon, Jan 22, 2024 at 8:58 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > I compared GCC master branch bootstrap and test times on a slow machine
> > with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
> > with the backported patch.  The performance data isn't precise since the
> > measurements were done on different days with different GCC sources under
> > different 6.6 kernel versions.
> >
> > GCC master branch build time in seconds:
> >
> > before                after                  improvement
> > 30043.75user          30013.16user           0%
> > 1274.85system         1243.72system          2.4%
> >
> > GCC master branch test time in seconds (new tests added):
> >
> > before                after                  improvement
> > 216035.90user         216547.51user          0
> > 27365.51system        26658.54system         2.6%
>
> That is interesting.  Does the system time difference come from the
> smaller binary?  Is the difference significant?

I think it comes from the following.

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call   <do_exit+0x9>
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        # <do_exit+0x39>
        test   $0x2,%ah
        je     <do_exit+0x8d3>

to

do_exit:
        endbr64
        call   <do_exit+0x9>
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        # <do_exit+0x2f>
        test   $0x2,%ah
        je     <do_exit+0x8c9>

do_exit is called by every process when it exits.
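
As a reading aid, the condition the patch adds to ix86_set_func_type can
be modelled in plain C.  This is only a sketch and not GCC code: struct
fn_facts, skips_callee_saves and check_examples are hypothetical names,
and the fields merely stand in for TREE_NOTHROW, flag_exceptions and the
two attribute lookups.  It assumes that a C noreturn function such as
do_exit is compiled with exceptions disabled, which is the default for C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for what ix86_set_func_type inspects.  */
struct fn_facts
{
  bool nothrow;               /* models TREE_NOTHROW (fndecl) */
  bool noreturn_attr;         /* models the "noreturn" attribute lookup */
  bool no_callee_saved_attr;  /* models "no_callee_saved_registers" */
};

/* Mirrors the boolean expression added by the patch.  */
static bool
skips_callee_saves (struct fn_facts f, bool flag_exceptions)
{
  return (((f.nothrow || !flag_exceptions) && f.noreturn_attr)
          || f.no_callee_saved_attr);
}

static void
check_examples (void)
{
  struct fn_facts c_noreturn = { false, true, false };
  struct fn_facts plain = { true, false, false };
  struct fn_facts explicit_attr = { false, false, true };

  /* noreturn, exceptions disabled: saves are skipped.  */
  assert (skips_callee_saves (c_noreturn, false));
  /* noreturn but may throw with exceptions enabled: saves are kept.  */
  assert (!skips_callee_saves (c_noreturn, true));
  /* Returns normally: saves are kept.  */
  assert (!skips_callee_saves (plain, false));
  /* The explicit attribute applies regardless of exceptions.  */
  assert (skips_callee_saves (explicit_attr, true));
}
```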

> >
> > gcc/
> >
> >       PR target/38534
> >       * config/i386/i386-options.cc (ix86_set_func_type): Don't
> >       save and restore callee saved registers for a noreturn function
> >       with nothrow or compiled with -fno-exceptions.
>
> In general this looks like a good thing to do.  I wonder whether this is
> something the middle-end should understand for all targets.
> I also wonder about asynchronous stack unwinding.  If we want to unwind
> the stack from an interrupt, then we may need some registers to be saved
> (such as the base pointer).

It is compatible with -fasynchronous-unwind-tables.  From glibc test
debug/tst-longjmp_chk:

Starting program: /export/build/gnu/tools-build/glibc-cet/build-x86_64-linux/debug/tst-longjmp_chk --direct
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
    no_tid=no_tid@entry=0) at pthread_kill.c:44
44       return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>,
    signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000555555294a4b in __pthread_kill_internal (signo=6,
    threadid=<optimized out>) at pthread_kill.c:78
#2  0x000055555523da1a in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#3  0x00005555552248b3 in __GI_abort () at abort.c:79
#4  0x0000555555225a7e in __libc_message_impl (
    fmt=fmt@entry=0x5555553b7171 "*** %s ***: terminated\n")
    at ../sysdeps/posix/libc_fatal.c:132
#5  0x0000555555324517 in __GI___fortify_fail (msg=<optimized out>)
    at fortify_fail.c:24
#6  0x0000555555323411 in ____longjmp_chk ()
    at ../sysdeps/x86_64/__longjmp.S:57
#7  0x0000555555324d6d in __GI___longjmp_chk (
    env=env@entry=0x55555555a200 <b>, val=val@entry=1)
    at ../setjmp/longjmp.c:41
#8  0x0000555555556a00 in do_test () at tst-longjmp_chk.c:70
#9  0x0000555555557388 in support_test_main (argc=1431675392,
    argv=0x7fffffffdd30, config=0x1, config@entry=0x7fffffffdbe0)
    at support_test_main.c:413
#10 0x000055555555673f in main (argc=<optimized out>, argv=<optimized out>)
    at ../support/test-driver.c:170
(gdb)

abort is a noreturn function:

extern void abort (void) __THROW __attribute__ ((__noreturn__));

Callee-saved registers aren't saved:

Dump of assembler code for function __GI_abort:
   0x00005555552247de <+0>: endbr64
   0x00005555552247e2 <+4>: sub    $0xa8,%rsp
   0x00005555552247e9 <+11>: lea    0x1d1540(%rip),%rbx        # 0x5555553f5d30 <lock>
   0x00005555552247f0 <+18>: mov    %fs:0x28,%rax
   0x00005555552247f9 <+27>: mov    %rax,0x98(%rsp)
   0x0000555555224801 <+35>: xor    %eax,%eax
   0x0000555555224803 <+37>: mov    %fs:0x10,%rbp
   0x000055555522480c <+46>: cmp    %rbp,0x1d1525(%rip)        # 0x5555553f5d38 <lock+8>
   0x0000555555224813 <+53>: je     0x555555224833 <__GI_abort+85>
   0x0000555555224815 <+55>: mov    $0x1,%edx
   0x000055555522481a <+60>: lock cmpxchg %edx,0x1d150e(%rip)        # 0x5555553f5d30 <lock>
   0x0000555555224822 <+68>: je     0x55555522482c <__GI_abort+78>
   0x0000555555224824 <+70>: mov    %rbx,%rdi
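
One way to see why skipping the saves is safe for a noreturn function is
a small setjmp/longjmp sketch in C.  Control never comes back to the
caller through a normal return, so the caller cannot depend on
callee-saved registers after the call; longjmp restores the register
state recorded by setjmp.  The names bail_out, run_and_recover and
check_recovery are hypothetical, and this is not the glibc test itself.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recovery_point;
static volatile int last_code;

/* Never returns to its caller: control leaves through longjmp.  */
__attribute__ ((noreturn))
static void
bail_out (int code)
{
  last_code = code;
  longjmp (recovery_point, 1);
}

/* An ordinary function that returns normally, so its own prologue and
   epilogue are unaffected by the noreturn rule.  */
static int
run_and_recover (int code)
{
  if (setjmp (recovery_point) == 0)
    bail_out (code);  /* control resumes at setjmp, not after this call */
  return last_code;
}

static void
check_recovery (void)
{
  assert (run_and_recover (7) == 7);
  assert (run_and_recover (42) == 42);
}
```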





-- 
H.J.
