public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] PR47440 - Use LCM for vzeroupper insertion
@ 2012-08-23  9:53 Uros Bizjak
  0 siblings, 0 replies; 3+ messages in thread
From: Uros Bizjak @ 2012-08-23  9:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: Vladimir Yakovlev

Hello!

> 2012-08-25  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>
>         * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
>
>         * config/i386/i386-protos.h (emit_i387_cw_initialization):
>         Deleted
>         (emit_vzero): Added prototype
>         (ix86_mode_entry): Likewise
>         (ix86_mode_exit): Likewise
>         (ix86_emit_mode_set): Likewise.
>
>         * config/i386/i386.c (typedef struct block_info_def): Deleted
>         (define BLOCK_INFO): Deleted
>         (check_avx256_stores): Added checking for MEM_P
>         (move_or_delete_vzeroupper_2): Deleted
>         (move_or_delete_vzeroupper_1): Deleted
>         (move_or_delete_vzeroupper): Deleted
>         (ix86_maybe_emit_epilogue_vzeroupper): Deleted
>         (function_pass_avx256_p): Deleted
>         (ix86_function_ok_for_sibcall): Deleted disabling sibcall
>         (init_cumulative_args): Deleted initialization of avx256 fields of
>         cfun->machine
>         (ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation
>         (ix86_expand_epilogue): Likewise
>         (is_vzeroupper): New
>         (is_vzeroall): Likewise
>         (ix86_avx_u128_mode_needed): Likewise
>         (ix86_mode_needed): Added a switch case for AVX_U128
>         (ix86_avx_u128_mode_after): New
>         (ix86_mode_after): Likewise
>         (ix86_avx_u128_mode_entry): Likewise
>         (ix86_mode_entry): Likewise
>         (ix86_avx_u128_mode_exit): Likewise
>         (ix86_mode_exit): Likewise
>         (ix86_emit_vzeroupper): Likewise
>         (ix86_emit_mode_set): Likewise
>         (ix86_expand_call): Deleted vzeroupper generation
>         (ix86_split_call_vzeroupper): Deleted
>         (ix86_init_machine_status): Initialized optimize_mode_switching
>         (ix86_expand_special_args_builtin): Changed
>         (ix86_reorg): Deleted a call of move_or_delete_vzeroupper.
>
>         * config/i386/i386.h (AVX_U128): New
>         (NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY
>         (MODE_AFTER): New
>         (MODE_ENTRY): Likewise
>         (MODE_EXIT): Likewise
>         (EMIT_MODE_SET): Changed
>         (machine_function): Deleted avx256 fields.
>
>         * config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted
>         (define_insn_and_split "*call_vzeroupper"): Likewise.
>         (define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"): Likewise
>         (define_insn_and_split "*sibcall_vzeroupper"): Likewise
>         (define_insn_and_split "*call_pop_vzeroupper"): Likewise
>         (define_insn_and_split "*sibcall_pop_vzeroupper"): Likewise
>         (define_insn_and_split "*call_value_vzeroupper"): Likewise
>         (define_insn_and_split "*sibcall_value_vzeroupper"): Likewise
>         (define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"): Likewise
>         (define_insn_and_split "*call_value_pop_vzeroupper"): Likewise
>         (define_insn_and_split "*sibcall_value_pop_vzeroupper"): Likewise
>         (define_expand "return"): Deleted vzeroupper emitting
>         (define_expand "simple_return"): Likewise.
>
>         * config/sh/sh.h (MODE_AFTER): Added an argument
>         (EMIT_MODE_SET): Likewise.
>
>         * mode-switching.c (transp): Changed type
>         (make_preds_opaque): Added an argument
>         (optimize_mode_switching): Added code for VZEROUPPER elimination.
>
> 2012-08-20  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>
>         * gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times
>         * gcc.target/i386/avx-vzeroupper-8.c: Likewise
>         * gcc.target/i386/avx-vzeroupper-9.c: Likewise
>         * gcc.target/i386/avx-vzeroupper-10.c: Likewise
>         * gcc.target/i386/avx-vzeroupper-11.c: Likewise
>         * gcc.target/i386/avx-vzeroupper-12.c: Likewise.

Please split the middle-end changes and improvements out of the patch
so that they can be discussed with middle-end people first.

After those changes are approved, we will proceed with the target-dependent changes.

Uros.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] PR47440 - Use LCM for vzeroupper insertion
@ 2012-08-23 11:38 Uros Bizjak
  0 siblings, 0 replies; 3+ messages in thread
From: Uros Bizjak @ 2012-08-23 11:38 UTC (permalink / raw)
  To: gcc-patches
  Cc: Vladimir Yakovlev

Hello!

> avx-vzeroupper-3 fails because reload moves an AVX store past the vzeroupper.
>
> Before reload
> (insn 2 27 3 2 (set (reg/v:V4DI 61 [ src ])

...

> After reload
> (insn 6 3 29 2 (set (reg:QI 0 ax)
>
> I think it is a data flow analysis problem. Uros refers to code at
> df-scan.c, line 3248. He wrote:
>
> This kind of defeats the purpose of UNSPEC_VOLATILE, and is probably
> the root cause of the moves.  I don't know how to attack this
> efficiently; I suggest asking on the list about the issue.
>
> What is a possible solution in this case?

It looks to me that we have to introduce a post-reload LCM insertion
pass. Please note that vzeroupper is defined with hard registers only
(and FWIW, vzeroall too), so there is no concept of virtual registers
in these patterns.

The instruction that is inserted post-reload can be defined as:

--cut here--
;; Clear the upper 128bits of AVX registers, equivalent to a NOP
;; if the upper 128bits are unused.
(define_expand "avx_vzeroupper"
  [(match_par_dup 1 [(match_operand 0 "const_int_operand")])]
  "TARGET_AVX"
{
  int nregs = TARGET_64BIT ? 16 : 8;
  int regno;

  operands[1] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (nregs + 1));

  XVECEXP (operands[1], 0, 0)
    = gen_rtx_UNSPEC_VOLATILE (VOIDmode, gen_rtvec (1, operands[0]),
			       UNSPECV_VZEROUPPER);

  for (regno = 0; regno < nregs; regno++)
    XVECEXP (operands[1], 0, regno + 1)
      = gen_rtx_SET (VOIDmode,
		     gen_rtx_REG (V8SImode, SSE_REGNO (regno)),
		     gen_rtx_VEC_MERGE (V4SImode,
					gen_rtx_REG (V4SImode,
						     SSE_REGNO (regno)),
					CONST0_RTX (V4SImode),
					const1_rtx));
})

(define_insn "*avx_vzeroupper"
  [(match_parallel 1 "vzeroupper_operation"
    [(unspec_volatile [(match_operand 0 "const_int_operand")]
    		      UNSPECV_VZEROUPPER)])]
  "TARGET_AVX"
  "vzeroupper\t# %0"
  [(set_attr "type" "sse")
   (set_attr "modrm" "0")
   (set_attr "memory" "none")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])
--cut here--

Also, the vzeroupper and vzeroall instructions that are generated via
__builtin_ia32_* should be generated in a different way. The call to
the builtin should insert an unspec_volatile marker that is split
post-reload into a real pattern with all hard registers enumerated in
the insn RTL body.
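As a rough sketch of that marker approach (illustrative only, not part
of the posted patch; the pattern name, split condition, and reuse of
the avx_vzeroupper expander shown earlier are assumptions):

--cut here--
;; Sketch only: a marker insn for __builtin_ia32_vzeroupper that is
;; split post-reload into the parallel enumerating the hard registers.
(define_insn_and_split "*avx_vzeroupper_marker"
  [(unspec_volatile [(match_operand 0 "const_int_operand")]
		    UNSPECV_VZEROUPPER)]
  "TARGET_AVX"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  /* Re-expand through the avx_vzeroupper expander, which builds the
     parallel with all hard SSE registers in the insn RTL body.  */
  emit_insn (gen_avx_vzeroupper (operands[0]));
  DONE;
})
--cut here--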

Uros.


* [PATCH] PR47440 - Use LCM for vzeroupper insertion
@ 2012-08-21 13:52 Vladimir Yakovlev
  0 siblings, 0 replies; 3+ messages in thread
From: Vladimir Yakovlev @ 2012-08-21 13:52 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6668 bytes --]

Please review the changes for vzeroupper placement using the mode
switching technique. This fixes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47440.
Bootstrap and make check were performed. I used the compiler:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-languages=c,c++,fortran
--with-arch=corei7 --with-cpu=corei7 --with-fpmath=sse

There are failures:
FAIL: gcc.target/i386/avx-vzeroupper-3.c execution test
FAIL: gcc.target/x86_64/abi/avx/test_passing_unions.c execution,  -O2
FAIL: gcc.target/x86_64/abi/avx/test_passing_unions.c execution,  -O3
-fomit-frame-pointer
FAIL: gcc.target/x86_64/abi/avx/test_passing_unions.c execution,  -O3
-fomit-frame-pointer -funroll-loops
FAIL: gcc.target/x86_64/abi/avx/test_passing_unions.c execution,  -O3
-fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.target/x86_64/abi/avx/test_passing_unions.c execution,  -O3
-g

avx-vzeroupper-3 fails because reload moves an AVX store past the vzeroupper.

Before reload
(insn 2 27 3 2 (set (reg/v:V4DI 61 [ src ])
        (reg:V4DI 21 xmm0 [ src ]))
/export/users/vbyakovl/workspaces/vzu/gcc/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c:22
1093 {*movv4di_internal}
     (expr_list:REG_DEAD (reg:V4DI 21 xmm0 [ src ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 29 2 (set (reg:QI 0 ax)
        (const_int 0 [0]))
/export/users/vbyakovl/workspaces/vzu/gcc/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c:23
67 {*movqi_internal}
     (nil))
(insn 29 6 7 2 (unspec_volatile [
            (const_int 9 [0x9])
        ] UNSPECV_VZEROUPPER)
/export/users/vbyakovl/workspaces/vzu/gcc/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c:23
1840 {avx_vzeroupper}
     (nil))
(call_insn 7 29 9 2 (call (mem:QI (symbol_ref:DI ("foo") [flags 0x3]
<function_decl 0x7f4cf8fe8500 foo>) [0 foo S1 A8])

After reload
(insn 6 3 29 2 (set (reg:QI 0 ax)
        (const_int 0 [0]))
/export/users/vbyakovl/workspaces/vzu/gcc/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c:23
67 {*movqi_internal}
     (nil))
(insn 29 6 33 2 (unspec_volatile [
            (const_int 9 [0x9])
        ] UNSPECV_VZEROUPPER)
/export/users/vbyakovl/workspaces/vzu/gcc/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c:23
1840 {avx_vzeroupper}
     (nil))
(insn 33 29 7 2 (set (mem/c:V4DI (reg/f:DI 7 sp) [4 S32 A256])
        (reg:V4DI 21 xmm0))
/export/users/vbyakovl/workspaces/vzu/gcc/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c:23
1093 {*movv4di_internal}
     (nil))
(call_insn 7 33 9 2 (call (mem:QI (symbol_ref:DI ("foo") [flags 0x3]
<function_decl 0x7f4cf8fe8500 foo>) [0 foo S1 A8])

I think it is a data flow analysis problem. Uros refers to code at
df-scan.c, line 3248. He wrote:

This kind of defeats the purpose of UNSPEC_VOLATILE, and is probably
the root cause of the moves.  I don't know how to attack this
efficiently; I suggest asking on the list about the issue.

What is a possible solution in this case?

test_passing_unions fails due to a compiler error as well. I filed
PR54342 on this issue.

Ok for trunk?

2012-08-25  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.

        * config/i386/i386-protos.h (emit_i387_cw_initialization):
        Deleted
        (emit_vzero): Added prototype
        (ix86_mode_entry): Likewise
        (ix86_mode_exit): Likewise
        (ix86_emit_mode_set): Likewise.

        * config/i386/i386.c (typedef struct block_info_def): Deleted
        (define BLOCK_INFO): Deleted
        (check_avx256_stores): Added checking for MEM_P
        (move_or_delete_vzeroupper_2): Deleted
        (move_or_delete_vzeroupper_1): Deleted
        (move_or_delete_vzeroupper): Deleted
        (ix86_maybe_emit_epilogue_vzeroupper): Deleted
        (function_pass_avx256_p): Deleted
        (ix86_function_ok_for_sibcall): Deleted disabling sibcall
        (init_cumulative_args): Deleted initialization of avx256 fields of
        cfun->machine
        (ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation
        (ix86_expand_epilogue): Likewise
        (is_vzeroupper): New
        (is_vzeroall): Likewise
        (ix86_avx_u128_mode_needed): Likewise
        (ix86_mode_needed): Added a switch case for AVX_U128
        (ix86_avx_u128_mode_after): New
        (ix86_mode_after): Likewise
        (ix86_avx_u128_mode_entry): Likewise
        (ix86_mode_entry): Likewise
        (ix86_avx_u128_mode_exit): Likewise
        (ix86_mode_exit): Likewise
        (ix86_emit_vzeroupper): Likewise
        (ix86_emit_mode_set): Likewise
        (ix86_expand_call): Deleted vzeroupper generation
        (ix86_split_call_vzeroupper): Deleted
        (ix86_init_machine_status): Initialized optimize_mode_switching
        (ix86_expand_special_args_builtin): Changed
        (ix86_reorg): Deleted a call of move_or_delete_vzeroupper.

        * config/i386/i386.h (AVX_U128): New
        (NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY
        (MODE_AFTER): New
        (MODE_ENTRY): Likewise
        (MODE_EXIT): Likewise
        (EMIT_MODE_SET): Changed
        (machine_function): Deleted avx256 fields.

        * config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted
        (define_insn_and_split "*call_vzeroupper"): Likewise.
        (define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"): Likewise
        (define_insn_and_split "*sibcall_vzeroupper"): Likewise
        (define_insn_and_split "*call_pop_vzeroupper"): Likewise
        (define_insn_and_split "*sibcall_pop_vzeroupper"): Likewise
        (define_insn_and_split "*call_value_vzeroupper"): Likewise
        (define_insn_and_split "*sibcall_value_vzeroupper"): Likewise
        (define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"): Likewise
        (define_insn_and_split "*call_value_pop_vzeroupper"): Likewise
        (define_insn_and_split "*sibcall_value_pop_vzeroupper"): Likewise
        (define_expand "return"): Deleted vzeroupper emitting
        (define_expand "simple_return"): Likewise.

        * config/sh/sh.h (MODE_AFTER): Added an argument
        (EMIT_MODE_SET): Likewise.

        * mode-switching.c (transp): Changed type
        (make_preds_opaque): Added an argument
        (optimize_mode_switching): Added code for VZEROUPPER elimination.

2012-08-20  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times
        * gcc.target/i386/avx-vzeroupper-8.c: Likewise
        * gcc.target/i386/avx-vzeroupper-9.c: Likewise
        * gcc.target/i386/avx-vzeroupper-10.c: Likewise
        * gcc.target/i386/avx-vzeroupper-11.c: Likewise
        * gcc.target/i386/avx-vzeroupper-12.c: Likewise.
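
To make the i386.h entries above concrete (a sketch inferred from the
ChangeLog and the ix86_* prototypes in the diff below, not the patch
itself; the exact macro argument lists are assumptions):

--cut here--
/* Sketch only: wiring the new AVX_U128 entity into the mode-switching
   framework.  Argument lists follow the ix86_* prototypes and are
   otherwise illustrative.  */
#define MODE_NEEDED(ENTITY, INSN) ix86_mode_needed ((ENTITY), (INSN))
#define MODE_AFTER(ENTITY, MODE, INSN) \
  ix86_mode_after ((ENTITY), (MODE), (INSN))
#define MODE_ENTRY(ENTITY) ix86_mode_entry (ENTITY)
#define MODE_EXIT(ENTITY) ix86_mode_exit (ENTITY)
#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE, INSN) \
  ix86_emit_mode_set ((ENTITY), (MODE), (INSN))
--cut here--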

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 52263 bytes --]

diff --git a/gcc/ChangeLog.vzu b/gcc/ChangeLog.vzu
new file mode 100644
index 0000000..d630124
--- /dev/null
+++ b/gcc/ChangeLog.vzu
@@ -0,0 +1,70 @@
+2012-06-25  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
+
+	* config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
+
+	* config/i386/i386-protos.h (emit_i387_cw_initialization):
+	Deleted
+	(emit_vzero): Added prototype
+	(ix86_mode_entry): Likewise
+	(ix86_mode_exit): Likewise
+	(ix86_emit_mode_set): Likewise.
+
+	* config/i386/i386.c (typedef struct block_info_def): Deleted
+	(define BLOCK_INFO): Deleted
+	(check_avx256_stores): Added checking for MEM_P
+	(move_or_delete_vzeroupper_2): Deleted
+	(move_or_delete_vzeroupper_1): Deleted
+	(move_or_delete_vzeroupper): Deleted
+	(ix86_maybe_emit_epilogue_vzeroupper): Deleted
+	(function_pass_avx256_p): Deleted
+	(ix86_function_ok_for_sibcall): Deleted disabling sibcall
+	(init_cumulative_args): Deleted initialization of avx256 fields of
+	cfun->machine
+	(ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation
+	(ix86_expand_epilogue): Likewise
+	(is_vzeroupper): New
+	(is_vzeroall): Likewise
+	(ix86_avx_u128_mode_needed): Likewise
+	(ix86_mode_needed): Added a switch case for AVX_U128
+	(ix86_avx_u128_mode_after): New
+	(ix86_mode_after): Likewise
+	(ix86_avx_u128_mode_entry): Likewise
+	(ix86_mode_entry): Likewise
+	(ix86_avx_u128_mode_exit): Likewise
+	(ix86_mode_exit): Likewise
+	(ix86_emit_vzeroupper): Likewise
+	(ix86_emit_mode_set): Likewise
+	(ix86_expand_call): Deleted vzeroupper generation
+	(ix86_split_call_vzeroupper): Deleted
+	(ix86_init_machine_status): Initialized optimize_mode_switching
+	(ix86_expand_special_args_builtin): Changed
+	(ix86_reorg): Deleted a call of move_or_delete_vzeroupper.
+
+	* config/i386/i386.h (AVX_U128): New
+	(NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY
+	(MODE_AFTER): New
+	(MODE_ENTRY): Likewise
+	(MODE_EXIT): Likewise
+	(EMIT_MODE_SET): Changed
+	(machine_function): Deleted avx256 fields.
+
+	* config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted
+	(define_insn_and_split "*call_vzeroupper"): Likewise.
+	(define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"): Likewise
+	(define_insn_and_split "*sibcall_vzeroupper"): Likewise
+	(define_insn_and_split "*call_pop_vzeroupper"): Likewise
+	(define_insn_and_split "*sibcall_pop_vzeroupper"): Likewise
+	(define_insn_and_split "*call_value_vzeroupper"): Likewise
+	(define_insn_and_split "*sibcall_value_vzeroupper"): Likewise
+	(define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"): Likewise
+	(define_insn_and_split "*call_value_pop_vzeroupper"): Likewise
+	(define_insn_and_split "*sibcall_value_pop_vzeroupper"): Likewise
+	(define_expand "return"): Deleted vzeroupper emitting
+	(define_expand "simple_return"): Likewise.
+
+	* config/sh/sh.h (MODE_AFTER): Added an argument
+	(EMIT_MODE_SET): Likewise.
+
+	* mode-switching.c (transp): Changed type
+	(make_preds_opaque): Added an argument
+	(optimize_mode_switching): Added code for VZEROUPPER elimination.
diff --git a/gcc/config/epiphany/epiphany.h b/gcc/config/epiphany/epiphany.h
index b1b5e8b..70e1cd0 100644
--- a/gcc/config/epiphany/epiphany.h
+++ b/gcc/config/epiphany/epiphany.h
@@ -883,7 +883,7 @@ enum epiphany_function_type
 #define MODE_PRIORITY_TO_MODE(ENTITY, N) \
   (epiphany_mode_priority_to_mode ((ENTITY), (N)))
 
-#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
+#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE, INSN) \
   emit_set_fp_mode ((ENTITY), (MODE), (HARD_REGS_LIVE))
 
 #define MODE_ENTRY(ENTITY) (epiphany_mode_entry_exit ((ENTITY), false))
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a1daeda..97777da 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -165,8 +165,13 @@ extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 					  enum machine_mode, int);
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
 					   enum machine_mode, enum reg_class);
+
 extern int ix86_mode_needed (int, rtx);
-extern void emit_i387_cw_initialization (int);
+extern int ix86_mode_after (int, int, rtx);
+extern int ix86_mode_entry (int);
+extern int ix86_mode_exit (int);
+extern void ix86_emit_mode_set (int, int, rtx);
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 624dab1..7286008 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "fibheap.h"
 #include "opts.h"
 #include "diagnostic.h"
+#include "tree-pass.h"
 #include "dumpfile.h"
 
 enum upper_128bits_state
@@ -70,47 +71,15 @@ enum upper_128bits_state
   used
 };
 
-typedef struct block_info_def
-{
-  /* State of the upper 128bits of AVX registers at exit.  */
-  enum upper_128bits_state state;
-  /* TRUE if state of the upper 128bits of AVX registers is unchanged
-     in this block.  */
-  bool unchanged;
-  /* TRUE if block has been processed.  */
-  bool processed;
-  /* TRUE if block has been scanned.  */
-  bool scanned;
-  /* Previous state of the upper 128bits of AVX registers at entry.  */
-  enum upper_128bits_state prev;
-} *block_info;
-
-#define BLOCK_INFO(B)   ((block_info) (B)->aux)
-
-enum call_avx256_state
-{
-  /* Callee returns 256bit AVX register.  */
-  callee_return_avx256 = -1,
-  /* Callee returns and passes 256bit AVX register.  */
-  callee_return_pass_avx256,
-  /* Callee passes 256bit AVX register.  */
-  callee_pass_avx256,
-  /* Callee doesn't return nor passe 256bit AVX register, or no
-     256bit AVX register in function return.  */
-  call_no_avx256,
-  /* vzeroupper intrinsic.  */
-  vzeroupper_intrinsic
-};
-
 /* Check if a 256bit AVX register is referenced in stores.   */
 
 static void
 check_avx256_stores (rtx dest, const_rtx set, void *data)
 {
-  if ((REG_P (dest)
+  if (((REG_P (dest) || MEM_P (dest))
        && VALID_AVX256_REG_MODE (GET_MODE (dest)))
       || (GET_CODE (set) == SET
-	  && REG_P (SET_SRC (set))
+	  && (REG_P (SET_SRC (set)) || MEM_P (SET_SRC (set)))
 	  && VALID_AVX256_REG_MODE (GET_MODE (SET_SRC (set)))))
     {
       enum upper_128bits_state *state
@@ -119,377 +88,6 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
     }
 }
 
-/* Helper function for move_or_delete_vzeroupper_1.  Look for vzeroupper
-   in basic block BB.  Delete it if upper 128bit AVX registers are
-   unused.  If it isn't deleted, move it to just before a jump insn.
-
-   STATE is state of the upper 128bits of AVX registers at entry.  */
-
-static void
-move_or_delete_vzeroupper_2 (basic_block bb,
-			     enum upper_128bits_state state)
-{
-  rtx insn, bb_end;
-  rtx vzeroupper_insn = NULL_RTX;
-  rtx pat;
-  int avx256;
-  bool unchanged;
-
-  if (BLOCK_INFO (bb)->unchanged)
-    {
-      if (dump_file)
-	fprintf (dump_file, " [bb %i] unchanged: upper 128bits: %d\n",
-		 bb->index, state);
-
-      BLOCK_INFO (bb)->state = state;
-      return;
-    }
-
-  if (BLOCK_INFO (bb)->scanned && BLOCK_INFO (bb)->prev == state)
-    {
-      if (dump_file)
-	fprintf (dump_file, " [bb %i] scanned: upper 128bits: %d\n",
-		 bb->index, BLOCK_INFO (bb)->state);
-      return;
-    }
-
-  BLOCK_INFO (bb)->prev = state;
-
-  if (dump_file)
-    fprintf (dump_file, " [bb %i] entry: upper 128bits: %d\n",
-	     bb->index, state);
-
-  unchanged = true;
-
-  /* BB_END changes when it is deleted.  */
-  bb_end = BB_END (bb);
-  insn = BB_HEAD (bb);
-  while (insn != bb_end)
-    {
-      insn = NEXT_INSN (insn);
-
-      if (!NONDEBUG_INSN_P (insn))
-	continue;
-
-      /* Move vzeroupper before jump/call.  */
-      if (JUMP_P (insn) || CALL_P (insn))
-	{
-	  if (!vzeroupper_insn)
-	    continue;
-
-	  if (PREV_INSN (insn) != vzeroupper_insn)
-	    {
-	      if (dump_file)
-		{
-		  fprintf (dump_file, "Move vzeroupper after:\n");
-		  print_rtl_single (dump_file, PREV_INSN (insn));
-		  fprintf (dump_file, "before:\n");
-		  print_rtl_single (dump_file, insn);
-		}
-	      reorder_insns_nobb (vzeroupper_insn, vzeroupper_insn,
-				  PREV_INSN (insn));
-	    }
-	  vzeroupper_insn = NULL_RTX;
-	  continue;
-	}
-
-      pat = PATTERN (insn);
-
-      /* Check insn for vzeroupper intrinsic.  */
-      if (GET_CODE (pat) == UNSPEC_VOLATILE
-	  && XINT (pat, 1) == UNSPECV_VZEROUPPER)
-	{
-	  if (dump_file)
-	    {
-	      /* Found vzeroupper intrinsic.  */
-	      fprintf (dump_file, "Found vzeroupper:\n");
-	      print_rtl_single (dump_file, insn);
-	    }
-	}
-      else
-	{
-	  /* Check insn for vzeroall intrinsic.  */
-	  if (GET_CODE (pat) == PARALLEL
-	      && GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
-	      && XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
-	    {
-	      state = unused;
-	      unchanged = false;
-
-	      /* Delete pending vzeroupper insertion.  */
-	      if (vzeroupper_insn)
-		{
-		  delete_insn (vzeroupper_insn);
-		  vzeroupper_insn = NULL_RTX;
-		}
-	    }
-	  else if (state != used)
-	    {
-	      note_stores (pat, check_avx256_stores, &state);
-	      if (state == used)
-		unchanged = false;
-	    }
-	  continue;
-	}
-
-      /* Process vzeroupper intrinsic.  */
-      avx256 = INTVAL (XVECEXP (pat, 0, 0));
-
-      if (state == unused)
-	{
-	  /* Since the upper 128bits are cleared, callee must not pass
-	     256bit AVX register.  We only need to check if callee
-	     returns 256bit AVX register.  */
-	  if (avx256 == callee_return_avx256)
-	    {
-	      state = used;
-	      unchanged = false;
-	    }
-
-	  /* Remove unnecessary vzeroupper since upper 128bits are
-	     cleared.  */
-	  if (dump_file)
-	    {
-	      fprintf (dump_file, "Delete redundant vzeroupper:\n");
-	      print_rtl_single (dump_file, insn);
-	    }
-	  delete_insn (insn);
-	}
-      else
-	{
-	  /* Set state to UNUSED if callee doesn't return 256bit AVX
-	     register.  */
-	  if (avx256 != callee_return_pass_avx256)
-	    state = unused;
-
-	  if (avx256 == callee_return_pass_avx256
-	      || avx256 == callee_pass_avx256)
-	    {
-	      /* Must remove vzeroupper since callee passes in 256bit
-		 AVX register.  */
-	      if (dump_file)
-		{
-		  fprintf (dump_file, "Delete callee pass vzeroupper:\n");
-		  print_rtl_single (dump_file, insn);
-		}
-	      delete_insn (insn);
-	    }
-	  else
-	    {
-	      vzeroupper_insn = insn;
-	      unchanged = false;
-	    }
-	}
-    }
-
-  BLOCK_INFO (bb)->state = state;
-  BLOCK_INFO (bb)->unchanged = unchanged;
-  BLOCK_INFO (bb)->scanned = true;
-
-  if (dump_file)
-    fprintf (dump_file, " [bb %i] exit: %s: upper 128bits: %d\n",
-	     bb->index, unchanged ? "unchanged" : "changed",
-	     state);
-}
-
-/* Helper function for move_or_delete_vzeroupper.  Process vzeroupper
-   in BLOCK and check its predecessor blocks.  Treat UNKNOWN state
-   as USED if UNKNOWN_IS_UNUSED is true.  Return TRUE if the exit
-   state is changed.  */
-
-static bool
-move_or_delete_vzeroupper_1 (basic_block block, bool unknown_is_unused)
-{
-  edge e;
-  edge_iterator ei;
-  enum upper_128bits_state state, old_state, new_state;
-  bool seen_unknown;
-
-  if (dump_file)
-    fprintf (dump_file, " Process [bb %i]: status: %d\n",
-	     block->index, BLOCK_INFO (block)->processed);
-
-  if (BLOCK_INFO (block)->processed)
-    return false;
-
-  state = unused;
-
-  /* Check all predecessor edges of this block.  */
-  seen_unknown = false;
-  FOR_EACH_EDGE (e, ei, block->preds)
-    {
-      if (e->src == block)
-	continue;
-      switch (BLOCK_INFO (e->src)->state)
-	{
-	case unknown:
-	  if (!unknown_is_unused)
-	    seen_unknown = true;
-	case unused:
-	  break;
-	case used:
-	  state = used;
-	  goto done;
-	}
-    }
-
-  if (seen_unknown)
-    state = unknown;
-
-done:
-  old_state = BLOCK_INFO (block)->state;
-  move_or_delete_vzeroupper_2 (block, state);
-  new_state = BLOCK_INFO (block)->state;
-
-  if (state != unknown || new_state == used)
-    BLOCK_INFO (block)->processed = true;
-
-  /* Need to rescan if the upper 128bits of AVX registers are changed
-     to USED at exit.  */
-  if (new_state != old_state)
-    {
-      if (new_state == used)
-	cfun->machine->rescan_vzeroupper_p = 1;
-      return true;
-    }
-  else
-    return false;
-}
-
-/* Go through the instruction stream looking for vzeroupper.  Delete
-   it if upper 128bit AVX registers are unused.  If it isn't deleted,
-   move it to just before a jump insn.  */
-
-static void
-move_or_delete_vzeroupper (void)
-{
-  edge e;
-  edge_iterator ei;
-  basic_block bb;
-  fibheap_t worklist, pending, fibheap_swap;
-  sbitmap visited, in_worklist, in_pending, sbitmap_swap;
-  int *bb_order;
-  int *rc_order;
-  int i;
-
-  /* Set up block info for each basic block.  */
-  alloc_aux_for_blocks (sizeof (struct block_info_def));
-
-  /* Process outgoing edges of entry point.  */
-  if (dump_file)
-    fprintf (dump_file, "Process outgoing edges of entry point\n");
-
-  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR->succs)
-    {
-      move_or_delete_vzeroupper_2 (e->dest,
-				   cfun->machine->caller_pass_avx256_p
-				   ? used : unused);
-      BLOCK_INFO (e->dest)->processed = true;
-    }
-
-  /* Compute reverse completion order of depth first search of the CFG
-     so that the data-flow runs faster.  */
-  rc_order = XNEWVEC (int, n_basic_blocks - NUM_FIXED_BLOCKS);
-  bb_order = XNEWVEC (int, last_basic_block);
-  pre_and_rev_post_order_compute (NULL, rc_order, false);
-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; i++)
-    bb_order[rc_order[i]] = i;
-  free (rc_order);
-
-  worklist = fibheap_new ();
-  pending = fibheap_new ();
-  visited = sbitmap_alloc (last_basic_block);
-  in_worklist = sbitmap_alloc (last_basic_block);
-  in_pending = sbitmap_alloc (last_basic_block);
-  sbitmap_zero (in_worklist);
-
-  /* Don't check outgoing edges of entry point.  */
-  sbitmap_ones (in_pending);
-  FOR_EACH_BB (bb)
-    if (BLOCK_INFO (bb)->processed)
-      RESET_BIT (in_pending, bb->index);
-    else
-      {
-	move_or_delete_vzeroupper_1 (bb, false);
-	fibheap_insert (pending, bb_order[bb->index], bb);
-      }
-
-  if (dump_file)
-    fprintf (dump_file, "Check remaining basic blocks\n");
-
-  while (!fibheap_empty (pending))
-    {
-      fibheap_swap = pending;
-      pending = worklist;
-      worklist = fibheap_swap;
-      sbitmap_swap = in_pending;
-      in_pending = in_worklist;
-      in_worklist = sbitmap_swap;
-
-      sbitmap_zero (visited);
-
-      cfun->machine->rescan_vzeroupper_p = 0;
-
-      while (!fibheap_empty (worklist))
-	{
-	  bb = (basic_block) fibheap_extract_min (worklist);
-	  RESET_BIT (in_worklist, bb->index);
-	  gcc_assert (!TEST_BIT (visited, bb->index));
-	  if (!TEST_BIT (visited, bb->index))
-	    {
-	      edge_iterator ei;
-
-	      SET_BIT (visited, bb->index);
-
-	      if (move_or_delete_vzeroupper_1 (bb, false))
-		FOR_EACH_EDGE (e, ei, bb->succs)
-		  {
-		    if (e->dest == EXIT_BLOCK_PTR
-			|| BLOCK_INFO (e->dest)->processed)
-		      continue;
-
-		    if (TEST_BIT (visited, e->dest->index))
-		      {
-			if (!TEST_BIT (in_pending, e->dest->index))
-			  {
-			    /* Send E->DEST to next round.  */
-			    SET_BIT (in_pending, e->dest->index);
-			    fibheap_insert (pending,
-					    bb_order[e->dest->index],
-					    e->dest);
-			  }
-		      }
-		    else if (!TEST_BIT (in_worklist, e->dest->index))
-		      {
-			/* Add E->DEST to current round.  */
-			SET_BIT (in_worklist, e->dest->index);
-			fibheap_insert (worklist, bb_order[e->dest->index],
-					e->dest);
-		      }
-		  }
-	    }
-	}
-
-      if (!cfun->machine->rescan_vzeroupper_p)
-	break;
-    }
-
-  free (bb_order);
-  fibheap_delete (worklist);
-  fibheap_delete (pending);
-  sbitmap_free (visited);
-  sbitmap_free (in_worklist);
-  sbitmap_free (in_pending);
-
-  if (dump_file)
-    fprintf (dump_file, "Process remaining basic blocks\n");
-
-  FOR_EACH_BB (bb)
-    move_or_delete_vzeroupper_1 (bb, true);
-
-  free_aux_for_blocks ();
-}
-
 static rtx legitimize_dllimport_symbol (rtx, bool);
 
 #ifndef CHECK_STACK_LIMIT
@@ -4091,37 +3689,6 @@ ix86_option_override_internal (bool main_args_p)
       = build_target_option_node ();
 }
 
-/* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
-
-static bool
-function_pass_avx256_p (const_rtx val)
-{
-  if (!val)
-    return false;
-
-  if (REG_P (val) && VALID_AVX256_REG_MODE (GET_MODE (val)))
-    return true;
-
-  if (GET_CODE (val) == PARALLEL)
-    {
-      int i;
-      rtx r;
-
-      for (i = XVECLEN (val, 0) - 1; i >= 0; i--)
-	{
-	  r = XVECEXP (val, 0, i);
-	  if (GET_CODE (r) == EXPR_LIST
-	      && XEXP (r, 0)
-	      && REG_P (XEXP (r, 0))
-	      && (GET_MODE (XEXP (r, 0)) == OImode
-		  || VALID_AVX256_REG_MODE (GET_MODE (XEXP (r, 0)))))
-	    return true;
-	}
-    }
-
-  return false;
-}
-
 /* Implement the TARGET_OPTION_OVERRIDE hook.  */
 
 static void
@@ -5041,15 +4608,6 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       if (!rtx_equal_p (a, b))
 	return false;
     }
-  else if (VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl))))
-    {
-      /* Disable sibcall if we need to generate vzeroupper after
-	 callee returns.  */
-      if (TARGET_VZEROUPPER
-	  && cfun->machine->callee_return_avx256_p
-	  && !cfun->machine->caller_return_avx256_p)
-	return false;
-    }
   else if (!rtx_equal_p (a, b))
     return false;
 
@@ -5757,45 +5315,18 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
 		      int caller)
 {
   struct cgraph_local_info *i;
-  tree fnret_type;
 
   memset (cum, 0, sizeof (*cum));
 
-  /* Initialize for the current callee.  */
-  if (caller)
-    {
-      cfun->machine->callee_pass_avx256_p = false;
-      cfun->machine->callee_return_avx256_p = false;
-    }
-
   if (fndecl)
     {
       i = cgraph_local_info (fndecl);
       cum->call_abi = ix86_function_abi (fndecl);
-      fnret_type = TREE_TYPE (TREE_TYPE (fndecl));
     }
   else
     {
       i = NULL;
       cum->call_abi = ix86_function_type_abi (fntype);
-      if (fntype)
-	fnret_type = TREE_TYPE (fntype);
-      else
-	fnret_type = NULL;
-    }
-
-  if (TARGET_VZEROUPPER && fnret_type)
-    {
-      rtx fnret_value = ix86_function_value (fnret_type, fntype,
-					     false);
-      if (function_pass_avx256_p (fnret_value))
-	{
-	  /* The return value of this function uses 256bit AVX modes.  */
-	  if (caller)
-	    cfun->machine->callee_return_avx256_p = true;
-	  else
-	    cfun->machine->caller_return_avx256_p = true;
-	}
     }
 
   cum->caller = caller;
@@ -7088,15 +6619,6 @@ ix86_function_arg (cumulative_args_t cum_v, enum machine_mode omode,
   else
     arg = function_arg_32 (cum, mode, omode, type, bytes, words);
 
-  if (TARGET_VZEROUPPER && function_pass_avx256_p (arg))
-    {
-      /* This argument uses 256bit AVX modes.  */
-      if (cum->caller)
-	cfun->machine->callee_pass_avx256_p = true;
-      else
-	cfun->machine->caller_pass_avx256_p = true;
-    }
-
   return arg;
 }
 
@@ -10911,17 +10433,6 @@ ix86_emit_restore_sse_regs_using_mov (HOST_WIDE_INT cfa_offset,
       }
 }
 
-/* Emit vzeroupper if needed.  */
-
-void
-ix86_maybe_emit_epilogue_vzeroupper (void)
-{
-  if (TARGET_VZEROUPPER
-      && !TREE_THIS_VOLATILE (cfun->decl)
-      && !cfun->machine->caller_return_avx256_p)
-    emit_insn (gen_avx_vzeroupper (GEN_INT (call_no_avx256)));
-}
-
 /* Restore function stack, frame, and registers.  */
 
 void
@@ -11223,9 +10734,6 @@ ix86_expand_epilogue (int style)
       return;
     }
 
-  /* Emit vzeroupper if needed.  */
-  ix86_maybe_emit_epilogue_vzeroupper ();
-
   if (crtl->args.pops_args && crtl->args.size)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
@@ -15340,10 +14848,69 @@ output_387_binary_op (rtx insn, rtx *operands)
   return buf;
 }
 
+/* Return true if the insn pattern PAT is a vzeroupper.  */
+
+static bool
+is_vzeroupper (rtx pat)
+{
+  return pat
+	 && GET_CODE (pat) == UNSPEC_VOLATILE
+	 && XINT (pat, 1) == UNSPECV_VZEROUPPER;
+}
+
+/* Return true if the insn pattern PAT is a vzeroall.  */
+
+static bool
+is_vzeroall (rtx pat)
+{
+  return pat
+	 && GET_CODE (pat) == PARALLEL
+	 && GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
+	 && XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL;
+}
+
 /* Return needed mode for entity in optimize_mode_switching pass.  */
 
-int
-ix86_mode_needed (int entity, rtx insn)
+static int
+ix86_avx_u128_mode_needed (rtx insn)
+{
+  rtx pat = PATTERN (insn);
+  rtx arg;
+  enum upper_128bits_state state;
+
+  if (CALL_P (insn))
+    {
+      /* The needed mode is AVX_U128_CLEAN if no function
+	 argument is passed in a 256bit AVX register.  */
+      for (arg = CALL_INSN_FUNCTION_USAGE (insn); arg;
+	   arg = XEXP (arg, 1))
+	{
+	  if (GET_CODE (XEXP (arg, 0)) == USE)
+	    {
+	      rtx reg = XEXP (XEXP (arg, 0), 0);
+
+	      if (reg && REG_P (reg)
+		  && VALID_AVX256_REG_MODE (GET_MODE (reg)))
+		return AVX_U128_ANY;
+	    }
+	}
+
+      return AVX_U128_CLEAN;
+    }
+
+  /* Check if a 256bit AVX register is referenced in stores.  */
+  state = unused;
+  note_stores (pat, check_avx256_stores, &state);
+  if (state == used)
+    return AVX_U128_DIRTY;
+  return AVX_U128_ANY;
+}
+
+/* Return the mode that the i387 control word must be switched
+   into prior to the execution of INSN.  */
+
+static int
+ix86_i387_mode_needed (int entity, rtx insn)
 {
   enum attr_i387_cw mode;
 
@@ -15392,11 +14959,174 @@ ix86_mode_needed (int entity, rtx insn)
   return I387_CW_ANY;
 }
 
+/* Return the mode that ENTITY must be switched into
+   prior to the execution of INSN.  */
+
+int
+ix86_mode_needed (int entity, rtx insn)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_needed (insn);
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return ix86_i387_mode_needed (entity, insn);
+    default:
+      gcc_unreachable ();
+    }
+  return 0;
+}
+
+/* Calculate the mode of the upper 128 bits of AVX registers after INSN.  */
+
+static int
+ix86_avx_u128_mode_after (int mode, rtx insn)
+{
+  rtx pat = PATTERN (insn);
+  rtx reg = NULL;
+  int i;
+  enum upper_128bits_state state;
+
+  /* Check for CALL instruction.  */
+  if (CALL_P (insn))
+    {
+      if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
+	reg = SET_DEST (pat);
+      else if (GET_CODE (pat) == PARALLEL)
+	for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
+	  {
+	    rtx x = XVECEXP (pat, 0, i);
+	    if (GET_CODE (x) == SET)
+	      reg = SET_DEST (x);
+	  }
+      /* The mode after a call is AVX_U128_DIRTY if the function
+	 returns its value in a 256bit AVX register.  */
+      if (reg && REG_P (reg) && VALID_AVX256_REG_MODE (GET_MODE (reg)))
+	return AVX_U128_DIRTY;
+      else
+	return AVX_U128_CLEAN;
+    }
+
+  if (is_vzeroupper (pat) || is_vzeroall (pat))
+    return AVX_U128_CLEAN;
+
+  /* Check if a 256bit AVX register is referenced in stores.  */
+  state = unused;
+  note_stores (pat, check_avx256_stores, &state);
+  if (state == used)
+    return AVX_U128_DIRTY;
+
+  return mode;
+}
+
+/* Return the mode that an insn results in.  */
+
+int
+ix86_mode_after (int entity, int mode, rtx insn)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_after (mode, insn);
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return mode;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static int
+ix86_avx_u128_mode_entry (void)
+{
+  tree arg;
+
+  /* The entry mode is AVX_U128_DIRTY if any function argument
+     arrives in a 256bit AVX register.  */
+  for (arg = DECL_ARGUMENTS (current_function_decl); arg;
+       arg = TREE_CHAIN (arg))
+    {
+      rtx reg = DECL_INCOMING_RTL (arg);
+
+      if (reg && REG_P (reg) && VALID_AVX256_REG_MODE (GET_MODE (reg)))
+	return AVX_U128_DIRTY;
+    }
+
+  return AVX_U128_CLEAN;
+}
+
+/* Return a mode that ENTITY is assumed to be
+   switched to at function entry.  */
+
+int
+ix86_mode_entry (int entity)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_entry ();
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return I387_CW_ANY;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static int
+ix86_avx_u128_mode_exit (void)
+{
+  rtx reg = crtl->return_rtx;
+
+  /* The exit mode is AVX_U128_DIRTY if the function returns its
+     value in a 256bit AVX register.  */
+  if (reg && REG_P (reg) && VALID_AVX256_REG_MODE (GET_MODE (reg)))
+    return AVX_U128_DIRTY;
+
+  return AVX_U128_CLEAN;
+}
+
+/* Return a mode that ENTITY is assumed to be
+   switched to at function exit.  */
+
+int
+ix86_mode_exit (int entity)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_exit ();
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return I387_CW_ANY;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Output code to set the upper 128 bits of AVX registers to CLEAN.  */
+
+static void
+ix86_emit_vzeroupper (void)
+{
+  emit_insn (gen_avx_vzeroupper (GEN_INT (9)));
+}
+
 /* Output code to initialize control word copies used by trunc?f?i and
    rounding patterns.  CURRENT_MODE is set to current control word,
    while NEW_MODE is set to new control word.  */
 
-void
+
+static void
 emit_i387_cw_initialization (int mode)
 {
   rtx stored_mode = assign_386_stack_local (HImode, SLOT_CW_STORED);
@@ -15483,6 +15213,39 @@ emit_i387_cw_initialization (int mode)
   emit_move_insn (new_mode, reg);
 }
 
+/* Generate one or more insns to set ENTITY to MODE.  */
+
+void
+ix86_emit_mode_set (int entity, int mode, rtx insn)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      if (mode == AVX_U128_CLEAN)
+	{
+	  if (insn)
+	    {
+	      rtx pat = PATTERN (insn);
+	      if (!is_vzeroupper (pat) && !is_vzeroall (pat))
+		ix86_emit_vzeroupper ();
+	    }
+	  else
+	    ix86_emit_vzeroupper ();
+	}
+      break;
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      if (mode != I387_CW_ANY
+	  && mode != I387_CW_UNINITIALIZED)
+	emit_i387_cw_initialization (mode);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Output code for INSN to convert a float to a signed int.  OPERANDS
    are the insn operands.  The output may be [HSD]Imode and the input
    operand may be [SDX]Fmode.  */
@@ -23350,30 +23113,6 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 					  clobbered_registers[i]));
     }
 
-  /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER)
-    {
-      int avx256;
-      if (cfun->machine->callee_pass_avx256_p)
-	{
-	  if (cfun->machine->callee_return_avx256_p)
-	    avx256 = callee_return_pass_avx256;
-	  else
-	    avx256 = callee_pass_avx256;
-	}
-      else if (cfun->machine->callee_return_avx256_p)
-	avx256 = callee_return_avx256;
-      else
-	avx256 = call_no_avx256;
-
-      if (reload_completed)
-	emit_insn (gen_avx_vzeroupper (GEN_INT (avx256)));
-      else
-	vec[vec_len++] = gen_rtx_UNSPEC (VOIDmode,
-					 gen_rtvec (1, GEN_INT (avx256)),
-					 UNSPEC_CALL_NEEDS_VZEROUPPER);
-    }
-
   if (vec_len > 1)
     call = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (vec_len, vec));
   call = emit_call_insn (call);
@@ -23383,25 +23122,6 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   return call;
 }
 
-void
-ix86_split_call_vzeroupper (rtx insn, rtx vzeroupper)
-{
-  rtx pat = PATTERN (insn);
-  rtvec vec = XVEC (pat, 0);
-  int len = GET_NUM_ELEM (vec) - 1;
-
-  /* Strip off the last entry of the parallel.  */
-  gcc_assert (GET_CODE (RTVEC_ELT (vec, len)) == UNSPEC);
-  gcc_assert (XINT (RTVEC_ELT (vec, len), 1) == UNSPEC_CALL_NEEDS_VZEROUPPER);
-  if (len == 1)
-    pat = RTVEC_ELT (vec, 0);
-  else
-    pat = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (len, &RTVEC_ELT (vec, 0)));
-
-  emit_insn (gen_avx_vzeroupper (vzeroupper));
-  emit_call_insn (pat);
-}
-
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -23482,6 +23202,7 @@ ix86_init_machine_status (void)
   f->use_fast_prologue_epilogue_nregs = -1;
   f->tls_descriptor_call_expanded_p = 0;
   f->call_abi = ix86_abi;
+  f->optimize_mode_switching[AVX_U128] = TARGET_VZEROUPPER;
 
   return f;
 }
@@ -29682,7 +29403,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     {
     case VOID_FTYPE_VOID:
       if (icode == CODE_FOR_avx_vzeroupper)
-	target = GEN_INT (vzeroupper_intrinsic);
+	target = GEN_INT (9);
       emit_insn (GEN_FCN (icode) (target));
       return 0;
     case VOID_FTYPE_UINT64:
@@ -33710,10 +33431,6 @@ ix86_reorg (void)
      with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
   compute_bb_for_insn ();
 
-  /* Run the vzeroupper optimization if needed.  */
-  if (TARGET_VZEROUPPER)
-    move_or_delete_vzeroupper ();
-
   if (optimize && optimize_function_for_speed_p (cfun))
     {
       if (TARGET_PAD_SHORT_FUNCTION)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 5ff82ab..bb8b720 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2127,7 +2127,8 @@ enum ix86_fpcmp_strategy {
 
 enum ix86_entity
 {
-  I387_TRUNC = 0,
+  AVX_U128 = 0,
+  I387_TRUNC,
   I387_FLOOR,
   I387_CEIL,
   I387_MASK_PM,
@@ -2146,6 +2147,13 @@ enum ix86_stack_slot
   MAX_386_STACK_LOCALS
 };
 
+enum avx_u128_state
+{
+  AVX_U128_CLEAN,
+  AVX_U128_DIRTY,
+  AVX_U128_ANY
+};
+
 /* Define this macro if the port needs extra instructions inserted
    for mode switching in an optimizing compilation.  */
 
@@ -2161,16 +2169,34 @@ enum ix86_stack_slot
    refer to the mode-switched entity in question.  */
 
 #define NUM_MODES_FOR_MODE_SWITCHING \
-   { I387_CW_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY }
+  { AVX_U128_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY }
 
 /* ENTITY is an integer specifying a mode-switched entity.  If
    `OPTIMIZE_MODE_SWITCHING' is defined, you must define this macro to
    return an integer value not larger than the corresponding element
    in `NUM_MODES_FOR_MODE_SWITCHING', to denote the mode that ENTITY
-   must be switched into prior to the execution of INSN. */
+   must be switched into prior to the execution of INSN.  */
 
 #define MODE_NEEDED(ENTITY, I) ix86_mode_needed ((ENTITY), (I))
 
+/* If this macro is defined, it is evaluated for every INSN during
+   mode switching.  It determines the mode that an insn results in (if
+   different from the incoming mode).  */
+
+#define MODE_AFTER(ENTITY, MODE, I) ix86_mode_after ((ENTITY), (MODE), (I))
+
+/* If this macro is defined, it is evaluated for every ENTITY that
+   needs mode switching.  It should evaluate to an integer, which is
+   a mode that ENTITY is assumed to be switched to at function entry.  */
+
+#define MODE_ENTRY(ENTITY) ix86_mode_entry (ENTITY)
+
+/* If this macro is defined, it is evaluated for every ENTITY that
+   needs mode switching.  It should evaluate to an integer, which is
+   a mode that ENTITY is assumed to be switched to at function exit.  */
+
+#define MODE_EXIT(ENTITY) ix86_mode_exit (ENTITY)
+
 /* This macro specifies the order in which modes for ENTITY are
    processed.  0 is the highest priority.  */
 
@@ -2180,11 +2206,8 @@ enum ix86_stack_slot
    is the set of hard registers live at the point where the insn(s)
    are to be inserted.  */
 
-#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) 			\
-  ((MODE) != I387_CW_ANY && (MODE) != I387_CW_UNINITIALIZED		\
-   ? emit_i387_cw_initialization (MODE), 0				\
-   : 0)
-
+#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE, INSN) \
+  ix86_emit_mode_set ((ENTITY), (MODE), (INSN))
 \f
 /* Avoid renaming of stack registers, as doing so in combination with
    scheduling just increases amount of live registers at time and in
@@ -2286,21 +2309,6 @@ struct GTY(()) machine_function {
      stack below the return address.  */
   BOOL_BITFIELD static_chain_on_stack : 1;
 
-  /* Nonzero if caller passes 256bit AVX modes.  */
-  BOOL_BITFIELD caller_pass_avx256_p : 1;
-
-  /* Nonzero if caller returns 256bit AVX modes.  */
-  BOOL_BITFIELD caller_return_avx256_p : 1;
-
-  /* Nonzero if the current callee passes 256bit AVX modes.  */
-  BOOL_BITFIELD callee_pass_avx256_p : 1;
-
-  /* Nonzero if the current callee returns 256bit AVX modes.  */
-  BOOL_BITFIELD callee_return_avx256_p : 1;
-
-  /* Nonzero if rescan vzerouppers in the current function is needed.  */
-  BOOL_BITFIELD rescan_vzeroupper_p : 1;
-
   /* During prologue/epilogue generation, the current frame state.
      Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8d6f211..921da5a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -109,7 +109,6 @@
   UNSPEC_TRUNC_NOOP
   UNSPEC_DIV_ALREADY_SPLIT
   UNSPEC_MS_TO_SYSV_CALL
-  UNSPEC_CALL_NEEDS_VZEROUPPER
   UNSPEC_PAUSE
   UNSPEC_LEA_ADDR
   UNSPEC_XBEGIN_ABORT
@@ -11488,18 +11487,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_vzeroupper"
-  [(call (mem:QI (match_operand:W 0 "call_insn_operand" "<c>zw"))
-	 (match_operand 1))
-   (unspec [(match_operand 2 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[2]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*call"
   [(call (mem:QI (match_operand:W 0 "call_insn_operand" "<c>zw"))
 	 (match_operand 1))]
@@ -11507,31 +11494,6 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"
-  [(call (mem:QI (match_operand:DI 0 "call_insn_operand" "rzw"))
-	 (match_operand 1))
-   (unspec [(const_int 0)] UNSPEC_MS_TO_SYSV_CALL)
-   (clobber (reg:TI XMM6_REG))
-   (clobber (reg:TI XMM7_REG))
-   (clobber (reg:TI XMM8_REG))
-   (clobber (reg:TI XMM9_REG))
-   (clobber (reg:TI XMM10_REG))
-   (clobber (reg:TI XMM11_REG))
-   (clobber (reg:TI XMM12_REG))
-   (clobber (reg:TI XMM13_REG))
-   (clobber (reg:TI XMM14_REG))
-   (clobber (reg:TI XMM15_REG))
-   (clobber (reg:DI SI_REG))
-   (clobber (reg:DI DI_REG))
-   (unspec [(match_operand 2 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[2]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*call_rex64_ms_sysv"
   [(call (mem:QI (match_operand:DI 0 "call_insn_operand" "rzw"))
 	 (match_operand 1))
@@ -11552,18 +11514,6 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn_and_split "*sibcall_vzeroupper"
-  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uz"))
-	 (match_operand 1))
-   (unspec [(match_operand 2 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[2]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uz"))
 	 (match_operand 1))]
@@ -11584,21 +11534,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_pop_vzeroupper"
-  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
-	 (match_operand 1))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 2 "immediate_operand" "i")))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*call_pop"
   [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
 	 (match_operand 1))
@@ -11609,21 +11544,6 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn_and_split "*sibcall_pop_vzeroupper"
-  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uz"))
-	 (match_operand 1))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 2 "immediate_operand" "i")))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*sibcall_pop"
   [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uz"))
 	 (match_operand 1))
@@ -11660,19 +11580,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_value_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:W 1 "call_insn_operand" "<c>zw"))
-	      (match_operand 2)))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*call_value"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:W 1 "call_insn_operand" "<c>zw"))
@@ -11681,19 +11588,6 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
-(define_insn_and_split "*sibcall_value_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uz"))
-	      (match_operand 2)))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*sibcall_value"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uz"))
@@ -11702,32 +11596,6 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
-(define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:DI 1 "call_insn_operand" "rzw"))
-	      (match_operand 2)))
-   (unspec [(const_int 0)] UNSPEC_MS_TO_SYSV_CALL)
-   (clobber (reg:TI XMM6_REG))
-   (clobber (reg:TI XMM7_REG))
-   (clobber (reg:TI XMM8_REG))
-   (clobber (reg:TI XMM9_REG))
-   (clobber (reg:TI XMM10_REG))
-   (clobber (reg:TI XMM11_REG))
-   (clobber (reg:TI XMM12_REG))
-   (clobber (reg:TI XMM13_REG))
-   (clobber (reg:TI XMM14_REG))
-   (clobber (reg:TI XMM15_REG))
-   (clobber (reg:DI SI_REG))
-   (clobber (reg:DI DI_REG))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*call_value_rex64_ms_sysv"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:DI 1 "call_insn_operand" "rzw"))
@@ -11763,22 +11631,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_value_pop_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:SI 1 "call_insn_operand" "lzm"))
-	      (match_operand 2)))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 3 "immediate_operand" "i")))
-   (unspec [(match_operand 4 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[4]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*call_value_pop"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:SI 1 "call_insn_operand" "lzm"))
@@ -11790,22 +11642,6 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
-(define_insn_and_split "*sibcall_value_pop_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uz"))
-	      (match_operand 2)))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 3 "immediate_operand" "i")))
-   (unspec [(match_operand 4 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[4]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*sibcall_value_pop"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uz"))
@@ -11907,7 +11743,6 @@
   [(simple_return)]
   "ix86_can_use_return_insn_p ()"
 {
-  ix86_maybe_emit_epilogue_vzeroupper ();
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
@@ -11924,7 +11759,6 @@
   [(simple_return)]
   "!TARGET_SEH"
 {
-  ix86_maybe_emit_epilogue_vzeroupper ();
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index af7fe0b..d24edb8 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -2359,7 +2359,7 @@ extern int current_function_interrupt;
 #define MODE_PRIORITY_TO_MODE(ENTITY, N) \
   ((TARGET_FPU_SINGLE != 0) ^ (N) ? FP_MODE_SINGLE : FP_MODE_DOUBLE)
 
-#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
+#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE, INSN) \
   fpscr_set_from_mem ((MODE), (HARD_REGS_LIVE))
 
 #define MD_CAN_REDIRECT_BRANCH(INSN, SEQ) \
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index 1984a69..e75fb86 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -86,14 +86,14 @@ struct bb_info
 /* These bitmaps are used for the LCM algorithm.  */
 
 static sbitmap *antic;
-static sbitmap *transp;
+static sbitmap **transp;
 static sbitmap *comp;
 
 static struct seginfo * new_seginfo (int, rtx, int, HARD_REG_SET);
 static void add_seginfo (struct bb_info *, struct seginfo *);
 static void reg_dies (rtx, HARD_REG_SET *);
 static void reg_becomes_live (rtx, const_rtx, void *);
-static void make_preds_opaque (basic_block, int);
+static void make_preds_opaque (basic_block, int, int);
 \f
 
 /* This function will allocate a new BBINFO structure, initialized
@@ -139,7 +139,7 @@ add_seginfo (struct bb_info *head, struct seginfo *info)
    we are currently handling mode-switching for.  */
 
 static void
-make_preds_opaque (basic_block b, int j)
+make_preds_opaque (basic_block b, int j, int m)
 {
   edge e;
   edge_iterator ei;
@@ -148,11 +148,11 @@ make_preds_opaque (basic_block b, int j)
     {
       basic_block pb = e->src;
 
-      if (e->aux || ! TEST_BIT (transp[pb->index], j))
+      if (e->aux || ! TEST_BIT (transp[m][pb->index], j))
 	continue;
 
-      RESET_BIT (transp[pb->index], j);
-      make_preds_opaque (pb, j);
+      RESET_BIT (transp[m][pb->index], j);
+      make_preds_opaque (pb, j, m);
     }
 }
 
@@ -479,10 +479,14 @@ optimize_mode_switching (void)
   /* Create the bitmap vectors.  */
 
   antic = sbitmap_vector_alloc (last_basic_block, n_entities);
-  transp = sbitmap_vector_alloc (last_basic_block, n_entities);
+  transp = (sbitmap **) xmalloc (max_num_modes * sizeof (sbitmap *));
   comp = sbitmap_vector_alloc (last_basic_block, n_entities);
 
-  sbitmap_vector_ones (transp, last_basic_block);
+  for (i = 0; i < max_num_modes; i++)
+    {
+      transp[i] = sbitmap_vector_alloc (last_basic_block, n_entities);
+      sbitmap_vector_ones (transp[i], last_basic_block);
+    }
 
   for (j = n_entities - 1; j >= 0; j--)
     {
@@ -513,7 +517,8 @@ optimize_mode_switching (void)
 	      {
 		ptr = new_seginfo (no_mode, BB_HEAD (bb), bb->index, live_now);
 		add_seginfo (info + bb->index, ptr);
-		RESET_BIT (transp[bb->index], j);
+		for (i = 0; i < max_num_modes; i++)
+		  RESET_BIT (transp[i][bb->index], j);
 	      }
 	  }
 
@@ -530,10 +535,16 @@ optimize_mode_switching (void)
 		      last_mode = mode;
 		      ptr = new_seginfo (mode, insn, bb->index, live_now);
 		      add_seginfo (info + bb->index, ptr);
-		      RESET_BIT (transp[bb->index], j);
+		      for (i = 0; i < max_num_modes; i++)
+			if (i != mode)
+			  RESET_BIT (transp[i][bb->index], j);
 		    }
 #ifdef MODE_AFTER
 		  last_mode = MODE_AFTER (e, last_mode, insn);
+		  if (last_mode != no_mode)
+		    for (i = 0; i < max_num_modes; i++)
+		      if (i != last_mode)
+			RESET_BIT (transp[i][bb->index], j);
 #endif
 		  /* Update LIVE_NOW.  */
 		  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
@@ -569,15 +580,18 @@ optimize_mode_switching (void)
 	       an extra check in make_preds_opaque.  We also
 	       need this to avoid confusing pre_edge_lcm when
 	       antic is cleared but transp and comp are set.  */
-	    RESET_BIT (transp[bb->index], j);
+	    for (i = 0; i < max_num_modes; i++)
+	      RESET_BIT (transp[i][bb->index], j);
 
 	    /* Insert a fake computing definition of MODE into entry
 	       blocks which compute no mode. This represents the mode on
 	       entry.  */
 	    info[bb->index].computing = mode;
-
-	    if (pre_exit)
-	      info[pre_exit->index].seginfo->mode = MODE_EXIT (e);
+	  }
+	if (pre_exit)
+	  {
+	    int post_mode = MODE_EXIT (e);
+	    info[pre_exit->index].seginfo->mode = post_mode;
 	  }
       }
 #endif /* NORMAL_MODE */
@@ -612,10 +627,28 @@ optimize_mode_switching (void)
 	 placement mode switches to modes with priority I.  */
 
       FOR_EACH_BB (bb)
-	sbitmap_not (kill[bb->index], transp[bb->index]);
-      edge_list = pre_edge_lcm (n_entities, transp, comp, antic,
-				kill, &insert, &del);
+	sbitmap_not (kill[bb->index], transp[i][bb->index]);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  dump_sbitmap_vector (dump_file, "transp", "", transp[i], last_basic_block);
+	  dump_sbitmap_vector (dump_file, "comp", "", comp, last_basic_block);
+	  dump_sbitmap_vector (dump_file, "antic", "", antic, last_basic_block);
+	  dump_sbitmap_vector (dump_file, "kill", "", kill, last_basic_block);
+	}
 
+      edge_list = pre_edge_lcm (n_entities, transp[i], comp, antic,
+				kill, &insert, &del);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	      for (e = 0; e < NUM_EDGES (edge_list); e++)
+	    {
+	      edge eg = INDEX_EDGE (edge_list, e);
+	      fprintf (dump_file, "\n\tE%d: (%d->%d) ",
+		       e, eg->src->index, eg->dest->index);
+	    }
+	  dump_sbitmap_vector (dump_file, "\ninsert", "", insert, NUM_EDGES (edge_list));
+	  dump_sbitmap_vector (dump_file, "delete", "", del, last_basic_block);
+	}
       for (j = n_entities - 1; j >= 0; j--)
 	{
 	  /* Insert all mode sets that have been inserted by lcm.  */
@@ -649,7 +682,7 @@ optimize_mode_switching (void)
 	      REG_SET_TO_HARD_REG_SET (live_at_edge, df_get_live_out (src_bb));
 
 	      start_sequence ();
-	      EMIT_MODE_SET (entity_map[j], mode, live_at_edge);
+	      EMIT_MODE_SET (entity_map[j], mode, live_at_edge, NULL);
 	      mode_set = get_insns ();
 	      end_sequence ();
 
@@ -667,9 +700,13 @@ optimize_mode_switching (void)
 	  FOR_EACH_BB_REVERSE (bb)
 	    if (TEST_BIT (del[bb->index], j))
 	      {
-		make_preds_opaque (bb, j);
+		struct seginfo *ptr;
+		make_preds_opaque (bb, j, current_mode[j]);
 		/* Cancel the 'deleted' mode set.  */
-		bb_info[j][bb->index].seginfo->mode = no_mode;
+		for (ptr = bb_info[j][bb->index].seginfo;
+		     ptr && ptr->mode == current_mode[j];
+		     ptr = ptr->next)
+		  ptr->mode = no_mode;
 	      }
 	}
 
@@ -687,15 +724,18 @@ optimize_mode_switching (void)
       FOR_EACH_BB_REVERSE (bb)
 	{
 	  struct seginfo *ptr, *next;
+	  int l_mode = no_mode;
 	  for (ptr = bb_info[j][bb->index].seginfo; ptr; ptr = next)
 	    {
 	      next = ptr->next;
-	      if (ptr->mode != no_mode)
+	      if (ptr->mode != no_mode && ptr->mode != l_mode)
 		{
 		  rtx mode_set;
 
+		  l_mode = ptr->mode;
 		  start_sequence ();
-		  EMIT_MODE_SET (entity_map[j], ptr->mode, ptr->regs_live);
+		  EMIT_MODE_SET (entity_map[j], ptr->mode,
+				 ptr->regs_live, ptr->insn_ptr);
 		  mode_set = get_insns ();
 		  end_sequence ();
 
@@ -720,12 +760,14 @@ optimize_mode_switching (void)
   /* Finished. Free up all the things we've allocated.  */
   sbitmap_vector_free (kill);
   sbitmap_vector_free (antic);
-  sbitmap_vector_free (transp);
+  for (i = 0; i < max_num_modes; i++)
+    sbitmap_vector_free (transp[i]);
   sbitmap_vector_free (comp);
 
   if (need_commit)
     commit_edge_insertions ();
 
+  free (transp);
 #if defined (MODE_ENTRY) && defined (MODE_EXIT)
   cleanup_cfg (CLEANUP_NO_INSN_DEL);
 #else
diff --git a/gcc/testsuite/ChangeLog.vzu b/gcc/testsuite/ChangeLog.vzu
new file mode 100644
index 0000000..e29a1ef
--- /dev/null
+++ b/gcc/testsuite/ChangeLog.vzu
@@ -0,0 +1,9 @@
+2012-06-25  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
+
+	* gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times.
+	* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
+	* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
+	* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
+	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
+	* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
+
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
index 667bb17..5007753 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
index d98ceb9..507f945 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
@@ -16,4 +16,4 @@ foo ()
 }
 
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
index f74ea0c..e694d40 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
@@ -16,5 +16,5 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 4 } } */
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
index 0f54602..ba08978 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
index 0a821c2..6ba9f54 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
@@ -13,4 +13,5 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
index 5aa05b8..974e162 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
@@ -15,4 +15,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 4 } } */
