public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH] Vzeroupper placement/47440
@ 2012-11-04 13:29 Uros Bizjak
  2012-11-04 17:59 ` Uros Bizjak
       [not found] ` <CAK1BsWpoD4AVB_4+J6snJgs4BF1Jbiw-RrifvZiiAm21qRURew@mail.gmail.com>
  0 siblings, 2 replies; 10+ messages in thread
From: Uros Bizjak @ 2012-11-04 13:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: Vladimir Yakovlev

Hello!

2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * mode-switching.c (create_pre_exit): Added code for
maybe_builtin_apply case.

        * config/i386/i386-protos.h (emit_i387_cw_initialization): Deleted.
        (emit_vzero): Added prototype.
        (ix86_mode_entry): Likewise.
        (ix86_mode_exit): Likewise.
        (ix86_emit_mode_set): Likewise.

        * config/i386/i386.c (VALID_AVX256_REG_OR_OI_MODE): New.
        (typedef struct block_info_def): Deleted.
        (define BLOCK_INFO): Deleted.
        (check_avx256_stores): Added checking for MEM_P.
        (move_or_delete_vzeroupper_2): Deleted.
        (move_or_delete_vzeroupper_1): Deleted.
        (move_or_delete_vzeroupper): Deleted.
        (ix86_maybe_emit_epilogue_vzeroupper): Deleted.
        (function_pass_avx256_p): Deleted.
        (ix86_function_ok_for_sibcall): Deleted disabling sibcall.
        (nit_cumulative_args): Deleted initialization of of avx256 fields of
        cfun->machine.
        (ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation.
        (ix86_expand_epilogue): Likewise.
        (is_vzeroupper): New.
        (is_vzeroall): New.
        (ix86_avx_u128_mode_needed): New.
        (ix86_i387_mode_needed): Renamed ix86_mode_needed.
        (ix86_mode_needed): New.
        (ix86_avx_u128_mode_after): New.
        (ix86_mode_after): New.
        (ix86_avx_u128_mode_entry): New.
        (ix86_mode_entry): New.
        (ix86_avx_u128_mode_exit): New.
        (ix86_mode_exit): New.
        (ix86_emit_vzeroupper): New.
        (ix86_emit_mode_set): New.
        (ix86_expand_call): Deleted vzeroupper generation.
        (ix86_split_call_vzeroupper): Deleted.
        (ix86_init_machine_status): Initialzed optimize_mode_switching.
        (ix86_expand_special_args_builtin): Changed.
        (ix86_reorg): Deleted a call of move_or_delete_vzeroupper.

        * config/i386/i386.h (AVX_U128): New.
        (avx_u128_state): New.
        (NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY.
        (MODE_AFTER): New.
        (MODE_ENTRY): New.
        (MODE_EXIT): New.
        (EMIT_MODE_SET): Changed.
        (machine_function): Deleted avx256 fields.

        * config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted.
        (define_insn_and_split "*call_vzeroupper"): Deleted.
        (define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_vzeroupper"): Deleted.
        (define_insn_and_split "*call_pop_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_pop_vzeroupper"): Deleted.
        (define_insn_and_split "*call_value_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_value_vzeroupper"): Deleted.
        (define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"): Deleted.
        (define_insn_and_split "*call_value_pop_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_value_pop_vzeroupper"): Deleted.
        (define_expand "return"): Deleted vzeroupper emitting.
        (define_expand "simple_return"): Deleted.

2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times.
        gcc.target/i386/avx-vzeroupper-8.c: Likewise.
        gcc.target/i386/avx-vzeroupper-9.c: Likewise.
        gcc.target/i386/avx-vzeroupper-10.c: Likewise.
        gcc.target/i386/avx-vzeroupper-11.c: Likewise.
        gcc.target/i386/avx-vzeroupper-12.c: Likewise.
        gcc.target/i386/avx-vzeroupper-19.c: Likewis.
        gcc.target/i386/avx-vzeroupper-27.c: New.

Target part (without mode-switching.c change) is OK for mainline, with
a few small changes below:

+#define VALID_AVX256_REG_OR_OI_MODE(m) (VALID_AVX256_REG_MODE (m) ||
(m) == OImode)
 enum upper_128bits_state

Put this definition in i386.h, after VALID_AVX256_REG_MODE.

+static void
+ix86_emit_vzeroupper (void)
+{
+  emit_insn (gen_avx_vzeroupper (GEN_INT (9)));
+}

No need to pass argument to vzeroupper anymore. We have only one
vzeroupper type now, so following definition in sse.md could also be
changed from:

(define_insn "avx_vzeroupper"
  [(unspec_volatile [(match_operand 0 "const_int_operand")]
		    UNSPECV_VZEROUPPER)]

to:

(define_insn "avx_vzeroupper"
  [(unspec_volatile [(const_int 0)]
		    UNSPECV_VZEROUPPER)]

Please call gen_avx_vzeroupper directly, so ix86_emit_vzeroupper
wrapper function can be simply deleted.

+/* Check insn for vzeroupper intrinsic.  */
+
+static bool
+is_vzeroupper (rtx pat)
+{
+  return pat
+	 && GET_CODE (pat) == UNSPEC_VOLATILE
+	 && XINT (pat, 1) == UNSPECV_VZEROUPPER;
+}
+
+/* Check insn for vzeroall intrinsic.  */
+
+static bool
+is_vzeroall (rtx pat)
+{
+  return pat
+	 && GET_CODE (pat) == PARALLEL
+	 && GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
+	 && XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL;
+}

These should be put in predicates.md. This can be in a follow-up patch.

     case VOID_FTYPE_VOID:
       if (icode == CODE_FOR_avx_vzeroupper)
-	target = GEN_INT (vzeroupper_intrinsic);
+	target = GEN_INT (9);
       emit_insn (GEN_FCN (icode) (target));
       return 0;

Please use:
    case VOID_FTYPE_VOID:
      emit_insn (GEN_FCN (icode) ());
      return 0;

Otherwise other VOID_FTYPE_VOID patterns will get excessive argument.

-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */

(... and a couple of similar testsuite changes ...)

These asm scans were put there for a reason. I assume you have looked
at these differences and are correct (this also implies that current
vzeroupper placement code is not optimal or even wrong).

I will split out the mode-switching part and re-post it to mailing
list with an explanation. After this change is approved, please commit
the patch to mainline SVN with requested changes.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Vzeroupper placement/47440
  2012-11-04 13:29 [PATCH] Vzeroupper placement/47440 Uros Bizjak
@ 2012-11-04 17:59 ` Uros Bizjak
       [not found] ` <CAK1BsWpoD4AVB_4+J6snJgs4BF1Jbiw-RrifvZiiAm21qRURew@mail.gmail.com>
  1 sibling, 0 replies; 10+ messages in thread
From: Uros Bizjak @ 2012-11-04 17:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Vladimir Yakovlev

On Sun, Nov 4, 2012 at 2:29 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

> -/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
> +/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
>
> (... and a couple of similar testsuite changes ...)
>
> These asm scans were put there for a reason. I assume you have looked
> at these differences and are correct (this also implies that current
> vzeroupper placement code is not optimal or even wrong).

Ah, these extra instructions were inserted with _mm265_* intrinsics.
We decided some time ago, that these should remain, and no attempt to
"optimize" them will be performed. OTOH, automatic insertion won't
emit extra vzeroupper in this case.

So, all is OK.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Vzeroupper placement/47440
       [not found]   ` <CAFULd4Y5zDhMH3h34Lt0O5xNG+xibDJih7q2_ctef7nqSNJcOQ@mail.gmail.com>
@ 2012-11-04 20:28     ` Vladimir Yakovlev
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Yakovlev @ 2012-11-04 20:28 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5404 bytes --]

Here is Changelogs and patch after fixing Uros remarks.

2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * mode-switching.c (create_pre_exit): Added code for
maybe_builtin_apply case.

        * config/i386/i386-protos.h (emit_i387_cw_initialization): Deleted.
        (emit_vzero): Added prototype.
        (ix86_mode_entry): Likewise.
        (ix86_mode_exit): Likewise.
        (ix86_emit_mode_set): Likewise.

        * config/i386/i386.c (typedef struct block_info_def): Deleted.
        (define BLOCK_INFO): Deleted.
        (check_avx256_stores): Added checking for MEM_P.
        (move_or_delete_vzeroupper_2): Deleted.
        (move_or_delete_vzeroupper_1): Deleted.
        (move_or_delete_vzeroupper): Deleted.
        (ix86_maybe_emit_epilogue_vzeroupper): Deleted.
        (function_pass_avx256_p): Deleted.
        (ix86_function_ok_for_sibcall): Deleted disabling sibcall.
        (nit_cumulative_args): Deleted initialization of of avx256 fields of
        cfun->machine.
        (ix86_emit_restore_sse_regs_using_mov): Deleted vzeroupper generation.
        (ix86_expand_epilogue): Likewise.
        (ix86_avx_u128_mode_needed): New.
        (ix86_i387_mode_needed): Renamed ix86_mode_needed.
        (ix86_mode_needed): New.
        (ix86_avx_u128_mode_after): New.
        (ix86_mode_after): New.
        (ix86_avx_u128_mode_entry): New.
        (ix86_mode_entry): New.
        (ix86_avx_u128_mode_exit): New.
        (ix86_mode_exit): New.
        (ix86_emit_mode_set): New.
        (ix86_expand_call): Deleted vzeroupper generation.
        (ix86_split_call_vzeroupper): Deleted.
        (ix86_init_machine_status): Initialzed optimize_mode_switching.
        (ix86_expand_special_args_builtin): Changed.
        (ix86_reorg): Deleted a call of move_or_delete_vzeroupper.

        * config/i386/i386.h  (VALID_AVX256_REG_OR_OI_MODE): New.
        (AVX_U128): New.
        (avx_u128_state): New.
        (NUM_MODES_FOR_MODE_SWITCHING): Added AVX_U128_ANY.
        (MODE_AFTER): New.
        (MODE_ENTRY): New.
        (MODE_EXIT): New.
        (EMIT_MODE_SET): Changed.
        (machine_function): Deleted avx256 fields.

        * config/i386/i386.md (UNSPEC_CALL_NEEDS_VZEROUPPER): Deleted.
        (define_insn_and_split "*call_vzeroupper"): Deleted.
        (define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_vzeroupper"): Deleted.
        (define_insn_and_split "*call_pop_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_pop_vzeroupper"): Deleted.
        (define_insn_and_split "*call_value_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_value_vzeroupper"): Deleted.
        (define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"): Deleted.
        (define_insn_and_split "*call_value_pop_vzeroupper"): Deleted.
        (define_insn_and_split "*sibcall_value_pop_vzeroupper"): Deleted.
        (define_expand "return"): Deleted vzeroupper emitting.
        (define_expand "simple_return"): Deleted.

        * config/i386/predicates.md (vzeroupper_operation): New.

        * config/i386/sse.md (avx_vzeroupper): Changed.

2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * gcc.target/i386/avx-vzeroupper-5.c: Changed scan-assembler-times.
        gcc.target/i386/avx-vzeroupper-8.c: Likewise.
        gcc.target/i386/avx-vzeroupper-9.c: Likewise.
        gcc.target/i386/avx-vzeroupper-10.c: Likewise.
        gcc.target/i386/avx-vzeroupper-11.c: Likewise.
        gcc.target/i386/avx-vzeroupper-12.c: Likewise.
        gcc.target/i386/avx-vzeroupper-19.c: Likewis.
        gcc.target/i386/avx-vzeroupper-27.c: New.

2012/11/4 Uros Bizjak <ubizjak@gmail.com>:
> On Sun, Nov 4, 2012 at 5:18 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>> Thank you for review. I did changes you asked (see attached) with
>> small change: I left argument in 'emit_insn (GEN_FCN (icode)
>> (target));' because 'GEN_FCN (icode)' requeres it.
>
> Yes, you are correct.
>
> -/* Output code to initialize control word copies used by trunc?f?i and
> -   rounding patterns.  CURRENT_MODE is set to current control word,
> -   while NEW_MODE is set to new control word.  */
> -
>
> Please leave this comment...
>
> +#define VALID_AVX256_REG_OR_OI_MODE(m)                                 \
> +  (VALID_AVX256_REG_MODE (m)                                           \
> +   || (m) == OImode)
>
> Please use (MODE) as is case with other predicates. Also, put on one line:
>
> +  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
>
> +;; eturn true if OP is a vzeroupper operation.
>
> Return ...
>
> +  [(unspec_volatile [(const_int 0)]
>                     UNSPECV_VZEROUPPER)]
>
> Please merge these two lines to one line.
>
>> Some comment about tests
>>> -/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>>> +/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
>>>
>>> (... and a couple of similar testsuite changes ...)
>>>
>>> These asm scans were put there for a reason. I assume you have looked
>>> at these differences and are correct (this also implies that current
>>> vzeroupper placement code is not optimal or even wrong).
>>>
>>
>> The tests use builtin functions therefore I don't remove them.
>
> Yes, I agree with this approach.
>
> These additional changes are OK, I have no further comments.
>
> Thanks,
> Uros.

[-- Attachment #2: vzu.patch --]
[-- Type: application/octet-stream, Size: 43427 bytes --]

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 96971ae..0d643b1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -167,8 +167,13 @@ extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 					  enum machine_mode, int);
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
 					   enum machine_mode, enum reg_class);
+
 extern int ix86_mode_needed (int, rtx);
-extern void emit_i387_cw_initialization (int);
+extern int ix86_mode_after (int, int, rtx);
+extern int ix86_mode_entry (int);
+extern int ix86_mode_exit (int);
+extern void ix86_emit_mode_set (int, int);
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 833ef5c..8593102 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -70,48 +70,16 @@ enum upper_128bits_state
   used
 };
 
-typedef struct block_info_def
-{
-  /* State of the upper 128bits of AVX registers at exit.  */
-  enum upper_128bits_state state;
-  /* TRUE if state of the upper 128bits of AVX registers is unchanged
-     in this block.  */
-  bool unchanged;
-  /* TRUE if block has been processed.  */
-  bool processed;
-  /* TRUE if block has been scanned.  */
-  bool scanned;
-  /* Previous state of the upper 128bits of AVX registers at entry.  */
-  enum upper_128bits_state prev;
-} *block_info;
-
-#define BLOCK_INFO(B)   ((block_info) (B)->aux)
-
-enum call_avx256_state
-{
-  /* Callee returns 256bit AVX register.  */
-  callee_return_avx256 = -1,
-  /* Callee returns and passes 256bit AVX register.  */
-  callee_return_pass_avx256,
-  /* Callee passes 256bit AVX register.  */
-  callee_pass_avx256,
-  /* Callee doesn't return nor passe 256bit AVX register, or no
-     256bit AVX register in function return.  */
-  call_no_avx256,
-  /* vzeroupper intrinsic.  */
-  vzeroupper_intrinsic
-};
-
 /* Check if a 256bit AVX register is referenced in stores.   */
 
 static void
 check_avx256_stores (rtx dest, const_rtx set, void *data)
 {
-  if ((REG_P (dest)
-       && VALID_AVX256_REG_MODE (GET_MODE (dest)))
+  if (((REG_P (dest) || MEM_P(dest))
+       && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest)))
       || (GET_CODE (set) == SET
-	  && REG_P (SET_SRC (set))
-	  && VALID_AVX256_REG_MODE (GET_MODE (SET_SRC (set)))))
+	  && (REG_P (SET_SRC (set)) || MEM_P (SET_SRC (set)))
+	  && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (SET_SRC (set)))))
     {
       enum upper_128bits_state *state
 	= (enum upper_128bits_state *) data;
@@ -119,377 +87,6 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
     }
 }
 
-/* Helper function for move_or_delete_vzeroupper_1.  Look for vzeroupper
-   in basic block BB.  Delete it if upper 128bit AVX registers are
-   unused.  If it isn't deleted, move it to just before a jump insn.
-
-   STATE is state of the upper 128bits of AVX registers at entry.  */
-
-static void
-move_or_delete_vzeroupper_2 (basic_block bb,
-			     enum upper_128bits_state state)
-{
-  rtx insn, bb_end;
-  rtx vzeroupper_insn = NULL_RTX;
-  rtx pat;
-  int avx256;
-  bool unchanged;
-
-  if (BLOCK_INFO (bb)->unchanged)
-    {
-      if (dump_file)
-	fprintf (dump_file, " [bb %i] unchanged: upper 128bits: %d\n",
-		 bb->index, state);
-
-      BLOCK_INFO (bb)->state = state;
-      return;
-    }
-
-  if (BLOCK_INFO (bb)->scanned && BLOCK_INFO (bb)->prev == state)
-    {
-      if (dump_file)
-	fprintf (dump_file, " [bb %i] scanned: upper 128bits: %d\n",
-		 bb->index, BLOCK_INFO (bb)->state);
-      return;
-    }
-
-  BLOCK_INFO (bb)->prev = state;
-
-  if (dump_file)
-    fprintf (dump_file, " [bb %i] entry: upper 128bits: %d\n",
-	     bb->index, state);
-
-  unchanged = true;
-
-  /* BB_END changes when it is deleted.  */
-  bb_end = BB_END (bb);
-  insn = BB_HEAD (bb);
-  while (insn != bb_end)
-    {
-      insn = NEXT_INSN (insn);
-
-      if (!NONDEBUG_INSN_P (insn))
-	continue;
-
-      /* Move vzeroupper before jump/call.  */
-      if (JUMP_P (insn) || CALL_P (insn))
-	{
-	  if (!vzeroupper_insn)
-	    continue;
-
-	  if (PREV_INSN (insn) != vzeroupper_insn)
-	    {
-	      if (dump_file)
-		{
-		  fprintf (dump_file, "Move vzeroupper after:\n");
-		  print_rtl_single (dump_file, PREV_INSN (insn));
-		  fprintf (dump_file, "before:\n");
-		  print_rtl_single (dump_file, insn);
-		}
-	      reorder_insns_nobb (vzeroupper_insn, vzeroupper_insn,
-				  PREV_INSN (insn));
-	    }
-	  vzeroupper_insn = NULL_RTX;
-	  continue;
-	}
-
-      pat = PATTERN (insn);
-
-      /* Check insn for vzeroupper intrinsic.  */
-      if (GET_CODE (pat) == UNSPEC_VOLATILE
-	  && XINT (pat, 1) == UNSPECV_VZEROUPPER)
-	{
-	  if (dump_file)
-	    {
-	      /* Found vzeroupper intrinsic.  */
-	      fprintf (dump_file, "Found vzeroupper:\n");
-	      print_rtl_single (dump_file, insn);
-	    }
-	}
-      else
-	{
-	  /* Check insn for vzeroall intrinsic.  */
-	  if (GET_CODE (pat) == PARALLEL
-	      && GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
-	      && XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
-	    {
-	      state = unused;
-	      unchanged = false;
-
-	      /* Delete pending vzeroupper insertion.  */
-	      if (vzeroupper_insn)
-		{
-		  delete_insn (vzeroupper_insn);
-		  vzeroupper_insn = NULL_RTX;
-		}
-	    }
-	  else if (state != used)
-	    {
-	      note_stores (pat, check_avx256_stores, &state);
-	      if (state == used)
-		unchanged = false;
-	    }
-	  continue;
-	}
-
-      /* Process vzeroupper intrinsic.  */
-      avx256 = INTVAL (XVECEXP (pat, 0, 0));
-
-      if (state == unused)
-	{
-	  /* Since the upper 128bits are cleared, callee must not pass
-	     256bit AVX register.  We only need to check if callee
-	     returns 256bit AVX register.  */
-	  if (avx256 == callee_return_avx256)
-	    {
-	      state = used;
-	      unchanged = false;
-	    }
-
-	  /* Remove unnecessary vzeroupper since upper 128bits are
-	     cleared.  */
-	  if (dump_file)
-	    {
-	      fprintf (dump_file, "Delete redundant vzeroupper:\n");
-	      print_rtl_single (dump_file, insn);
-	    }
-	  delete_insn (insn);
-	}
-      else
-	{
-	  /* Set state to UNUSED if callee doesn't return 256bit AVX
-	     register.  */
-	  if (avx256 != callee_return_pass_avx256)
-	    state = unused;
-
-	  if (avx256 == callee_return_pass_avx256
-	      || avx256 == callee_pass_avx256)
-	    {
-	      /* Must remove vzeroupper since callee passes in 256bit
-		 AVX register.  */
-	      if (dump_file)
-		{
-		  fprintf (dump_file, "Delete callee pass vzeroupper:\n");
-		  print_rtl_single (dump_file, insn);
-		}
-	      delete_insn (insn);
-	    }
-	  else
-	    {
-	      vzeroupper_insn = insn;
-	      unchanged = false;
-	    }
-	}
-    }
-
-  BLOCK_INFO (bb)->state = state;
-  BLOCK_INFO (bb)->unchanged = unchanged;
-  BLOCK_INFO (bb)->scanned = true;
-
-  if (dump_file)
-    fprintf (dump_file, " [bb %i] exit: %s: upper 128bits: %d\n",
-	     bb->index, unchanged ? "unchanged" : "changed",
-	     state);
-}
-
-/* Helper function for move_or_delete_vzeroupper.  Process vzeroupper
-   in BLOCK and check its predecessor blocks.  Treat UNKNOWN state
-   as USED if UNKNOWN_IS_UNUSED is true.  Return TRUE if the exit
-   state is changed.  */
-
-static bool
-move_or_delete_vzeroupper_1 (basic_block block, bool unknown_is_unused)
-{
-  edge e;
-  edge_iterator ei;
-  enum upper_128bits_state state, old_state, new_state;
-  bool seen_unknown;
-
-  if (dump_file)
-    fprintf (dump_file, " Process [bb %i]: status: %d\n",
-	     block->index, BLOCK_INFO (block)->processed);
-
-  if (BLOCK_INFO (block)->processed)
-    return false;
-
-  state = unused;
-
-  /* Check all predecessor edges of this block.  */
-  seen_unknown = false;
-  FOR_EACH_EDGE (e, ei, block->preds)
-    {
-      if (e->src == block)
-	continue;
-      switch (BLOCK_INFO (e->src)->state)
-	{
-	case unknown:
-	  if (!unknown_is_unused)
-	    seen_unknown = true;
-	case unused:
-	  break;
-	case used:
-	  state = used;
-	  goto done;
-	}
-    }
-
-  if (seen_unknown)
-    state = unknown;
-
-done:
-  old_state = BLOCK_INFO (block)->state;
-  move_or_delete_vzeroupper_2 (block, state);
-  new_state = BLOCK_INFO (block)->state;
-
-  if (state != unknown || new_state == used)
-    BLOCK_INFO (block)->processed = true;
-
-  /* Need to rescan if the upper 128bits of AVX registers are changed
-     to USED at exit.  */
-  if (new_state != old_state)
-    {
-      if (new_state == used)
-	cfun->machine->rescan_vzeroupper_p = 1;
-      return true;
-    }
-  else
-    return false;
-}
-
-/* Go through the instruction stream looking for vzeroupper.  Delete
-   it if upper 128bit AVX registers are unused.  If it isn't deleted,
-   move it to just before a jump insn.  */
-
-static void
-move_or_delete_vzeroupper (void)
-{
-  edge e;
-  edge_iterator ei;
-  basic_block bb;
-  fibheap_t worklist, pending, fibheap_swap;
-  sbitmap visited, in_worklist, in_pending, sbitmap_swap;
-  int *bb_order;
-  int *rc_order;
-  int i;
-
-  /* Set up block info for each basic block.  */
-  alloc_aux_for_blocks (sizeof (struct block_info_def));
-
-  /* Process outgoing edges of entry point.  */
-  if (dump_file)
-    fprintf (dump_file, "Process outgoing edges of entry point\n");
-
-  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR->succs)
-    {
-      move_or_delete_vzeroupper_2 (e->dest,
-				   cfun->machine->caller_pass_avx256_p
-				   ? used : unused);
-      BLOCK_INFO (e->dest)->processed = true;
-    }
-
-  /* Compute reverse completion order of depth first search of the CFG
-     so that the data-flow runs faster.  */
-  rc_order = XNEWVEC (int, n_basic_blocks - NUM_FIXED_BLOCKS);
-  bb_order = XNEWVEC (int, last_basic_block);
-  pre_and_rev_post_order_compute (NULL, rc_order, false);
-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; i++)
-    bb_order[rc_order[i]] = i;
-  free (rc_order);
-
-  worklist = fibheap_new ();
-  pending = fibheap_new ();
-  visited = sbitmap_alloc (last_basic_block);
-  in_worklist = sbitmap_alloc (last_basic_block);
-  in_pending = sbitmap_alloc (last_basic_block);
-  bitmap_clear (in_worklist);
-
-  /* Don't check outgoing edges of entry point.  */
-  bitmap_ones (in_pending);
-  FOR_EACH_BB (bb)
-    if (BLOCK_INFO (bb)->processed)
-      bitmap_clear_bit (in_pending, bb->index);
-    else
-      {
-	move_or_delete_vzeroupper_1 (bb, false);
-	fibheap_insert (pending, bb_order[bb->index], bb);
-      }
-
-  if (dump_file)
-    fprintf (dump_file, "Check remaining basic blocks\n");
-
-  while (!fibheap_empty (pending))
-    {
-      fibheap_swap = pending;
-      pending = worklist;
-      worklist = fibheap_swap;
-      sbitmap_swap = in_pending;
-      in_pending = in_worklist;
-      in_worklist = sbitmap_swap;
-
-      bitmap_clear (visited);
-
-      cfun->machine->rescan_vzeroupper_p = 0;
-
-      while (!fibheap_empty (worklist))
-	{
-	  bb = (basic_block) fibheap_extract_min (worklist);
-	  bitmap_clear_bit (in_worklist, bb->index);
-	  gcc_assert (!bitmap_bit_p (visited, bb->index));
-	  if (!bitmap_bit_p (visited, bb->index))
-	    {
-	      edge_iterator ei;
-
-	      bitmap_set_bit (visited, bb->index);
-
-	      if (move_or_delete_vzeroupper_1 (bb, false))
-		FOR_EACH_EDGE (e, ei, bb->succs)
-		  {
-		    if (e->dest == EXIT_BLOCK_PTR
-			|| BLOCK_INFO (e->dest)->processed)
-		      continue;
-
-		    if (bitmap_bit_p (visited, e->dest->index))
-		      {
-			if (!bitmap_bit_p (in_pending, e->dest->index))
-			  {
-			    /* Send E->DEST to next round.  */
-			    bitmap_set_bit (in_pending, e->dest->index);
-			    fibheap_insert (pending,
-					    bb_order[e->dest->index],
-					    e->dest);
-			  }
-		      }
-		    else if (!bitmap_bit_p (in_worklist, e->dest->index))
-		      {
-			/* Add E->DEST to current round.  */
-			bitmap_set_bit (in_worklist, e->dest->index);
-			fibheap_insert (worklist, bb_order[e->dest->index],
-					e->dest);
-		      }
-		  }
-	    }
-	}
-
-      if (!cfun->machine->rescan_vzeroupper_p)
-	break;
-    }
-
-  free (bb_order);
-  fibheap_delete (worklist);
-  fibheap_delete (pending);
-  sbitmap_free (visited);
-  sbitmap_free (in_worklist);
-  sbitmap_free (in_pending);
-
-  if (dump_file)
-    fprintf (dump_file, "Process remaining basic blocks\n");
-
-  FOR_EACH_BB (bb)
-    move_or_delete_vzeroupper_1 (bb, true);
-
-  free_aux_for_blocks ();
-}
-
 static rtx legitimize_dllimport_symbol (rtx, bool);
 
 #ifndef CHECK_STACK_LIMIT
@@ -4123,37 +3720,6 @@ ix86_option_override_internal (bool main_args_p)
       = build_target_option_node ();
 }
 
-/* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
-
-static bool
-function_pass_avx256_p (const_rtx val)
-{
-  if (!val)
-    return false;
-
-  if (REG_P (val) && VALID_AVX256_REG_MODE (GET_MODE (val)))
-    return true;
-
-  if (GET_CODE (val) == PARALLEL)
-    {
-      int i;
-      rtx r;
-
-      for (i = XVECLEN (val, 0) - 1; i >= 0; i--)
-	{
-	  r = XVECEXP (val, 0, i);
-	  if (GET_CODE (r) == EXPR_LIST
-	      && XEXP (r, 0)
-	      && REG_P (XEXP (r, 0))
-	      && (GET_MODE (XEXP (r, 0)) == OImode
-		  || VALID_AVX256_REG_MODE (GET_MODE (XEXP (r, 0)))))
-	    return true;
-	}
-    }
-
-  return false;
-}
-
 /* Implement the TARGET_OPTION_OVERRIDE hook.  */
 
 static void
@@ -5076,15 +4642,6 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       if (!rtx_equal_p (a, b))
 	return false;
     }
-  else if (VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl))))
-    {
-      /* Disable sibcall if we need to generate vzeroupper after
-	 callee returns.  */
-      if (TARGET_VZEROUPPER
-	  && cfun->machine->callee_return_avx256_p
-	  && !cfun->machine->caller_return_avx256_p)
-	return false;
-    }
   else if (!rtx_equal_p (a, b))
     return false;
 
@@ -5864,45 +5421,18 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
 		      int caller)
 {
   struct cgraph_local_info *i;
-  tree fnret_type;
 
   memset (cum, 0, sizeof (*cum));
 
-  /* Initialize for the current callee.  */
-  if (caller)
-    {
-      cfun->machine->callee_pass_avx256_p = false;
-      cfun->machine->callee_return_avx256_p = false;
-    }
-
   if (fndecl)
     {
       i = cgraph_local_info (fndecl);
       cum->call_abi = ix86_function_abi (fndecl);
-      fnret_type = TREE_TYPE (TREE_TYPE (fndecl));
     }
   else
     {
       i = NULL;
       cum->call_abi = ix86_function_type_abi (fntype);
-      if (fntype)
-	fnret_type = TREE_TYPE (fntype);
-      else
-	fnret_type = NULL;
-    }
-
-  if (TARGET_VZEROUPPER && fnret_type)
-    {
-      rtx fnret_value = ix86_function_value (fnret_type, fntype,
-					     false);
-      if (function_pass_avx256_p (fnret_value))
-	{
-	  /* The return value of this function uses 256bit AVX modes.  */
-	  if (caller)
-	    cfun->machine->callee_return_avx256_p = true;
-	  else
-	    cfun->machine->caller_return_avx256_p = true;
-	}
     }
 
   cum->caller = caller;
@@ -7195,15 +6725,6 @@ ix86_function_arg (cumulative_args_t cum_v, enum machine_mode omode,
   else
     arg = function_arg_32 (cum, mode, omode, type, bytes, words);
 
-  if (TARGET_VZEROUPPER && function_pass_avx256_p (arg))
-    {
-      /* This argument uses 256bit AVX modes.  */
-      if (cum->caller)
-	cfun->machine->callee_pass_avx256_p = true;
-      else
-	cfun->machine->caller_pass_avx256_p = true;
-    }
-
   return arg;
 }
 
@@ -11042,17 +10563,6 @@ ix86_emit_restore_sse_regs_using_mov (HOST_WIDE_INT cfa_offset,
       }
 }
 
-/* Emit vzeroupper if needed.  */
-
-void
-ix86_maybe_emit_epilogue_vzeroupper (void)
-{
-  if (TARGET_VZEROUPPER
-      && !TREE_THIS_VOLATILE (cfun->decl)
-      && !cfun->machine->caller_return_avx256_p)
-    emit_insn (gen_avx_vzeroupper (GEN_INT (call_no_avx256)));
-}
-
 /* Restore function stack, frame, and registers.  */
 
 void
@@ -11354,9 +10864,6 @@ ix86_expand_epilogue (int style)
       return;
     }
 
-  /* Emit vzeroupper if needed.  */
-  ix86_maybe_emit_epilogue_vzeroupper ();
-
   if (crtl->args.pops_args && crtl->args.size)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
@@ -15472,8 +14979,46 @@ output_387_binary_op (rtx insn, rtx *operands)
 
 /* Return needed mode for entity in optimize_mode_switching pass.  */
 
-int
-ix86_mode_needed (int entity, rtx insn)
+static int
+ix86_avx_u128_mode_needed (rtx insn)
+{
+  rtx pat = PATTERN (insn);
+  rtx arg;
+  enum upper_128bits_state state;
+
+  if (CALL_P (insn))
+    {
+      /* Needed mode is set to AVX_U128_CLEAN if there are
+	 no 256bit modes used in function arguments.  */
+      for (arg = CALL_INSN_FUNCTION_USAGE (insn); arg;
+	   arg = XEXP (arg, 1))
+	{
+	  if (GET_CODE (XEXP (arg, 0)) == USE)
+	    {
+	      rtx reg = XEXP (XEXP (arg, 0), 0);
+
+	      if (reg && REG_P (reg)
+		  && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
+		return AVX_U128_ANY;
+	    }
+	}
+
+      return AVX_U128_CLEAN;
+    }
+
+  /* Check if a 256bit AVX register is referenced in stores.  */
+  state = unused;
+  note_stores (pat, check_avx256_stores, &state);
+  if (state == used)
+    return AVX_U128_DIRTY;
+  return AVX_U128_ANY;
+}
+
+/* Return mode that i387 must be switched into
+   prior to the execution of insn.  */
+
+static int
+ix86_i387_mode_needed (int entity, rtx insn)
 {
   enum attr_i387_cw mode;
 
@@ -15522,11 +15067,166 @@ ix86_mode_needed (int entity, rtx insn)
   return I387_CW_ANY;
 }
 
+/* Return mode that entity must be switched into
+   prior to the execution of insn.  */
+
+int
+ix86_mode_needed (int entity, rtx insn)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_needed (insn);
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return ix86_i387_mode_needed (entity, insn);
+    default:
+      gcc_unreachable ();
+    }
+  return 0;
+}
+
+/* Calculate mode of upper 128bit AVX registers after the insn.  */
+
+static int
+ix86_avx_u128_mode_after (int mode, rtx insn)
+{
+  rtx pat = PATTERN (insn);
+  rtx reg = NULL;
+  int i;
+  enum upper_128bits_state state;
+
+  /* Check for CALL instruction.  */
+  if (CALL_P (insn))
+    {
+      if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
+	reg = SET_DEST (pat);
+      else if (GET_CODE (pat) ==  PARALLEL)
+	for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
+	  {
+	    rtx x = XVECEXP (pat, 0, i);
+	    if (GET_CODE(x) == SET)
+	      reg = SET_DEST (x);
+	  }
+      /* Mode after call is set to AVX_U128_DIRTY if there are
+	 256bit modes used in the function return register.  */
+      if (reg && REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
+	return AVX_U128_DIRTY;
+      else
+	return AVX_U128_CLEAN;
+    }
+
+  if (vzeroupper_operation (pat, VOIDmode)
+      || vzeroall_operation (pat, VOIDmode))
+    return AVX_U128_CLEAN;
+
+  /* Check if a 256bit AVX register is referenced in stores.  */
+  state = unused;
+  note_stores (pat, check_avx256_stores, &state);
+  if (state == used)
+    return AVX_U128_DIRTY;
+
+  return mode;
+}
+
+/* Return the mode that an insn results in.  */
+
+int
+ix86_mode_after (int entity, int mode, rtx insn)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_after (mode, insn);
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return mode;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static int
+ix86_avx_u128_mode_entry (void)
+{
+  tree arg;
+
+  /* Entry mode is set to AVX_U128_DIRTY if there are
+     256bit modes used in function arguments.  */
+  for (arg = DECL_ARGUMENTS (current_function_decl); arg;
+       arg = TREE_CHAIN (arg))
+    {
+      rtx reg = DECL_INCOMING_RTL (arg);
+
+      if (reg && REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
+	return AVX_U128_DIRTY;
+    }
+
+  return AVX_U128_CLEAN;
+}
+
+/* Return a mode that ENTITY is assumed to be
+   switched to at function entry.  */
+
+int
+ix86_mode_entry (int entity)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_entry ();
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return I387_CW_ANY;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static int
+ix86_avx_u128_mode_exit (void)
+{
+  rtx reg = crtl->return_rtx;
+
+  /* Exit mode is set to AVX_U128_DIRTY if there are
+     256bit modes used in the function return register.  */
+  if (reg && REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
+    return AVX_U128_DIRTY;
+
+  return AVX_U128_CLEAN;
+}
+
+/* Return a mode that ENTITY is assumed to be
+   switched to at function exit.  */
+
+int
+ix86_mode_exit (int entity)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      return ix86_avx_u128_mode_exit ();
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      return I387_CW_ANY;
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Output code to initialize control word copies used by trunc?f?i and
    rounding patterns.  CURRENT_MODE is set to current control word,
    while NEW_MODE is set to new control word.  */
 
-void
+static void
 emit_i387_cw_initialization (int mode)
 {
   rtx stored_mode = assign_386_stack_local (HImode, SLOT_CW_STORED);
@@ -15613,6 +15313,30 @@ emit_i387_cw_initialization (int mode)
   emit_move_insn (new_mode, reg);
 }
 
+/* Generate one or more insns to set ENTITY to MODE.  */
+
+void
+ix86_emit_mode_set (int entity, int mode)
+{
+  switch (entity)
+    {
+    case AVX_U128:
+      if (mode == AVX_U128_CLEAN)
+	emit_insn (gen_avx_vzeroupper ());
+      break;
+    case I387_TRUNC:
+    case I387_FLOOR:
+    case I387_CEIL:
+    case I387_MASK_PM:
+      if (mode != I387_CW_ANY
+	  && mode != I387_CW_UNINITIALIZED)
+	emit_i387_cw_initialization (mode);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Output code for INSN to convert a float to a signed int.  OPERANDS
    are the insn operands.  The output may be [HSD]Imode and the input
    operand may be [SDX]Fmode.  */
@@ -23621,30 +23345,6 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 					  clobbered_registers[i]));
     }
 
-  /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER)
-    {
-      int avx256;
-      if (cfun->machine->callee_pass_avx256_p)
-	{
-	  if (cfun->machine->callee_return_avx256_p)
-	    avx256 = callee_return_pass_avx256;
-	  else
-	    avx256 = callee_pass_avx256;
-	}
-      else if (cfun->machine->callee_return_avx256_p)
-	avx256 = callee_return_avx256;
-      else
-	avx256 = call_no_avx256;
-
-      if (reload_completed)
-	emit_insn (gen_avx_vzeroupper (GEN_INT (avx256)));
-      else
-	vec[vec_len++] = gen_rtx_UNSPEC (VOIDmode,
-					 gen_rtvec (1, GEN_INT (avx256)),
-					 UNSPEC_CALL_NEEDS_VZEROUPPER);
-    }
-
   if (vec_len > 1)
     call = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (vec_len, vec));
   call = emit_call_insn (call);
@@ -23654,25 +23354,6 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   return call;
 }
 
-void
-ix86_split_call_vzeroupper (rtx insn, rtx vzeroupper)
-{
-  rtx pat = PATTERN (insn);
-  rtvec vec = XVEC (pat, 0);
-  int len = GET_NUM_ELEM (vec) - 1;
-
-  /* Strip off the last entry of the parallel.  */
-  gcc_assert (GET_CODE (RTVEC_ELT (vec, len)) == UNSPEC);
-  gcc_assert (XINT (RTVEC_ELT (vec, len), 1) == UNSPEC_CALL_NEEDS_VZEROUPPER);
-  if (len == 1)
-    pat = RTVEC_ELT (vec, 0);
-  else
-    pat = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (len, &RTVEC_ELT (vec, 0)));
-
-  emit_insn (gen_avx_vzeroupper (vzeroupper));
-  emit_call_insn (pat);
-}
-
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -23753,6 +23434,7 @@ ix86_init_machine_status (void)
   f->use_fast_prologue_epilogue_nregs = -1;
   f->tls_descriptor_call_expanded_p = 0;
   f->call_abi = ix86_abi;
+  f->optimize_mode_switching[AVX_U128] = TARGET_VZEROUPPER;
 
   return f;
 }
@@ -30187,8 +29869,6 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
   switch ((enum ix86_builtin_func_type) d->flag)
     {
     case VOID_FTYPE_VOID:
-      if (icode == CODE_FOR_avx_vzeroupper)
-	target = GEN_INT (vzeroupper_intrinsic);
       emit_insn (GEN_FCN (icode) (target));
       return 0;
     case VOID_FTYPE_UINT64:
@@ -34422,10 +34102,6 @@ ix86_reorg (void)
      with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
   compute_bb_for_insn ();
 
-  /* Run the vzeroupper optimization if needed.  */
-  if (TARGET_VZEROUPPER)
-    move_or_delete_vzeroupper ();
-
   if (optimize && optimize_function_for_speed_p (cfun))
     {
       if (TARGET_PAD_SHORT_FUNCTION)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 712d00a..67403c5 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1035,6 +1035,9 @@ enum target_cpu_default
    || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode	\
    || (MODE) == V4DFmode)
 
+#define VALID_AVX256_REG_OR_OI_MODE(MODE)					\
+  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
+
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
    || (MODE) == V2DImode || (MODE) == DFmode)
@@ -2141,7 +2144,8 @@ enum ix86_fpcmp_strategy {
 
 enum ix86_entity
 {
-  I387_TRUNC = 0,
+  AVX_U128 = 0,
+  I387_TRUNC,
   I387_FLOOR,
   I387_CEIL,
   I387_MASK_PM,
@@ -2160,6 +2164,13 @@ enum ix86_stack_slot
   MAX_386_STACK_LOCALS
 };
 
+enum avx_u128_state
+{
+  AVX_U128_CLEAN,
+  AVX_U128_DIRTY,
+  AVX_U128_ANY
+};
+
 /* Define this macro if the port needs extra instructions inserted
    for mode switching in an optimizing compilation.  */
 
@@ -2175,16 +2186,34 @@ enum ix86_stack_slot
    refer to the mode-switched entity in question.  */
 
 #define NUM_MODES_FOR_MODE_SWITCHING \
-   { I387_CW_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY }
+  { AVX_U128_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY, I387_CW_ANY }
 
 /* ENTITY is an integer specifying a mode-switched entity.  If
    `OPTIMIZE_MODE_SWITCHING' is defined, you must define this macro to
    return an integer value not larger than the corresponding element
    in `NUM_MODES_FOR_MODE_SWITCHING', to denote the mode that ENTITY
-   must be switched into prior to the execution of INSN. */
+   must be switched into prior to the execution of INSN.  */
 
 #define MODE_NEEDED(ENTITY, I) ix86_mode_needed ((ENTITY), (I))
 
+/* If this macro is defined, it is evaluated for every INSN during
+   mode switching.  It determines the mode that an insn results in (if
+   different from the incoming mode).  */
+
+#define MODE_AFTER(ENTITY, MODE, I) ix86_mode_after ((ENTITY), (MODE), (I))
+
+/* If this macro is defined, it is evaluated for every ENTITY that
+   needs mode switching.  It should evaluate to an integer, which is
+   a mode that ENTITY is assumed to be switched to at function entry.  */
+
+#define MODE_ENTRY(ENTITY) ix86_mode_entry (ENTITY)
+
+/* If this macro is defined, it is evaluated for every ENTITY that
+   needs mode switching.  It should evaluate to an integer, which is
+   a mode that ENTITY is assumed to be switched to at function exit.  */
+
+#define MODE_EXIT(ENTITY) ix86_mode_exit (ENTITY)
+
 /* This macro specifies the order in which modes for ENTITY are
    processed.  0 is the highest priority.  */
 
@@ -2194,11 +2223,8 @@ enum ix86_stack_slot
    is the set of hard registers live at the point where the insn(s)
    are to be inserted.  */
 
-#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) 			\
-  ((MODE) != I387_CW_ANY && (MODE) != I387_CW_UNINITIALIZED		\
-   ? emit_i387_cw_initialization (MODE), 0				\
-   : 0)
-
+#define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
+  ix86_emit_mode_set ((ENTITY), (MODE))
 \f
 /* Avoid renaming of stack registers, as doing so in combination with
    scheduling just increases amount of live registers at time and in
@@ -2299,21 +2325,6 @@ struct GTY(()) machine_function {
      stack below the return address.  */
   BOOL_BITFIELD static_chain_on_stack : 1;
 
-  /* Nonzero if caller passes 256bit AVX modes.  */
-  BOOL_BITFIELD caller_pass_avx256_p : 1;
-
-  /* Nonzero if caller returns 256bit AVX modes.  */
-  BOOL_BITFIELD caller_return_avx256_p : 1;
-
-  /* Nonzero if the current callee passes 256bit AVX modes.  */
-  BOOL_BITFIELD callee_pass_avx256_p : 1;
-
-  /* Nonzero if the current callee returns 256bit AVX modes.  */
-  BOOL_BITFIELD callee_return_avx256_p : 1;
-
-  /* Nonzero if rescan vzerouppers in the current function is needed.  */
-  BOOL_BITFIELD rescan_vzeroupper_p : 1;
-
   /* During prologue/epilogue generation, the current frame state.
      Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 61d3ccd..f2d2cd6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -109,7 +109,6 @@
   UNSPEC_TRUNC_NOOP
   UNSPEC_DIV_ALREADY_SPLIT
   UNSPEC_MS_TO_SYSV_CALL
-  UNSPEC_CALL_NEEDS_VZEROUPPER
   UNSPEC_PAUSE
   UNSPEC_LEA_ADDR
   UNSPEC_XBEGIN_ABORT
@@ -11503,18 +11502,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_vzeroupper"
-  [(call (mem:QI (match_operand:W 0 "call_insn_operand" "<c>zw"))
-	 (match_operand 1))
-   (unspec [(match_operand 2 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[2]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*call"
   [(call (mem:QI (match_operand:W 0 "call_insn_operand" "<c>zw"))
 	 (match_operand 1))]
@@ -11522,31 +11509,6 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn_and_split "*call_rex64_ms_sysv_vzeroupper"
-  [(call (mem:QI (match_operand:DI 0 "call_insn_operand" "rzw"))
-	 (match_operand 1))
-   (unspec [(const_int 0)] UNSPEC_MS_TO_SYSV_CALL)
-   (clobber (reg:TI XMM6_REG))
-   (clobber (reg:TI XMM7_REG))
-   (clobber (reg:TI XMM8_REG))
-   (clobber (reg:TI XMM9_REG))
-   (clobber (reg:TI XMM10_REG))
-   (clobber (reg:TI XMM11_REG))
-   (clobber (reg:TI XMM12_REG))
-   (clobber (reg:TI XMM13_REG))
-   (clobber (reg:TI XMM14_REG))
-   (clobber (reg:TI XMM15_REG))
-   (clobber (reg:DI SI_REG))
-   (clobber (reg:DI DI_REG))
-   (unspec [(match_operand 2 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[2]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*call_rex64_ms_sysv"
   [(call (mem:QI (match_operand:DI 0 "call_insn_operand" "rzw"))
 	 (match_operand 1))
@@ -11567,18 +11529,6 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn_and_split "*sibcall_vzeroupper"
-  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uz"))
-	 (match_operand 1))
-   (unspec [(match_operand 2 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[2]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uz"))
 	 (match_operand 1))]
@@ -11599,21 +11549,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_pop_vzeroupper"
-  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
-	 (match_operand 1))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 2 "immediate_operand" "i")))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*call_pop"
   [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
 	 (match_operand 1))
@@ -11624,21 +11559,6 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn_and_split "*sibcall_pop_vzeroupper"
-  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uz"))
-	 (match_operand 1))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 2 "immediate_operand" "i")))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "call")])
-
 (define_insn "*sibcall_pop"
   [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uz"))
 	 (match_operand 1))
@@ -11675,19 +11595,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_value_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:W 1 "call_insn_operand" "<c>zw"))
-	      (match_operand 2)))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*call_value"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:W 1 "call_insn_operand" "<c>zw"))
@@ -11696,19 +11603,6 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
-(define_insn_and_split "*sibcall_value_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uz"))
-	      (match_operand 2)))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*sibcall_value"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uz"))
@@ -11717,32 +11611,6 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
-(define_insn_and_split "*call_value_rex64_ms_sysv_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:DI 1 "call_insn_operand" "rzw"))
-	      (match_operand 2)))
-   (unspec [(const_int 0)] UNSPEC_MS_TO_SYSV_CALL)
-   (clobber (reg:TI XMM6_REG))
-   (clobber (reg:TI XMM7_REG))
-   (clobber (reg:TI XMM8_REG))
-   (clobber (reg:TI XMM9_REG))
-   (clobber (reg:TI XMM10_REG))
-   (clobber (reg:TI XMM11_REG))
-   (clobber (reg:TI XMM12_REG))
-   (clobber (reg:TI XMM13_REG))
-   (clobber (reg:TI XMM14_REG))
-   (clobber (reg:TI XMM15_REG))
-   (clobber (reg:DI SI_REG))
-   (clobber (reg:DI DI_REG))
-   (unspec [(match_operand 3 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[3]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*call_value_rex64_ms_sysv"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:DI 1 "call_insn_operand" "rzw"))
@@ -11778,22 +11646,6 @@
   DONE;
 })
 
-(define_insn_and_split "*call_value_pop_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:SI 1 "call_insn_operand" "lzm"))
-	      (match_operand 2)))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 3 "immediate_operand" "i")))
-   (unspec [(match_operand 4 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && !SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[4]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*call_value_pop"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:SI 1 "call_insn_operand" "lzm"))
@@ -11805,22 +11657,6 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
-(define_insn_and_split "*sibcall_value_pop_vzeroupper"
-  [(set (match_operand 0)
-	(call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uz"))
-	      (match_operand 2)))
-   (set (reg:SI SP_REG)
-	(plus:SI (reg:SI SP_REG)
-		 (match_operand:SI 3 "immediate_operand" "i")))
-   (unspec [(match_operand 4 "const_int_operand")]
-   	   UNSPEC_CALL_NEEDS_VZEROUPPER)]
-  "TARGET_VZEROUPPER && !TARGET_64BIT && SIBLING_CALL_P (insn)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_call_vzeroupper (curr_insn, operands[4]); DONE;"
-  [(set_attr "type" "callv")])
-
 (define_insn "*sibcall_value_pop"
   [(set (match_operand 0)
 	(call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uz"))
@@ -11922,7 +11758,6 @@
   [(simple_return)]
   "ix86_can_use_return_insn_p ()"
 {
-  ix86_maybe_emit_epilogue_vzeroupper ();
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
@@ -11939,7 +11774,6 @@
   [(simple_return)]
   "!TARGET_SEH"
 {
-  ix86_maybe_emit_epilogue_vzeroupper ();
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 4e5c17d..c4337e1 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1225,6 +1225,13 @@
   return true;
 })
 
+;; return true if OP is a vzeroupper operation.
+(define_predicate "vzeroupper_operation"
+  (match_code "unspec_volatile")
+{
+  return XINT (op, 1) == UNSPECV_VZEROUPPER;
+})
+
 ;; Return true if OP is a parallel for a vbroadcast permute.
 
 (define_predicate "avx_vbroadcast_operand"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 299b0d9..614f81d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10665,8 +10665,7 @@
 ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
 ;; if the upper 128bits are unused.
 (define_insn "avx_vzeroupper"
-  [(unspec_volatile [(match_operand 0 "const_int_operand")]
-		    UNSPECV_VZEROUPPER)]
+  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
   "TARGET_AVX"
   "vzeroupper"
   [(set_attr "type" "sse")
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index d9f83ca..ce69320 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -342,6 +342,16 @@ create_pre_exit (int n_entities, int *entity_map, const int *num_modes)
 		      }
 		    if (j >= 0)
 		      {
+			/* __builtin_return emits a sequence of loads to all
+			   function value registers in their widest mode,
+			   which breaks the assumption on the mode of the
+			   return register load. Allow this situation, so the
+			   final mode switch will be emitted after the load.  */
+			if (maybe_builtin_apply
+			    && targetm.calls.function_value_regno_p
+				(copy_start))
+			  forced_late_switch = 1;
+
 			/* For the SH4, floating point loads depend on fpscr,
 			   thus we might need to put the final mode switch
 			   after the return value copy.  That is still OK,
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
index 667bb17..5007753 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
index d98ceb9..507f945 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
@@ -16,4 +16,4 @@ foo ()
 }
 
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
index f74ea0c..e694d40 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
@@ -16,5 +16,5 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 4 } } */
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-19.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-19.c
index 602de87..ae2f861 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-19.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-19.c
@@ -14,4 +14,4 @@ void feat_s3_cep_dcep (int cepsize_used, float **mfc, float **feat)
     f[i] = w[i] - _w[i];
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-27.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-27.c
new file mode 100644
index 0000000..7fa5de4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-27.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mtune=generic -dp" } */
+
+typedef struct objc_class *Class;
+typedef struct objc_object
+{
+  Class class_pointer;
+} *id;
+
+typedef const struct objc_selector *SEL;
+typedef void * retval_t;
+typedef void * arglist_t;
+
+extern retval_t __objc_forward (id object, SEL sel, arglist_t args);
+
+double
+__objc_double_forward (id rcv, SEL op, ...)
+{
+  void *args, *res;
+
+  args = __builtin_apply_args ();
+  res = __objc_forward (rcv, op, args);
+  __builtin_return (res);
+}
+
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
index 0f54602..ba08978 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
index 0a821c2..bb370c5 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
@@ -13,4 +13,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
index 5aa05b8..974e162 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
@@ -15,4 +15,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-times "avx_vzeroupper" 4 } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Fwd: [off-list] Re: [PATCH] Vzeroupper placement/47440
       [not found]                   ` <CAK1BsWpL69eRHTD8dzVOm9xtOqtjcr6z3B2tvb_VikWPzKT0Dw@mail.gmail.com>
@ 2012-11-09 10:55                     ` Vladimir Yakovlev
       [not found]                     ` <CAFULd4YaVLCYF=Huw_kDozTBTcZnGUAy7xOcV+VEweOWZ5Cigg@mail.gmail.com>
  1 sibling, 0 replies; 10+ messages in thread
From: Vladimir Yakovlev @ 2012-11-09 10:55 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3624 bytes --]

---------- Forwarded message ----------
From: Vladimir Yakovlev <vbyakovl23@gmail.com>
Date: 2012/11/9
Subject: Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
To: Uros Bizjak <ubizjak@gmail.com>
Копия: "H.J. Lu" <hjl.tools@gmail.com>, Igor Zamyatin <izamyatin@gmail.com>


I did changes that moves vzeroupper insertion after reload

2012-11-09  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New.
        * config/i386/i386.c (ix86_init_machine_status): Deleted
initialization for mode switching.
        * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New.
        * mode-switching.c (gate_mode_switching1): New.
        (rest_of_handle_mode_switching1): New.
        (pass_mode_switching1): New.
        * passes.c (init_optimization_passes): New pass pass_mode_switching1.
        * tree-pass.h (pass_mode_switching1): New.

But this caused assertion fails in  rtl_verify_flow_info_1 () at cfgrtl.c:2291

      fatal_insn ("flow control insn inside a basic block", x);

The asserts are called by two calls of mode-switching.c:
commit_edge_insertion and cleanup_cfg. After I commented (see below)
459.GemsFDTD benchspec passed. Your opinion of the patch and haw we
can do something with asserts.

Regards,
Vladimir

--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -747,7 +747,7 @@ optimize_mode_switching (void)
     commit_edge_insertions ();

 #if defined (MODE_ENTRY) && defined (MODE_EXIT)
-  cleanup_cfg (CLEANUP_NO_INSN_DEL);
+  /*cleanup_cfg (CLEANUP_NO_INSN_DEL);*/
 #else
   if (!need_commit && !emitted)
     return 0;
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1828,7 +1828,7 @@ commit_edge_insertions (void)
   basic_block bb;

 #ifdef ENABLE_CHECKING
-  verify_flow_info ();
+  /*verify_flow_info ();*/
 #endif


2012/11/9 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Nov 8, 2012 at 6:52 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>> Uh, this is spill around call insn, produced by reload.
>>
>> Please compile this code:
>>
>> double test (double a)
>> {
>>   printf ("Hello\n");
>>   return a;
>> }
>>
>> You will get at mode switching:
>>
>>     1 NOTE_INSN_DELETED
>>     4 NOTE_INSN_BASIC_BLOCK
>>     2 r60:DF=xmm0:DF
>>       REG_DEAD: xmm0:DF
>>     3 NOTE_INSN_FUNCTION_BEG
>>     6 di:DI=`*.LC0'
>>     7 call <...>
>>       REG_DEAD: di:DI
>>       REG_UNUSED: ax:SI
>>    12 xmm0:DF=r60:DF
>>       REG_DEAD: r60:DF
>>    15 use xmm0:DF
>>
>> But reload will insert:
>>
>>     1 NOTE_INSN_DELETED
>>     4 NOTE_INSN_BASIC_BLOCK
>>     2 xmm0:DF=xmm0:DF
>>       REG_DEAD: xmm0:DF
>>    18 [sp:DI+0x8]=xmm0:DF
>>       REG_DEAD: xmm0:DF
>>     3 NOTE_INSN_FUNCTION_BEG
>>     6 di:DI=`*.LC0'
>>     7 call <...>
>>       REG_DEAD: di:DI
>>       REG_UNUSED: ax:SI
>>    19 xmm0:DF=[sp:DI+0x8]
>>       REG_DEAD: r62:DF
>>    12 xmm0:DF=xmm0:DF
>>       REG_DEAD: xmm0:DF
>>    15 use xmm0:DF
>>
>> I was not paying attention to this situation.
>
>
> A viable solution to this issue is through machine-reorg function (AKA
> x86_reorg) that would just move vzeroupper to the close proximity to a
> call insn. This would work on non-64bit-MS-ABI targets, where all SSE
> registers are dead at call insn place.
>
> Please note that 64bit-MS-ABI target declares registers xmm6+ as
> call-saved, so they can live over the call. I am not familiar with
> this target, but it looks to me that we have to remove vzeroupper, if
> one or more call-saved SSE registers are live at the call insn place.
>
> Uros.

[-- Attachment #2: prvzu.patch --]
[-- Type: application/octet-stream, Size: 4524 bytes --]

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 0d643b1..33c6e45 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -168,6 +168,7 @@ extern bool ix86_secondary_memory_needed (enum reg_class, enum reg_class,
 extern bool ix86_cannot_change_mode_class (enum machine_mode,
 					   enum machine_mode, enum reg_class);
 
+extern void ix86_avx256_optimize_mode_switching(int (*f)(void));
 extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
 extern int ix86_mode_entry (int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2386017..b5a495e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23395,11 +23395,28 @@ ix86_init_machine_status (void)
   f = ggc_alloc_cleared_machine_function ();
   f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
-  f->optimize_mode_switching[AVX_U128] = TARGET_VZEROUPPER;
 
   return f;
 }
 
+void ix86_avx256_optimize_mode_switching(int (*f)(void))
+{
+  int oms[MAX_386_ENTITIES];
+  int i;
+
+  for (i = 0; i < MAX_386_ENTITIES - 1; i++)
+    {
+      oms [i] = OPTIMIZE_MODE_SWITCHING(i);
+      OPTIMIZE_MODE_SWITCHING(i) = 0;
+    }
+
+  OPTIMIZE_MODE_SWITCHING(AVX_U128) = 1;
+  f ();
+
+  for (i = 0; i < MAX_386_ENTITIES - 1; i++)
+    OPTIMIZE_MODE_SWITCHING(i) = oms [i];
+}
+
 /* Return a MEM corresponding to a stack slot with mode MODE.
    Allocate a new slot if necessary.
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 18d476d..b87c903 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2176,6 +2176,10 @@ enum avx_u128_state
 #define OPTIMIZE_MODE_SWITCHING(ENTITY) \
    ix86_optimize_mode_switching[(ENTITY)]
 
+#define OPTIMIZE_MODE_SWITCHING1 \
+  if (TARGET_VZEROUPPER) \
+    ix86_avx256_optimize_mode_switching (optimize_mode_switching);
+
 /* If you define `OPTIMIZE_MODE_SWITCHING', you have to define this as
    initializer for an array of integers.  Each initializer element N
    refers to an entity that needs mode switching, and specifies the
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index 2072628..f6da395 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -798,3 +798,45 @@ struct rtl_opt_pass pass_mode_switching =
   0                                     /* todo_flags_finish */
  }
 };
+
+\f
+static bool
+gate_mode_switching1 (void)
+{
+#ifdef OPTIMIZE_MODE_SWITCHING1
+  return true;
+#else
+  return false;
+#endif
+}
+
+static unsigned int
+rest_of_handle_mode_switching1 (void)
+{
+#ifdef OPTIMIZE_MODE_SWITCHING1
+  OPTIMIZE_MODE_SWITCHING1;
+#endif /* OPTIMIZE_MODE_SWITCHING1 */
+  return 0;
+}
+
+
+struct rtl_opt_pass pass_mode_switching1 =
+{
+ {
+  RTL_PASS,
+  "mode_sw1",                            /* name */
+  OPTGROUP_NONE,                        /* optinfo_flags */
+  gate_mode_switching1,                  /* gate */
+  rest_of_handle_mode_switching1,        /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_MODE_SWITCH,                       /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_df_finish | TODO_verify_rtl_sharing |
+  0                                     /* todo_flags_finish */
+ }
+};
diff --git a/gcc/passes.c b/gcc/passes.c
index 67aae52..97d16ef 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1659,6 +1659,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_variable_tracking);
 	  NEXT_PASS (pass_free_cfg);
 	  NEXT_PASS (pass_machine_reorg);
+	  NEXT_PASS (pass_mode_switching1);
 	  NEXT_PASS (pass_cleanup_barriers);
 	  NEXT_PASS (pass_delay_slots);
 	  NEXT_PASS (pass_split_for_shorten_branches);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 09ec531..198f888 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -439,6 +439,7 @@ extern struct rtl_opt_pass pass_split_all_insns;
 extern struct rtl_opt_pass pass_fast_rtl_byte_dce;
 extern struct rtl_opt_pass pass_lower_subreg2;
 extern struct rtl_opt_pass pass_mode_switching;
+extern struct rtl_opt_pass pass_mode_switching1;
 extern struct rtl_opt_pass pass_sms;
 extern struct rtl_opt_pass pass_sched;
 extern struct rtl_opt_pass pass_ira;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
       [not found]                       ` <CAFULd4YyRVY4BzD+csZAqCCmB7v3YEwAaOpNW9QsMXEbCkFw+Q@mail.gmail.com>
@ 2012-11-09 12:18                         ` Vladimir Yakovlev
  2012-11-09 12:29                           ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Yakovlev @ 2012-11-09 12:18 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, Igor Zamyatin, gcc-patches

> These assert should tell you what is wrong with the control flow.
> Please look at control_flow_insn_p, which condition returns true.

There is a note after call insn.

(call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
("_gfortran_stop_string") [flags 0x41] <function_decl 0x7ffff7eb6200
_gfortran_stop_string>) [0 _gfortran_stop_string S1 A8])
        (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_DEAD (reg:SI 4 si)
            (expr_list:REG_NORETURN (const_int 0 [0])
                (nil))))
    (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
        (expr_list:REG_BR_PRED (use (reg:SI 4 si))
            (nil))))
(note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
        (const_int 0 [0]))
    (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
            (const_int 0 [0]))
        (nil))) NOTE_INSN_CALL_ARG_LOCATION)

> You shouldn't disable commit_edge_insertions, as there is the function
> where vzerouppers are emitted.

I didn;t disable commit_edge_insertions. I only remove call of assert.

2012/11/9 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Nov 9, 2012 at 11:45 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Nov 9, 2012 at 11:21 AM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>>> I did changes that moves vzeroupper insertion after reload
>>>
>>> 2012-11-09  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>>>
>>>         * i386/i386-protos.h (ix86_avx256_optimize_mode_switching): New.
>>>         * config/i386/i386.c (ix86_init_machine_status): Deleted
>>> initialization for mode switching.
>>>         * i386/i386.h (OPTIMIZE_MODE_SWITCHING1): New.
>>>         * mode-switching.c (gate_mode_switching1): New.
>>>         (rest_of_handle_mode_switching1): New.
>>>         (pass_mode_switching1): New.
>>>         * passes.c (init_optimization_passes): New pass pass_mode_switching1.
>>>         * tree-pass.h (pass_mode_switching1): New.
>>>
>>> But this caused assertion fails in  rtl_verify_flow_info_1 () at cfgrtl.c:2291
>>>
>>>       fatal_insn ("flow control insn inside a basic block", x);
>>>
>>> The asserts are called by two calls of mode-switching.c:
>>> commit_edge_insertion and cleanup_cfg. After I commented (see below)
>>> 459.GemsFDTD benchspec passed. Your opinion of the patch and haw we
>>> can do something with asserts.
>
> These assert should tell you what is wrong with the control flow.
> Please look at control_flow_insn_p, which condition returns true. You
> shouldn't disable commit_edge_insertions, as there is the function
> where vzerouppers are emitted.
>
> Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
  2012-11-09 12:18                         ` Vladimir Yakovlev
@ 2012-11-09 12:29                           ` Uros Bizjak
  2012-11-09 12:36                             ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-11-09 12:29 UTC (permalink / raw)
  To: Vladimir Yakovlev; +Cc: H.J. Lu, Igor Zamyatin, gcc-patches

On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>> These assert should tell you what is wrong with the control flow.
>> Please look at control_flow_insn_p, which condition returns true.
>
> There is a note after call insn.
>
> (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
> ("_gfortran_stop_string") [flags 0x41] <function_decl 0x7ffff7eb6200
> _gfortran_stop_string>) [0 _gfortran_stop_string S1 A8])
>         (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
>      (expr_list:REG_DEAD (reg:DI 5 di)
>         (expr_list:REG_DEAD (reg:SI 4 si)
>             (expr_list:REG_NORETURN (const_int 0 [0])
>                 (nil))))
>     (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
>         (expr_list:REG_BR_PRED (use (reg:SI 4 si))
>             (nil))))
> (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
>         (const_int 0 [0]))
>     (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
>             (const_int 0 [0]))
>         (nil))) NOTE_INSN_CALL_ARG_LOCATION)
>

Huh, this RTX is ignored:

--cfgrtl.c--
bool
control_flow_insn_p (const_rtx insn)
{
  switch (GET_CODE (insn))
    {
    case NOTE:
    case CODE_LABEL:
    case DEBUG_INSN:
      return false;
--cfgrtl.c--

The problem is noreturn call.

BTW: What happens if the new pass is put before pro_and_epilogue pass?

Uros,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
  2012-11-09 12:29                           ` Uros Bizjak
@ 2012-11-09 12:36                             ` Jakub Jelinek
  2012-11-09 12:48                               ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2012-11-09 12:36 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Vladimir Yakovlev, H.J. Lu, Igor Zamyatin, gcc-patches

On Fri, Nov 09, 2012 at 01:29:18PM +0100, Uros Bizjak wrote:
> On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
> >> These assert should tell you what is wrong with the control flow.
> >> Please look at control_flow_insn_p, which condition returns true.
> >
> > There is a note after call insn.
> >
> > (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
> > ("_gfortran_stop_string") [flags 0x41] <function_decl 0x7ffff7eb6200
> > _gfortran_stop_string>) [0 _gfortran_stop_string S1 A8])
> >         (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
> >      (expr_list:REG_DEAD (reg:DI 5 di)
> >         (expr_list:REG_DEAD (reg:SI 4 si)
> >             (expr_list:REG_NORETURN (const_int 0 [0])
> >                 (nil))))
> >     (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
> >         (expr_list:REG_BR_PRED (use (reg:SI 4 si))
> >             (nil))))
> > (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
> >         (const_int 0 [0]))
> >     (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
> >             (const_int 0 [0]))
> >         (nil))) NOTE_INSN_CALL_ARG_LOCATION)
> >
> 
> Huh, this RTX is ignored:

NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call.
It is just a way how to pass call argument details to dwarf2out.
If you have a pass after var-tracking, you need to skip over it.

	Jakub

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
  2012-11-09 12:36                             ` Jakub Jelinek
@ 2012-11-09 12:48                               ` Uros Bizjak
  2012-11-09 13:28                                 ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-11-09 12:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Vladimir Yakovlev, H.J. Lu, Igor Zamyatin, gcc-patches

On Fri, Nov 9, 2012 at 1:36 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 09, 2012 at 01:29:18PM +0100, Uros Bizjak wrote:
>> On Fri, Nov 9, 2012 at 1:18 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>> >> These assert should tell you what is wrong with the control flow.
>> >> Please look at control_flow_insn_p, which condition returns true.
>> >
>> > There is a note after call insn.
>> >
>> > (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
>> > ("_gfortran_stop_string") [flags 0x41] <function_decl 0x7ffff7eb6200
>> > _gfortran_stop_string>) [0 _gfortran_stop_string S1 A8])
>> >         (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
>> >      (expr_list:REG_DEAD (reg:DI 5 di)
>> >         (expr_list:REG_DEAD (reg:SI 4 si)
>> >             (expr_list:REG_NORETURN (const_int 0 [0])
>> >                 (nil))))
>> >     (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
>> >         (expr_list:REG_BR_PRED (use (reg:SI 4 si))
>> >             (nil))))
>> > (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
>> >         (const_int 0 [0]))
>> >     (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
>> >             (const_int 0 [0]))
>> >         (nil))) NOTE_INSN_CALL_ARG_LOCATION)
>> >
>>
>> Huh, this RTX is ignored:
>
> NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call.
> It is just a way how to pass call argument details to dwarf2out.
> If you have a pass after var-tracking, you need to skip over it.

Yes, but postreload mode switching should come before pro_and_epilogue
anyway, otherwise create_pre_exit won't work:

--mode-switching.c (222)--
	/* If this function returns a value at the end, we have to
	   insert the final mode switch before the return value copy
	   to its hard register.  */
	if (EDGE_COUNT (EXIT_BLOCK_PTR->preds) == 1
	    && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
	    && GET_CODE (PATTERN (last_insn)) == USE
	    && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
--mode-switching.2 (228)--

I believe that this will work OK if the pass is inserted before
prologue generation pass.

Uros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [off-list] Re: [PATCH] Vzeroupper placement/47440
  2012-11-09 12:48                               ` Uros Bizjak
@ 2012-11-09 13:28                                 ` Uros Bizjak
  2012-11-16  7:50                                   ` Uros Bizjak
  0 siblings, 1 reply; 10+ messages in thread
From: Uros Bizjak @ 2012-11-09 13:28 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Vladimir Yakovlev, H.J. Lu, Igor Zamyatin, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2613 bytes --]

On Fri, Nov 9, 2012 at 1:47 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>> >> These assert should tell you what is wrong with the control flow.
>>> >> Please look at control_flow_insn_p, which condition returns true.
>>> >
>>> > There is a note after call insn.
>>> >
>>> > (call_insn:TI 908 35558 50534 1681 (call (mem:QI (symbol_ref:DI
>>> > ("_gfortran_stop_string") [flags 0x41] <function_decl 0x7ffff7eb6200
>>> > _gfortran_stop_string>) [0 _gfortran_stop_string S1 A8])
>>> >         (const_int 0 [0])) huygens.fppized.f90:190 616 {*call}
>>> >      (expr_list:REG_DEAD (reg:DI 5 di)
>>> >         (expr_list:REG_DEAD (reg:SI 4 si)
>>> >             (expr_list:REG_NORETURN (const_int 0 [0])
>>> >                 (nil))))
>>> >     (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
>>> >         (expr_list:REG_BR_PRED (use (reg:SI 4 si))
>>> >             (nil))))
>>> > (note 50534 908 909 1681 (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
>>> >         (const_int 0 [0]))
>>> >     (expr_list:REG_DEP_TRUE (concat:SI (reg:SI 4 si)
>>> >             (const_int 0 [0]))
>>> >         (nil))) NOTE_INSN_CALL_ARG_LOCATION)
>>> >
>>>
>>> Huh, this RTX is ignored:
>>
>> NOTE_INSN_CALL_ARG_LOCATION is fine, even after a REG_NORETURN call.
>> It is just a way how to pass call argument details to dwarf2out.
>> If you have a pass after var-tracking, you need to skip over it.
>
> Yes, but postreload mode switching should come before pro_and_epilogue
> anyway, otherwise create_pre_exit won't work:
>
> --mode-switching.c (222)--
>         /* If this function returns a value at the end, we have to
>            insert the final mode switch before the return value copy
>            to its hard register.  */
>         if (EDGE_COUNT (EXIT_BLOCK_PTR->preds) == 1
>             && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
>             && GET_CODE (PATTERN (last_insn)) == USE
>             && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
> --mode-switching.2 (228)--
>
> I believe that this will work OK if the pass is inserted before
> prologue generation pass.

Finally, having a post-reload mode-switching pass, we can double-check
that there are no live SSE registers at vzeroupper insertion point. As
vzeroupper is only an optimization, we want to play safe and cancel
vzeroupper insertion in this case

There is no degradation for x86_64 gABI targets, since all SSE
registers are call-clobbered. Vzeroupper is conditionally inserted
just before call insn, where all registers are saved to stack and
already dead. The vzeroupper at function exit is not problematic.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2314 bytes --]

Index: i386-protos.h
===================================================================
--- i386-protos.h	(revision 193329)
+++ i386-protos.h	(working copy)
@@ -172,8 +172,11 @@ extern int ix86_mode_needed (int, rtx);
 extern int ix86_mode_after (int, int, rtx);
 extern int ix86_mode_entry (int);
 extern int ix86_mode_exit (int);
-extern void ix86_emit_mode_set (int, int);
 
+#ifdef HARD_CONST
+extern void ix86_emit_mode_set (int, int, HARD_REG_SET);
+#endif
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
Index: i386.c
===================================================================
--- i386.c	(revision 193329)
+++ i386.c	(working copy)
@@ -15277,16 +15284,38 @@ emit_i387_cw_initialization (int mode)
   emit_move_insn (new_mode, reg);
 }
 
+/* Emit vzeroupper.  */
+
+void
+ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live)
+{
+  int i;
+
+  /* Cancel automatic vzeroupper insertion if there are
+     live call-saved SSE registers at the insertion point.  */
+
+  for (i = FIRST_SSE_REG; i <= LAST_SSE_REG; i++)
+    if (!call_used_regs[i] && TEST_HARD_REG_BIT (regs_live, i))
+      return;
+
+  if (TARGET_64BIT)
+    for (i = FIRST_REX_SSE_REG; i <= LAST_REX_SSE_REG; i++)
+      if (!call_used_regs[i] && TEST_HARD_REG_BIT (regs_live, i))
+	return;
+
+  emit_insn (gen_avx_vzeroupper ());
+}
+
 /* Generate one or more insns to set ENTITY to MODE.  */
 
 void
-ix86_emit_mode_set (int entity, int mode)
+ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live)
 {
   switch (entity)
     {
     case AVX_U128:
       if (mode == AVX_U128_CLEAN)
-	emit_insn (gen_avx_vzeroupper ());
+	ix86_avx_emit_vzeroupper (regs_live);
       break;
     case I387_TRUNC:
     case I387_FLOOR:
Index: i386.h
===================================================================
--- i386.h	(revision 193329)
+++ i386.h	(working copy)
@@ -2223,7 +2227,7 @@ enum avx_u128_state
    are to be inserted.  */
 
 #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
-  ix86_emit_mode_set ((ENTITY), (MODE))
+  ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE))
 \f
 /* Avoid renaming of stack registers, as doing so in combination with
    scheduling just increases amount of live registers at time and in

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Vzeroupper placement/47440
  2012-11-09 13:28                                 ` Uros Bizjak
@ 2012-11-16  7:50                                   ` Uros Bizjak
  0 siblings, 0 replies; 10+ messages in thread
From: Uros Bizjak @ 2012-11-16  7:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Vladimir Yakovlev, H.J. Lu, Igor Zamyatin, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On Fri, Nov 9, 2012 at 2:28 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

> Finally, having a post-reload mode-switching pass, we can double-check
> that there are no live SSE registers at vzeroupper insertion point. As
> vzeroupper is only an optimization, we want to play safe and cancel
> vzeroupper insertion in this case
>
> There is no degradation for x86_64 gABI targets, since all SSE
> registers are call-clobbered. Vzeroupper is conditionally inserted
> just before call insn, where all registers are saved to stack and
> already dead. The vzeroupper at function exit is not problematic.

Patch was committed to mainline SVN with the following ChangeLog:

2012-11-16  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386-protos.h (ix86_emit_mode_set): Add third argument.
	* config/i386/i386.h (EMIT_MODE_SET): Update.
	* config/i386/i386.c (ix86_avx_emit_vzeroupper): New function.
	(ix86_emit_mode_set) <AVX_U128>: Call ix86_avx_emit_vzeroupper.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
configured with --with-arch=corei7-avx --with-tune=corei7-avx.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2215 bytes --]

Index: i386-protos.h
===================================================================
--- i386-protos.h	(revision 193549)
+++ i386-protos.h	(working copy)
@@ -172,8 +172,11 @@
 extern int ix86_mode_after (int, int, rtx);
 extern int ix86_mode_entry (int);
 extern int ix86_mode_exit (int);
-extern void ix86_emit_mode_set (int, int);
 
+#ifdef HARD_CONST
+extern void ix86_emit_mode_set (int, int, HARD_REG_SET);
+#endif
+
 extern void x86_order_regs_for_local_alloc (void);
 extern void x86_function_profiler (FILE *, int);
 extern void x86_emit_floatuns (rtx [2]);
Index: i386.c
===================================================================
--- i386.c	(revision 193549)
+++ i386.c	(working copy)
@@ -15477,16 +15477,38 @@
   emit_move_insn (new_mode, reg);
 }
 
+/* Emit vzeroupper.  */
+
+void
+ix86_avx_emit_vzeroupper (HARD_REG_SET regs_live)
+{
+  int i;
+
+  /* Cancel automatic vzeroupper insertion if there are
+     live call-saved SSE registers at the insertion point.  */
+
+  for (i = FIRST_SSE_REG; i <= LAST_SSE_REG; i++)
+    if (TEST_HARD_REG_BIT (regs_live, i) && !call_used_regs[i])
+      return;
+
+  if (TARGET_64BIT)
+    for (i = FIRST_REX_SSE_REG; i <= LAST_REX_SSE_REG; i++)
+      if (TEST_HARD_REG_BIT (regs_live, i) && !call_used_regs[i])
+	return;
+
+  emit_insn (gen_avx_vzeroupper ());
+}
+
 /* Generate one or more insns to set ENTITY to MODE.  */
 
 void
-ix86_emit_mode_set (int entity, int mode)
+ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live)
 {
   switch (entity)
     {
     case AVX_U128:
       if (mode == AVX_U128_CLEAN)
-	emit_insn (gen_avx_vzeroupper ());
+	ix86_avx_emit_vzeroupper (regs_live);
       break;
     case I387_TRUNC:
     case I387_FLOOR:
Index: i386.h
===================================================================
--- i386.h	(revision 193549)
+++ i386.h	(working copy)
@@ -2226,7 +2226,7 @@
    are to be inserted.  */
 
 #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
-  ix86_emit_mode_set ((ENTITY), (MODE))
+  ix86_emit_mode_set ((ENTITY), (MODE), (HARD_REGS_LIVE))
 \f
 /* Avoid renaming of stack registers, as doing so in combination with
    scheduling just increases amount of live registers at time and in

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-11-16  7:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-04 13:29 [PATCH] Vzeroupper placement/47440 Uros Bizjak
2012-11-04 17:59 ` Uros Bizjak
     [not found] ` <CAK1BsWpoD4AVB_4+J6snJgs4BF1Jbiw-RrifvZiiAm21qRURew@mail.gmail.com>
     [not found]   ` <CAFULd4Y5zDhMH3h34Lt0O5xNG+xibDJih7q2_ctef7nqSNJcOQ@mail.gmail.com>
2012-11-04 20:28     ` Vladimir Yakovlev
     [not found]   ` <CAFULd4a8pgcTu-yv=8sm3=KyYxz0SAJW+7+uUmUu9k_YwXxsew@mail.gmail.com>
     [not found]     ` <CAK1BsWrZyWL8WrczwbTm5djhkqZjbBy0p10wb9-_=HJFA0Z8iA@mail.gmail.com>
     [not found]       ` <CAFULd4aP_JMxTnSymMe373PJ3WFcR2Bax3BtksBtf-xVQeH=0Q@mail.gmail.com>
     [not found]         ` <CAK1BsWrsVu4TRW50RW0X7G4RSguSAjhqFPe-tkeXKaurr=sX1A@mail.gmail.com>
     [not found]           ` <CAFULd4b0y6GGZsn1s4-RXc1mAvZGrhGd4YQBhfLgeMWmv2eXPA@mail.gmail.com>
     [not found]             ` <CAK1BsWoL5hsfZprf-a8zxG+Bhe9SwGFwqxHxOw9UX+bbsFD5oQ@mail.gmail.com>
     [not found]               ` <CAFULd4bJXT-nnAk6HCn2C=+jhfiUD-fAe3LK8AYd9jgqQQHvKQ@mail.gmail.com>
     [not found]                 ` <CAFULd4bdxuKbYYS7TcyRfjNukLvJ0d5pOD7zJGAyKEQLPq7z2Q@mail.gmail.com>
     [not found]                   ` <CAK1BsWpL69eRHTD8dzVOm9xtOqtjcr6z3B2tvb_VikWPzKT0Dw@mail.gmail.com>
2012-11-09 10:55                     ` Fwd: [off-list] " Vladimir Yakovlev
     [not found]                     ` <CAFULd4YaVLCYF=Huw_kDozTBTcZnGUAy7xOcV+VEweOWZ5Cigg@mail.gmail.com>
     [not found]                       ` <CAFULd4YyRVY4BzD+csZAqCCmB7v3YEwAaOpNW9QsMXEbCkFw+Q@mail.gmail.com>
2012-11-09 12:18                         ` Vladimir Yakovlev
2012-11-09 12:29                           ` Uros Bizjak
2012-11-09 12:36                             ` Jakub Jelinek
2012-11-09 12:48                               ` Uros Bizjak
2012-11-09 13:28                                 ` Uros Bizjak
2012-11-16  7:50                                   ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).