public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* PATCH: PR target/46519: Missing vzeroupper
@ 2010-12-17 19:45 H.J. Lu
  2010-12-18 19:36 ` Uros Bizjak
  0 siblings, 1 reply; 46+ messages in thread
From: H.J. Lu @ 2010-12-17 19:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Hi,

This patch fixes another missing vzeroupper.  OK for trunk?

Thanks.


H.J.
---
gcc/

2010-12-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (rescan_move_or_delete_vzeroupper): Stop
	rescanning predecessor edges if one of them uses upper 128bits.

gcc/testsuite/

2010-12-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gfortran.dg/pr46519-2.f90: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5603e6..99ba823 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -349,7 +349,10 @@ rescan_move_or_delete_vzeroupper (basic_block block)
       rescan_move_or_delete_vzeroupper (e->src);
       /* For rescan, UKKNOWN state is treated as UNUSED.  */
       if (BLOCK_INFO (e->src)->state == used)
-	state = used;
+	{
+	  state = used;
+	  break;
+	}
     }
 
   /* Rescan this block only if there are vzerouppers or the upper
diff --git a/gcc/testsuite/gfortran.dg/pr46519-2.f90 b/gcc/testsuite/gfortran.dg/pr46519-2.f90
new file mode 100644
index 0000000..b4d6980
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr46519-2.f90
@@ -0,0 +1,31 @@
+! { dg-do compile { target i?86-*-* x86_64-*-* } }
+! { dg-options "-O3 -mavx -mvzeroupper -mtune=generic -dp" }
+
+  SUBROUTINE func(kts, kte, qrz, qiz, rho)
+  IMPLICIT NONE
+  INTEGER, INTENT(IN)               :: kts, kte
+  REAL,    DIMENSION(kts:kte), INTENT(INOUT) :: qrz, qiz, rho
+  INTEGER                              :: k
+  REAL, DIMENSION(kts:kte)    ::  praci, vtiold
+  REAL                          :: fluxout
+  INTEGER                       :: min_q, max_q, var
+  do k=kts,kte
+    praci(k)=0.0
+  enddo
+  min_q=kte
+  max_q=kts-1
+  DO var=1,20
+    do k=max_q,min_q,-1
+       fluxout=rho(k)*qrz(k)
+    enddo
+    qrz(min_q-1)=qrz(min_q-1)+fluxout
+  ENDDO
+  DO var=1,20
+    do k=kts,kte-1
+      vtiold(k)= (rho(k))**0.16
+    enddo
+  ENDDO
+  STOP
+  END SUBROUTINE func
+
+! { dg-final { scan-assembler "avx_vzeroupper" } }

^ permalink raw reply	[flat|nested] 46+ messages in thread
* PATCH: PR target/46519: Missing vzeroupper
@ 2010-11-19 21:58 H.J. Lu
  2010-11-20  0:24 ` Richard Guenther
  0 siblings, 1 reply; 46+ messages in thread
From: H.J. Lu @ 2010-11-19 21:58 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3259 bytes --]

On Thu, Nov 18, 2010 at 1:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Here is the patch for
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>>
>> We have 2 blocks pointing to each others. This patch first scans
>> all blocks without moving vzeroupper so that we can have accurate
>> information about upper 128bits at block entry.
>
> This introduces another insn scanning pass, almost the same as
> existing vzeroupper pass (modulo CALL_INSN/JUMP_INSN handling).
>
> So, if I understand correctly:
> - The patch removes the detection if the function ever touches AVX registers.
> - Due to this, all call_insn RTXes have to be decorated with
> CALL_NEEDS_VZEROUPPER.
> - A new pre-pass is required that scans all functions in order to
> detect functions with live AVX registers at exit, and at the same time
> marks the functions that *do not* use AVX registers.
> - Existing pass then re-scans everything to again detect functions
> with live AVX registers at exit and handles vzeroupper emission.
>
> I don't think this approach is acceptable. Maybe a LCM infrastructure
> can be used to handle this case?
>

Here is the rewrite of the vzeroupper optimization pass.
To avoid circular dependency, it has 2 passes.  It
delays the circular dependency to the second pass
and avoid rescan as much as possible.

I compared the bootstrap times with/wthout this patch
on 64bit Sandy Bridge with multilib and --with-fpmath=avx.
I enabled c,c++,fortran,java,lto,objc

Without patch:

12378.70user 573.02system 41:54.21elapsed 515%CPU

With patch

12580.56user 578.07system 42:25.41elapsed 516%CPU

The overhead is about 1.6%.


-- 
H.J.
---
gcc/

2010-11-19  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (upper_128bits_state): New.
	(block_info_def): Remove upper_128bits_set and done.  Add state,
	referenced, count, processed and rescanned.
	(check_avx256_stores): Updated.
	(move_or_delete_vzeroupper_2): Updated. Handle deleted BB_END.
	Call note_stores only if needed.  Set referenced and count.
	(move_or_delete_vzeroupper_1): Updated.  Set rescan_vzeroupper_p.
	(rescan_move_or_delete_vzeroupper): New.
	(move_or_delete_vzeroupper):  Process and rescan all all basic
	blocks instead of predecessor blocks of all exit points.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_vzeroupper_p
	and use_avx256_p.  Add rescan_vzeroupper_p.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.
	* gcc.target/i386/avx-vzeroupper-21.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-22.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-23.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-24.c: Likewise.

[-- Attachment #2: gcc-pr46519-3.patch --]
[-- Type: text/plain, Size: 20416 bytes --]

gcc/

2010-11-19  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (upper_128bits_state): New.
	(block_info_def): Remove upper_128bits_set and done.  Add state,
	referenced, count, processed and rescanned. 
	(check_avx256_stores): Updated.
	(move_or_delete_vzeroupper_2): Updated. Handle deleted BB_END.
	Call note_stores only if needed.  Set referenced and count.
	(move_or_delete_vzeroupper_1): Updated.  Set rescan_vzeroupper_p.
	(rescan_move_or_delete_vzeroupper): New.
	(move_or_delete_vzeroupper):  Process and rescan all all basic
	blocks instead of predecessor blocks of all exit points.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_vzeroupper_p
	and use_avx256_p.  Add rescan_vzeroupper_p.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.
	* gcc.target/i386/avx-vzeroupper-21.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-22.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-23.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-24.c: Likewise.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d5f097d..625fa12 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -57,12 +57,25 @@ along with GCC; see the file COPYING3.  If not see
 #include "dwarf2out.h"
 #include "sched-int.h"
 
+enum upper_128bits_state
+{
+  unknown = 0,		/* Unknown.  */
+  unused,		/* Not used or not referenced.  */
+  used			/* Used or referenced.  */
+};
+
 typedef struct block_info_def
 {
-  /* TRUE if the upper 128bits of any AVX registers are live at exit.  */
-  bool upper_128bits_set;
+  /* State of the upper 128bits of any AVX registers at exit.  */
+  enum upper_128bits_state state;
+  /* If the upper 128bits of any AVX registers are referenced.  */
+  enum upper_128bits_state referenced;
+  /* Number of vzerouppers in this block.  */
+  unsigned int count;
   /* TRUE if block has been processed.  */
-  bool done;
+  bool processed;
+  /* TRUE if block has been rescanned.  */
+  bool rescanned;
 } *block_info;
 
 #define BLOCK_INFO(B)   ((block_info) (B)->aux)
@@ -93,8 +106,9 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
 	  && REG_P (SET_SRC (set))
 	  && VALID_AVX256_REG_MODE (GET_MODE (SET_SRC (set)))))
     {
-      bool *upper_128bits_set = (bool *) data;
-      *upper_128bits_set = true;
+      enum upper_128bits_state *state
+	= (enum upper_128bits_state *) data;
+      *state = used;
     }
 }
 
@@ -106,19 +120,24 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
    are live at entry.  */
 
 static void
-move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
+move_or_delete_vzeroupper_2 (basic_block bb,
+			     enum upper_128bits_state state)
 {
-  rtx insn;
+  rtx insn, bb_end;
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
+  enum upper_128bits_state referenced = BLOCK_INFO (bb)->referenced;
+  int count = BLOCK_INFO (bb)->count;
 
   if (dump_file)
     fprintf (dump_file, " BB [%i] entry: upper 128bits: %d\n",
-	     bb->index, upper_128bits_set);
+	     bb->index, state);
 
+  /* BB_END changes when it is deleted.  */
+  bb_end = BB_END (bb);
   insn = BB_HEAD (bb);
-  while (insn != BB_END (bb))
+  while (insn != bb_end)
     {
       insn = NEXT_INSN (insn);
 
@@ -167,67 +186,89 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	      && GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
 	      && XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
 	    {
-	      upper_128bits_set = false;
+	      state = unused;
 
 	      /* Delete pending vzeroupper insertion.  */
 	      if (vzeroupper_insn)
 		{
+		  count--;
 		  delete_insn (vzeroupper_insn);
 		  vzeroupper_insn = NULL_RTX;
 		}
 	    }
-	  else if (!upper_128bits_set)
-	    note_stores (pat, check_avx256_stores, &upper_128bits_set);
+	  else if (state != used && referenced != unused)
+	    {
+	      /* No need to call note_stores if the upper 128bits of
+		 AVX registers are never referenced.  */
+	      note_stores (pat, check_avx256_stores, &state);
+	      if (state == used)
+		referenced = used;
+	    }
 	  continue;
 	}
 
       /* Process vzeroupper intrinsic.  */
+      count++;
       avx256 = INTVAL (XVECEXP (pat, 0, 0));
 
-      if (!upper_128bits_set)
+      if (state == unused)
 	{
 	  /* Since the upper 128bits are cleared, callee must not pass
 	     256bit AVX register.  We only need to check if callee
 	     returns 256bit AVX register.  */
-	  upper_128bits_set = (avx256 == callee_return_avx256);
+	  if (avx256 == callee_return_avx256)
+	    state = used;
 
-	  /* Remove unnecessary vzeroupper since
-	     upper 128bits are cleared.  */
+	  /* Remove unnecessary vzeroupper since upper 128bits are
+	     cleared.  */
 	  if (dump_file)
 	    {
 	      fprintf (dump_file, "Delete redundant vzeroupper:\n");
 	      print_rtl_single (dump_file, insn);
 	    }
+	  count--;
 	  delete_insn (insn);
 	}
-      else if (avx256 == callee_return_pass_avx256
-	       || avx256 == callee_pass_avx256)
+      else
 	{
-	  /* Callee passes 256bit AVX register.  Check if callee
-	     returns 256bit AVX register.  */
-	  upper_128bits_set = (avx256 == callee_return_pass_avx256);
+	  /* Set state to UNUSED if callee doesn't return 256bit AVX
+	     register.  */
+	  if (avx256 != callee_return_pass_avx256)
+	    state = unused;
 
-	  /* Must remove vzeroupper since
-	     callee passes in 256bit AVX register.  */
-	  if (dump_file)
+	  if (avx256 == callee_return_pass_avx256
+	      || avx256 == callee_pass_avx256)
 	    {
-	      fprintf (dump_file, "Delete callee pass vzeroupper:\n");
-	      print_rtl_single (dump_file, insn);
+	      /* Must remove vzeroupper since callee passes in 256bit
+		 AVX register.  */
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "Delete callee pass vzeroupper:\n");
+		  print_rtl_single (dump_file, insn);
+		}
+	      count--;
+	      delete_insn (insn);
 	    }
-	  delete_insn (insn);
-	}
-      else
-	{
-	  upper_128bits_set = false;
-	  vzeroupper_insn = insn;
+	  else
+	    vzeroupper_insn = insn;
 	}
     }
 
-  BLOCK_INFO (bb)->upper_128bits_set = upper_128bits_set;
+  BLOCK_INFO (bb)->state = state;
+
+  if (BLOCK_INFO (bb)->referenced == unknown)
+    {
+      /* The upper 128bits of AVX registers are never referenced if
+	 REFERENCED isn't updated.  */
+      if (referenced == unknown)
+	referenced = unused;
+      BLOCK_INFO (bb)->referenced = referenced;
+      BLOCK_INFO (bb)->count = count;
+    }
 
   if (dump_file)
     fprintf (dump_file, " BB [%i] exit: upper 128bits: %d\n",
-	     bb->index, upper_128bits_set);
+	     bb->index, state);
 }
 
 /* Helper function for move_or_delete_vzeroupper.  Process vzeroupper
@@ -238,18 +279,18 @@ move_or_delete_vzeroupper_1 (basic_block block)
 {
   edge e;
   edge_iterator ei;
-  bool upper_128bits_set;
+  enum upper_128bits_state state;
 
   if (dump_file)
     fprintf (dump_file, " Process BB [%i]: status: %d\n",
-	     block->index, BLOCK_INFO (block)->done);
+	     block->index, BLOCK_INFO (block)->processed);
 
-  if (BLOCK_INFO (block)->done)
+  if (BLOCK_INFO (block)->processed)
     return;
 
-  BLOCK_INFO (block)->done = true;
+  BLOCK_INFO (block)->processed = true;
 
-  upper_128bits_set = false;
+  state = unknown;
 
   /* Process all predecessor edges of this block.  */
   FOR_EACH_EDGE (e, ei, block->preds)
@@ -257,12 +298,70 @@ move_or_delete_vzeroupper_1 (basic_block block)
       if (e->src == block)
 	continue;
       move_or_delete_vzeroupper_1 (e->src);
-      if (BLOCK_INFO (e->src)->upper_128bits_set)
-	upper_128bits_set = true;
+      switch (BLOCK_INFO (e->src)->state)
+	{
+	case unknown:
+	  if (state == unused)
+	    state = unknown;
+	  break;
+	case used:
+	  state = used;
+	  break;
+	case unused:
+	  break;
+	}
     }
 
+  /* If state of any predecessor edges is unknown, we need to rescan.  */
+  if (state == unknown)
+    cfun->machine->rescan_vzeroupper_p = 1;
+
   /* Process this block.  */
-  move_or_delete_vzeroupper_2 (block, upper_128bits_set);
+  move_or_delete_vzeroupper_2 (block, state);
+}
+
+/* Helper function for move_or_delete_vzeroupper.  Rescan vzeroupper
+   in BLOCK and its predecessor blocks recursively.  */
+
+static void
+rescan_move_or_delete_vzeroupper (basic_block block)
+{
+  edge e;
+  edge_iterator ei;
+  enum upper_128bits_state state;
+
+  if (dump_file)
+    fprintf (dump_file, " Rescan BB [%i]: status: %d\n",
+	     block->index, BLOCK_INFO (block)->rescanned);
+
+  if (BLOCK_INFO (block)->rescanned)
+    return;
+
+  BLOCK_INFO (block)->rescanned = true;
+
+  state = unused;
+
+  /* Rescan all predecessor edges of this block.  */
+  FOR_EACH_EDGE (e, ei, block->preds)
+    {
+      if (e->src == block)
+	continue;
+      rescan_move_or_delete_vzeroupper (e->src);
+      /* For rescan, UKKNOWN state is treated as UNUSED.  */
+      if (BLOCK_INFO (e->src)->state == used)
+	state = used;
+    }
+
+  /* Rescan this block only if there are vzerouppers or the upper
+     128bits of AVX registers are referenced.  */
+  if (BLOCK_INFO (block)->count == 0
+      && (state == used || BLOCK_INFO (block)->referenced != used))
+    {
+      if (state == used)
+	BLOCK_INFO (block)->state = state;
+    }
+  else
+    move_or_delete_vzeroupper_2 (block, state);
 }
 
 /* Go through the instruction stream looking for vzeroupper.  Delete
@@ -274,6 +373,8 @@ move_or_delete_vzeroupper (void)
 {
   edge e;
   edge_iterator ei;
+  basic_block bb;
+  unsigned int count = 0;
 
   /* Set up block info for each basic block.  */
   alloc_aux_for_blocks (sizeof (struct block_info_def));
@@ -285,16 +386,31 @@ move_or_delete_vzeroupper (void)
   FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR->succs)
     {
       move_or_delete_vzeroupper_2 (e->dest,
-				   cfun->machine->caller_pass_avx256_p);
-      BLOCK_INFO (e->dest)->done = true;
+				   cfun->machine->caller_pass_avx256_p
+				   ? used : unused);
+      BLOCK_INFO (e->dest)->processed = true;
+      BLOCK_INFO (e->dest)->rescanned = true;
     }
 
-  /* Process predecessor blocks of all exit points.  */
+  /* Process all basic blocks.  */
   if (dump_file)
-    fprintf (dump_file, "Process all exit points\n");
+    fprintf (dump_file, "Process all basic blocks\n");
 
-  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
-    move_or_delete_vzeroupper_1 (e->src);
+  FOR_EACH_BB (bb)
+    {
+      move_or_delete_vzeroupper_1 (bb);
+      count += BLOCK_INFO (bb)->count;
+    }
+
+  /* Rescan all basic blocks if needed.  */
+  if (count && cfun->machine->rescan_vzeroupper_p)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Rescan all basic blocks\n");
+
+      FOR_EACH_BB (bb)
+	rescan_move_or_delete_vzeroupper (bb);
+    }
 
   free_aux_for_blocks ();
 }
@@ -4062,17 +4178,6 @@ ix86_option_override_internal (bool main_args_p)
     }
 }
 
-/* Return TRUE if type TYPE and mode MODE use 256bit AVX modes.  */
-
-static bool
-use_avx256_p (enum machine_mode mode, const_tree type)
-{
-  return (VALID_AVX256_REG_MODE (mode)
-	  || (type
-	      && TREE_CODE (type) == VECTOR_TYPE
-	      && int_size_in_bytes (type) == 32));
-}
-
 /* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
 
 static bool
@@ -5687,7 +5792,6 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
       if (function_pass_avx256_p (fnret_value))
 	{
 	  /* The return value of this function uses 256bit AVX modes.  */
-	  cfun->machine->use_avx256_p = true;
 	  if (caller)
 	    cfun->machine->callee_return_avx256_p = true;
 	  else
@@ -6956,7 +7060,6 @@ ix86_function_arg (CUMULATIVE_ARGS *cum, enum machine_mode omode,
   if (TARGET_VZEROUPPER && function_pass_avx256_p (arg))
     {
       /* This argument uses 256bit AVX modes.  */
-      cfun->machine->use_avx256_p = true;
       if (cum->caller)
 	cfun->machine->callee_pass_avx256_p = true;
       else
@@ -10970,12 +11073,9 @@ ix86_expand_epilogue (int style)
 
   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
+      && !TREE_THIS_VOLATILE (cfun->decl)
       && !cfun->machine->caller_return_avx256_p)
-    {
-      cfun->machine->use_vzeroupper_p = 1;
-      emit_insn (gen_avx_vzeroupper (GEN_INT (call_no_avx256))); 
-    }
+    emit_insn (gen_avx_vzeroupper (GEN_INT (call_no_avx256))); 
 
   if (crtl->args.pops_args && crtl->args.size)
     {
@@ -15130,9 +15230,6 @@ ix86_expand_move (enum machine_mode mode, rtx operands[])
   rtx op0, op1;
   enum tls_model model;
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   op0 = operands[0];
   op1 = operands[1];
 
@@ -15277,9 +15374,6 @@ ix86_expand_vector_move (enum machine_mode mode, rtx operands[])
   rtx op0 = operands[0], op1 = operands[1];
   unsigned int align = GET_MODE_ALIGNMENT (mode);
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   /* Force constants other than zero into memory.  We do not know how
      the instructions used to build constants modify the upper 64 bits
      of the register, once we have that information we may be able
@@ -15386,9 +15480,6 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
   rtx op0, op1, m;
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   op0 = operands[0];
   op1 = operands[1];
 
@@ -21661,12 +21752,11 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }
 
   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER && !TREE_THIS_VOLATILE (cfun->decl))
     {
       rtx unspec;
       int avx256;
 
-      cfun->machine->use_vzeroupper_p = 1;
       if (cfun->machine->callee_pass_avx256_p)
 	{
 	  if (cfun->machine->callee_return_avx256_p)
@@ -22763,9 +22853,6 @@ ix86_local_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
-  if (use_avx256_p (mode, type))
-    cfun->machine->use_avx256_p = true;
-
   /* Don't do dynamic stack realignment for long long objects with
      -mpreferred-stack-boundary=2.  */
   if (!TARGET_64BIT
@@ -22872,9 +22959,6 @@ ix86_minimum_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
-  if (use_avx256_p (mode, type))
-    cfun->machine->use_avx256_p = true;
-
   if (TARGET_64BIT || align != 64 || ix86_preferred_stack_boundary >= 64)
     return align;
 
@@ -29782,7 +29866,7 @@ ix86_reorg (void)
     }
 
   /* Run the vzeroupper optimization if needed.  */
-  if (cfun->machine->use_vzeroupper_p)
+  if (TARGET_VZEROUPPER)
     move_or_delete_vzeroupper ();
 }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 170ad50..6e7db03 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2296,12 +2296,6 @@ struct GTY(()) machine_function {
      stack below the return address.  */
   BOOL_BITFIELD static_chain_on_stack : 1;
 
-  /* Nonzero if the current function uses vzeroupper.  */
-  BOOL_BITFIELD use_vzeroupper_p : 1;
-
-  /* Nonzero if the current function uses 256bit AVX regisers.  */
-  BOOL_BITFIELD use_avx256_p : 1;
-
   /* Nonzero if caller passes 256bit AVX modes.  */
   BOOL_BITFIELD caller_pass_avx256_p : 1;
 
@@ -2314,6 +2308,9 @@ struct GTY(()) machine_function {
   /* Nonzero if the current callee returns 256bit AVX modes.  */
   BOOL_BITFIELD callee_return_avx256_p : 1;
 
+  /* Nonzero if rescan vzerouppers in the current function is needed.  */
+  BOOL_BITFIELD rescan_vzeroupper_p : 1;
+
   /* During prologue/epilogue generation, the current frame state.
      Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
index 5007753..667bb17 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
index 507f945..d98ceb9 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
@@ -16,4 +16,4 @@ foo ()
 }
 
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-21.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-21.c
new file mode 100644
index 0000000..6dea055
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-21.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mtune=generic -dp" } */
+
+extern void exit (int) __attribute__ ((__noreturn__));
+
+int
+foo (int i)
+{
+  if (i == 0)
+    exit (1);
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-22.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-22.c
new file mode 100644
index 0000000..b4e4a58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-22.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mtune=generic -dp" } */
+
+extern void exit (int) __attribute__ ((__noreturn__));
+extern void bar (void);
+
+int
+foo (int i)
+{
+  if (i == 0)
+    {
+      bar ();
+      exit (1);
+    }
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-23.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-23.c
new file mode 100644
index 0000000..66df800
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mtune=generic -dp" } */
+
+extern void fatal (void) __attribute__ ((__noreturn__));
+extern void exit (int) __attribute__ ((__noreturn__));
+
+void
+fatal (void)
+{
+  exit (1);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-24.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-24.c
new file mode 100644
index 0000000..4fdd374
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-24.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mtune=generic -dp" } */
+
+typedef struct bitmap_element_def {
+  struct bitmap_element_def *next;
+  unsigned int indx;
+} bitmap_element;
+typedef struct bitmap_head_def {
+  bitmap_element *first;
+  bitmap_element *current;
+  unsigned int indx;
+} bitmap_head;
+typedef struct bitmap_head_def *bitmap;
+typedef const struct bitmap_head_def *const_bitmap;
+extern void bar (void) __attribute__ ((__noreturn__));
+unsigned char
+bitmap_and_compl_into (bitmap a, const_bitmap b)
+{
+  bitmap_element *a_elt = a->first;
+  const bitmap_element *b_elt = b->first;
+  if (a == b)
+    {
+      if ((!(a)->first))
+	return 0;
+      else
+	return 1;
+    }
+  while (a_elt && b_elt)
+    {
+      if (a_elt->indx < b_elt->indx)
+	a_elt = a_elt->next;
+    }
+  if (a->indx == a->current->indx)
+    bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread
* PATCH: PR target/46519: Missing vzeroupper
@ 2010-11-18  7:29 H.J. Lu
  2010-11-18  8:34 ` H.J. Lu
  0 siblings, 1 reply; 46+ messages in thread
From: H.J. Lu @ 2010-11-18  7:29 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6133 bytes --]

On Wed, Nov 17, 2010 at 8:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 3:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>>>
>>>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>>>
>>>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>>>> move_or_delete_vzeroupper_2.  This patch does it.
>>>>>
>>>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>>>
>>>> It has BB_END, but it won't be visited by NEXT_INSN starting from
>>>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
>>>> end of the BB everywhere in gcc.
>>>>
>>>>> Can you please provide a test case that illustrates this?
>>>>>
>>>>
>>>> I am enclosing a work in progress.  We noticed that we are
>>>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
>>>> One isssue is we may have
>>>>
>>>> foo:
>>>>
>>>>       call bar <<<<< Missing vzeroupper
>>>>
>>>>       256bit vectorized insn
>>>>       goto foo
>>>>
>>>> We miss vzeroupper before call bar.  We don't have a small testcase.
>>>> But this patch fixes this case by inspection. We are checking other
>>>> cases.
>>>
>>> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
>>> upper_128bits_set)
>>>             bb->index, upper_128bits_set);
>>>
>>>   insn = BB_HEAD (bb);
>>> +  last = NEXT_INSN (BB_END (bb));
>>>   while (insn != BB_END (bb))
>>>     {
>>>       insn = NEXT_INSN (insn);
>>> +      if (insn == last)
>>> +       break;
>>>
>>>       if (!NONDEBUG_INSN_P (insn))
>>>        continue;
>>>
>>> The change above is not needed. The new check is never triggered - the
>>> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
>>> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
>>> (BB_END (bb))" is needed.
>>
>> Here is the patch for
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>>
>> We have 2 blocks pointing to each others. This patch first scans
>> all blocks without moving vzeroupper so that we can have accurate
>> information about upper 128bits at block entry.
>>
>>> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
>>> with NEXT_INSN.)
>>
>> Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
>> trigger this condition.
>>
>>> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
>>>
>>>   /* Emit vzeroupper if needed.  */
>>>   if (TARGET_VZEROUPPER
>>> -      && cfun->machine->use_avx256_p
>>> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
>>>       && !cfun->machine->caller_return_avx256_p)
>>>     {
>>>       cfun->machine->use_vzeroupper_p = 1;
>>> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
>>>     }
>>>
>>>   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
>>> -  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
>>> +  if (TARGET_VZEROUPPER
>>> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize))
>>>
>>> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
>>> -ftree-vectorize?! It looks that parts (or state machine) that set
>>> ...->use_avx256_p flag should be fixed.
>>
>> There are:
>>
>> foo:
>>
>>      call bar <<<<< Missing vzeroupper
>>
>>      256bit vectorized insn
>>      goto foo
>>
>> I couldn't find a hook to set use_avx256_p before RTL expansion
>> starts.
>>
>>>     {
>>>       rtx unspec;
>>>       int avx256;
>>> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>>> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>>> new file mode 100644
>>> index 0000000..3301083
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
>>> +
>>> +extern void free (void *);
>>> +void
>>> +bar (void *ncstrp)
>>> +{
>>> +  if(ncstrp==((void *)0))
>>> +    return;
>>> +  free(ncstrp);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>>>
>>> Hm, this testcase doesn't go together with the above change. There is
>>> no vectorization involved, and the scan checks that vzeroupper is NOT
>>> emitted.
>>>
>>
>> This testcase is for
>>
>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>

I sent the patch without comments too soon.

As discussed in PR, setting and checking use_avx256_p isn't reliable.
This patch removes use_avx256_p.  Any comments?

Thanks.


-- 
H.J.
---
gcc/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Add scanned and no_avx256.
	(move_or_delete_vzeroupper_2): Properly check the end of basic
	block.  Call note_stores only if no_avx256 is false.
	(scan_live_upper_128bits_2): New.
	(scan_live_upper_128bits_1): Likewise.
	(move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
	scan predecessor blocks of all exit points.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_avx256_p.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.

[-- Attachment #2: gcc-pr46519-2.patch --]
[-- Type: text/plain, Size: 12434 bytes --]

gcc/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Add scanned and no_avx256.
	(move_or_delete_vzeroupper_2): Properly check the end of basic
	block.  Call note_stores only if no_avx256 is false.
	(scan_live_upper_128bits_2): New.
	(scan_live_upper_128bits_1): Likewise.
	(move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
	scan predecessor blocks of all exit points.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_avx256_p.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 11820cf..4b450a3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -63,6 +63,10 @@ typedef struct block_info_def
   bool upper_128bits_set;
   /* TRUE if block has been processed.  */
   bool done;
+  /* TRUE if block has been scanned.  */
+  bool scanned;
+  /* TRUE if 256bit AVX register isn't referenced in block.  */
+  bool no_avx256;
 } *block_info;
 
 #define BLOCK_INFO(B)   ((block_info) (B)->aux)
@@ -108,19 +112,23 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
 static void
 move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 {
-  rtx insn;
+  rtx insn, last;
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
+  bool no_avx256 = BLOCK_INFO (bb)->no_avx256;
 
   if (dump_file)
     fprintf (dump_file, " BB [%i] entry: upper 128bits: %d\n",
 	     bb->index, upper_128bits_set);
 
   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;
 
       if (!NONDEBUG_INSN_P (insn))
 	continue;
@@ -176,7 +184,7 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 		  vzeroupper_insn = NULL_RTX;
 		}
 	    }
-	  else if (!upper_128bits_set)
+	  else if (!upper_128bits_set && !no_avx256)
 	    note_stores (pat, check_avx256_stores, &upper_128bits_set);
 	  continue;
 	}
@@ -191,8 +199,8 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     returns 256bit AVX register.  */
 	  upper_128bits_set = (avx256 == callee_return_avx256);
 
-	  /* Remove unnecessary vzeroupper since
-	     upper 128bits are cleared.  */
+	  /* Remove unnecessary vzeroupper since upper 128bits are
+	     cleared.  */
 	  if (dump_file)
 	    {
 	      fprintf (dump_file, "Delete redundant vzeroupper:\n");
@@ -207,8 +215,8 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     returns 256bit AVX register.  */
 	  upper_128bits_set = (avx256 == callee_return_pass_avx256);
 
-	  /* Must remove vzeroupper since
-	     callee passes in 256bit AVX register.  */
+	  /* Must remove vzeroupper since callee passes in 256bit
+	     AVX register.  */
 	  if (dump_file)
 	    {
 	      fprintf (dump_file, "Delete callee pass vzeroupper:\n");
@@ -265,6 +273,109 @@ move_or_delete_vzeroupper_1 (basic_block block)
   move_or_delete_vzeroupper_2 (block, upper_128bits_set);
 }
 
+/* Helper function for scan_live_upper_128bits_1.  Scan BB to check
+   if the upper 128bits of any AVX registers is live at exit of BB.  */
+
+static void
+scan_live_upper_128bits_2 (basic_block bb, bool upper_128bits_set)
+{
+  rtx insn, pat;
+  int avx256;
+  bool no_avx256 = true;
+
+  if (dump_file)
+    fprintf (dump_file, " BB [%i] entry: upper 128bits: %d\n",
+	     bb->index, upper_128bits_set);
+
+  FOR_BB_INSNS (bb, insn)
+    if (NONJUMP_INSN_P (insn))
+      {
+	pat = PATTERN (insn);
+
+	/* Check insn for vzeroupper intrinsic.  */
+	if (GET_CODE (pat) == UNSPEC_VOLATILE
+	    && XINT (pat, 1) == UNSPECV_VZEROUPPER)
+	  {
+	    /* Process vzeroupper intrinsic.  */
+	    avx256 = INTVAL (XVECEXP (pat, 0, 0));
+	    if (!upper_128bits_set)
+	      {
+		/* Since the upper 128bits are cleared, callee must
+		   not pass 256bit AVX register.  We only need to check
+		   if callee returns 256bit AVX register.  */
+		upper_128bits_set = (avx256 == callee_return_avx256);
+	      }
+	    else if (avx256 == callee_return_pass_avx256
+		     || avx256 == callee_pass_avx256)
+	      {
+		/* Callee passes 256bit AVX register.  Check if callee
+		   returns 256bit AVX register.  */
+		upper_128bits_set = (avx256 == callee_return_pass_avx256);
+	      }
+	    else
+	      upper_128bits_set = false;
+	  }
+	else
+	  {
+	    /* Check insn for vzeroall intrinsic.  */
+	    if (GET_CODE (pat) == PARALLEL
+		&& GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
+		&& XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
+	      upper_128bits_set = false;
+	    else if (!upper_128bits_set)
+	      {
+		note_stores (pat, check_avx256_stores,
+			     &upper_128bits_set);
+		if (upper_128bits_set)
+		  no_avx256 = false;
+	      }
+	  }
+      }
+
+  BLOCK_INFO (bb)->upper_128bits_set = upper_128bits_set;
+  BLOCK_INFO (bb)->no_avx256 = no_avx256;
+
+  if (dump_file)
+    fprintf (dump_file, " BB [%i] exit: upper 128bits: %d\n",
+	     bb->index, upper_128bits_set);
+}
+
+/* Helper function for move_or_delete_vzeroupper.  Scan BLOCK and its
+   predecessor blocks recursively to check if the upper 128bits of any
+   AVX registers is live at exit of BLOCK.  */
+
+static void
+scan_live_upper_128bits_1 (basic_block block)
+{
+  edge e;
+  edge_iterator ei;
+  bool upper_128bits_set;
+
+  if (dump_file)
+    fprintf (dump_file, " Scan BB [%i]: status: %d\n",
+	     block->index, BLOCK_INFO (block)->scanned);
+
+  if (BLOCK_INFO (block)->scanned)
+    return;
+
+  BLOCK_INFO (block)->scanned = true;
+
+  upper_128bits_set = false;
+
+  /* Process all predecessor edges of this block.  */
+  FOR_EACH_EDGE (e, ei, block->preds)
+    {
+      if (e->src == block)
+	continue;
+      scan_live_upper_128bits_1 (e->src);
+      if (BLOCK_INFO (e->src)->upper_128bits_set)
+	upper_128bits_set = true;
+    }
+
+  /* Scan this block.  */
+  scan_live_upper_128bits_2 (block, upper_128bits_set);
+}
+
 /* Go through the instruction stream looking for vzeroupper.  Delete
    it if upper 128bit AVX registers are unused.  If it isn't deleted,
    move it to just before a jump insn.  */
@@ -287,8 +398,16 @@ move_or_delete_vzeroupper (void)
       move_or_delete_vzeroupper_2 (e->dest,
 				   cfun->machine->caller_pass_avx256_p);
       BLOCK_INFO (e->dest)->done = true;
+      BLOCK_INFO (e->dest)->scanned = true;
     }
 
+  /* Scan predecessor blocks of all exit points.  */
+  if (dump_file)
+    fprintf (dump_file, "Scan all exit points\n");
+
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+    scan_live_upper_128bits_1 (e->src);
+
   /* Process predecessor blocks of all exit points.  */
   if (dump_file)
     fprintf (dump_file, "Process all exit points\n");
@@ -4062,17 +4181,6 @@ ix86_option_override_internal (bool main_args_p)
     }
 }
 
-/* Return TRUE if type TYPE and mode MODE use 256bit AVX modes.  */
-
-static bool
-use_avx256_p (enum machine_mode mode, const_tree type)
-{
-  return (VALID_AVX256_REG_MODE (mode)
-	  || (type
-	      && TREE_CODE (type) == VECTOR_TYPE
-	      && int_size_in_bytes (type) == 32));
-}
-
 /* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
 
 static bool
@@ -5687,7 +5795,6 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
       if (function_pass_avx256_p (fnret_value))
 	{
 	  /* The return value of this function uses 256bit AVX modes.  */
-	  cfun->machine->use_avx256_p = true;
 	  if (caller)
 	    cfun->machine->callee_return_avx256_p = true;
 	  else
@@ -6956,7 +7063,6 @@ ix86_function_arg (CUMULATIVE_ARGS *cum, enum machine_mode omode,
   if (TARGET_VZEROUPPER && function_pass_avx256_p (arg))
     {
       /* This argument uses 256bit AVX modes.  */
-      cfun->machine->use_avx256_p = true;
       if (cum->caller)
 	cfun->machine->callee_pass_avx256_p = true;
       else
@@ -10970,7 +11076,6 @@ ix86_expand_epilogue (int style)
 
   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
       && !cfun->machine->caller_return_avx256_p)
     {
       cfun->machine->use_vzeroupper_p = 1;
@@ -15130,9 +15235,6 @@ ix86_expand_move (enum machine_mode mode, rtx operands[])
   rtx op0, op1;
   enum tls_model model;
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   op0 = operands[0];
   op1 = operands[1];
 
@@ -15277,9 +15379,6 @@ ix86_expand_vector_move (enum machine_mode mode, rtx operands[])
   rtx op0 = operands[0], op1 = operands[1];
   unsigned int align = GET_MODE_ALIGNMENT (mode);
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   /* Force constants other than zero into memory.  We do not know how
      the instructions used to build constants modify the upper 64 bits
      of the register, once we have that information we may be able
@@ -15386,9 +15485,6 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
   rtx op0, op1, m;
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   op0 = operands[0];
   op1 = operands[1];
 
@@ -21661,7 +21757,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }
 
   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER)
     {
       rtx unspec;
       int avx256;
@@ -22763,9 +22859,6 @@ ix86_local_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
-  if (use_avx256_p (mode, type))
-    cfun->machine->use_avx256_p = true;
-
   /* Don't do dynamic stack realignment for long long objects with
      -mpreferred-stack-boundary=2.  */
   if (!TARGET_64BIT
@@ -22872,9 +22965,6 @@ ix86_minimum_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
-  if (use_avx256_p (mode, type))
-    cfun->machine->use_avx256_p = true;
-
   if (TARGET_64BIT || align != 64 || ix86_preferred_stack_boundary >= 64)
     return align;
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 170ad50..f7c38e5 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2299,9 +2299,6 @@ struct GTY(()) machine_function {
   /* Nonzero if the current function uses vzeroupper.  */
   BOOL_BITFIELD use_vzeroupper_p : 1;
 
-  /* Nonzero if the current function uses 256bit AVX regisers.  */
-  BOOL_BITFIELD use_avx256_p : 1;
-
   /* Nonzero if caller passes 256bit AVX modes.  */
   BOOL_BITFIELD caller_pass_avx256_p : 1;
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
index 5007753..667bb17 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
index 507f945..d98ceb9 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
@@ -16,4 +16,4 @@ foo ()
 }
 
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2011-01-24 17:26 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-17 19:45 PATCH: PR target/46519: Missing vzeroupper H.J. Lu
2010-12-18 19:36 ` Uros Bizjak
2010-12-18 20:11   ` H.J. Lu
2010-12-29 11:03     ` Uros Bizjak
2010-12-29 16:23       ` H.J. Lu
2010-12-30 12:42         ` Uros Bizjak
2011-01-01  1:05           ` Mark Mitchell
2011-01-01  1:38             ` H.J. Lu
2011-01-01  1:39               ` Mark Mitchell
2011-01-01  2:08                 ` H.J. Lu
2011-01-01  2:17                   ` Mark Mitchell
2011-01-01 16:01                     ` H.J. Lu
2011-01-04  1:15                       ` Mark Mitchell
2011-01-04  3:59                         ` H.J. Lu
2011-01-04  5:54                           ` Mark Mitchell
2011-01-04 22:17                             ` H.J. Lu
2011-01-04 23:53                               ` Mark Mitchell
2011-01-05  0:06                                 ` H.J. Lu
2011-01-05  0:08                                   ` Mark Mitchell
2011-01-05  0:09                                     ` H.J. Lu
2011-01-05  0:24                                       ` Mark Mitchell
2011-01-05 16:44                                         ` H.J. Lu
2011-01-05 17:12                                           ` Jakub Jelinek
2011-01-05 23:01                                             ` H.J. Lu
2011-01-13 17:19                                             ` H.J. Lu
2011-01-13 17:25                                               ` Mark Mitchell
2011-01-13 18:16                                               ` Richard Henderson
2011-01-13 18:51                                                 ` H.J. Lu
2011-01-14 16:06                                                   ` Richard Henderson
2011-01-14 16:08                                                     ` H.J. Lu
2011-01-16  8:04                                                       ` H.J. Lu
2011-01-24 18:00                                                         ` Richard Henderson
2011-01-24 18:12                                                           ` H.J. Lu
2011-01-13 18:04                                           ` Richard Henderson
2011-01-13 18:09                                             ` H.J. Lu
  -- strict thread matches above, loose matches on Subject: below --
2010-11-19 21:58 H.J. Lu
2010-11-20  0:24 ` Richard Guenther
2010-11-20  1:48   ` H.J. Lu
2010-11-20 12:11     ` Richard Guenther
2010-11-20 18:20       ` H.J. Lu
2010-11-24 19:48         ` Uros Bizjak
2010-11-24 19:53           ` H.J. Lu
2010-11-24 19:57             ` Uros Bizjak
2010-11-24 21:41               ` H.J. Lu
2010-11-18  7:29 H.J. Lu
2010-11-18  8:34 ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).