public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [PATCH] Simple optimization for MASK_STORE.
Date: Wed, 20 May 2015 14:10:00 -0000	[thread overview]
Message-ID: <CAEoMCqQy045OoQu-v0AgWv=i8FPJffSvw7dQXsAYccB-Tc8nLw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc38QMSXL058QuV0TZMAku=Ur0FXhF9TEm2Lp7C_HHmWLg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

Hi All,

Here is updated patch to optimize mask stores. The main goal of it is
to avoid execution of mask store if its mask is zero vector since
loads that follow it can be blocked.
The following changes were done:
  1.  A test on sink legality was added - it simply prohibits to cross
statements with non-null vdef or vuse.
  2. New phi node is created in join block for moved MASK_STORE statements.
  3. Test was changed to check that 2 MASK_STORE statements are not
moved to the same block.
  4. New field was added to loop_vec_info structure to mark loops
having MASK_STORE's.

Any comments will be appreciated.
Yuri.

2015-05-20  Yuri Rumyantsev  <ysrumyan@gmail.com>

* config/i386/i386.c: Include files stringpool.h and tree-ssanames.h.
(ix86_vectorize_is_zero_vector): New function.
(TARGET_VECTORIZE_IS_ZERO_VECTOR): New target macro
* doc/tm.texi.in: Add @hook TARGET_VECTORIZE_IS_ZERO_VECTOR.
* doc/tm.texi: Updated.
* target.def (is_zero_vector): New DEFHOOK.
* tree-vect-stmts.c : Include tree-into-ssa.h.
(vectorizable_mask_load_store): Initialize has_mask_store field.
(is_valid_sink): New function.
(optimize_mask_stores): New function.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Update prototype.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

[-- Attachment #2: patch.2 --]
[-- Type: application/octet-stream, Size: 12544 bytes --]

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 222288)
+++ config/i386/i386.c	(working copy)
@@ -115,6 +115,8 @@
 #include "tree-iterator.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
 
 static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
@@ -40988,6 +40990,47 @@
   return ix86_get_builtin (code);
 }
 
+/* Returns true if SOURCE type is supported by builtin ptest.
+   NAME is lhs of created ptest call.  All created statements are added
+   to GS.  */
+
+static bool
+ix86_vectorize_build_zero_vector_test (tree source, tree name, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (source);
+  gimple stmt;
+  enum ix86_builtins code;
+  tree decl, new_type, conv_expr, vec_tmp;
+
+  gcc_assert (VECTOR_TYPE_P (type));
+  if (!TARGET_AVX)
+    return false;
+
+  switch (tree_to_uhwi (TYPE_SIZE (type)))
+    {
+    case 128:
+      code = IX86_BUILTIN_PTESTZ;
+      break;
+    case 256:
+      if (!TARGET_AVX2)
+	return false;
+      code = IX86_BUILTIN_PTESTZ256;
+      break;
+    default:
+      return false;
+    }
+  decl = ix86_builtin_decl (code, true);
+  new_type = get_same_sized_vectype (long_long_integer_type_node, type);
+  conv_expr = build1 (VIEW_CONVERT_EXPR, new_type, source);
+  vec_tmp = make_ssa_name (new_type);
+  stmt = gimple_build_assign (vec_tmp, conv_expr);
+  gimple_seq_add_stmt (gs, stmt);
+  stmt = gimple_build_call (decl, 2, vec_tmp, vec_tmp);
+  gimple_call_set_lhs (stmt, name);
+  gimple_seq_add_stmt (gs, stmt);
+  return true;
+}
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 
@@ -51765,6 +51808,10 @@
 #undef TARGET_VECTORIZE_BUILTIN_GATHER
 #define TARGET_VECTORIZE_BUILTIN_GATHER ix86_vectorize_builtin_gather
 
+#undef TARGET_VECTORIZE_BUILD_ZERO_VECTOR_TEST
+#define TARGET_VECTORIZE_BUILD_ZERO_VECTOR_TEST \
+  ix86_vectorize_build_zero_vector_test
+
 #undef TARGET_BUILTIN_RECIPROCAL
 #define TARGET_BUILTIN_RECIPROCAL ix86_builtin_reciprocal
 
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 222288)
+++ doc/tm.texi.in	(working copy)
@@ -4241,6 +4241,8 @@
 
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
 
+@hook TARGET_VECTORIZE_BUILD_ZERO_VECTOR_TEST
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
Index: target.def
===================================================================
--- target.def	(revision 222288)
+++ target.def	(working copy)
@@ -1801,6 +1801,19 @@
  (const_tree mem_vectype, const_tree index_type, int scale),
  NULL)
 
+/* Returns true if target support zero vector predicate for given vector
+   type.  */
+DEFHOOK
+(build_zero_vector_test,
+ "This hook should return boolean true if target supports zero vector\n\
+predicate.  @var{source} is the compared vector, @var{name} is ssa_name\n\
+containing boolean value true if all vector elements are zero and produced\n\
+statements are saved in @var{gs}.\n\
+The default is @code{false} which means that target does not support it.",
+ bool,
+ (tree source, tree name, gimple_seq *gs),
+ hook_bool_rtx_false)
+
 /* Target function to initialize the cost model for a loop or block.  */
 DEFHOOK
 (init_cost,
Index: testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
===================================================================
--- testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c	(revision 0)
+++ testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details" } */
+
+extern int *p1, *p2, *p3;
+int c[256];
+
+void foo (int n)
+{
+  int i;
+  for (i=0; i<n; i++)
+    if (c[i])
+      {
+	p1[i] += 1;
+	p2[i] = p3[i] +2;
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c	(revision 222288)
+++ tree-vect-stmts.c	(working copy)
@@ -88,6 +88,7 @@
 #include "ipa-ref.h"
 #include "cgraph.h"
 #include "builtins.h"
+#include "tree-into-ssa.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -2073,6 +2074,7 @@
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
 	{
 	  unsigned align, misalign;
@@ -8205,3 +8207,191 @@
   interm_types->release ();
   return false;
 }
+
+/* Helper for optimize_mask_stores: returns true if STMT sinking to end
+   of BB is valid and false otherwise.  */
+
+static bool
+is_valid_sink (gimple stmt)
+{
+  gimple_stmt_iterator it = gsi_for_stmt (stmt);
+  gimple curr_stmt;
+
+  gsi_next (&it);
+  if (gimple_vdef (stmt))
+    {
+      /* Check that store does not cross its vuses.  */
+      for (; !gsi_end_p (it); gsi_next (&it))
+	{
+	  curr_stmt = gsi_stmt (it);
+	  if (gimple_has_mem_ops (curr_stmt)
+	      && (gimple_vdef (curr_stmt) != NULL_TREE
+		  || gimple_vuse (curr_stmt) != NULL_TREE))
+	    return false;
+	}
+    }
+  else
+    {
+      /* Check that load does not cross any statement with vdef.  */
+      for (it = gsi_for_stmt (stmt); !gsi_end_p (it); gsi_next (&it))
+	{
+	  curr_stmt = gsi_stmt (it);
+	  if (gimple_vdef (curr_stmt) != NULL_TREE)
+	    return false;
+	}
+    }
+  return true;
+}
+
+/* The code below is trying to perform simple optimization - do not execute
+   masked store statement if its mask is zero mask since loads that follow
+   a masked store can be blocked.
+   It put all masked stores statements with the same mask into the new bb
+   with a check on zero mask.  */
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+  basic_block bb = loop->header;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  auto_vec<gimple> worklist;
+
+  if (loop->dont_vectorize
+      || loop->num_nodes > 2)
+    return;
+
+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	worklist.safe_push (stmt);
+    }
+
+  if (worklist.is_empty ())
+    return;
+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple last, def_stmt, last_store;
+      edge e, efalse;
+      tree mask, val;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      gimple_seq gs;
+      tree arg3;
+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      /* Loop was not vectorized if mask does not have vector type.  */
+      if (!VECTOR_TYPE_P (TREE_TYPE (mask)))
+	return;
+      val = make_ssa_name (integer_type_node);
+      gs = NULL;
+      /* Escape if target does not support test on zero mask.  */
+
+      if (!targetm.vectorize.build_zero_vector_test (mask, val, &gs))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf (MSG_NOTE, "Target does not support ptest!\n");
+	  return;
+	}
+      gcc_assert (gs != NULL);
+
+      /* Create new bb.  */
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      store_bb->frequency = bb->frequency / 2;
+      efalse->probability = REG_BR_PROB_BASE / 2;
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+      if (dom_info_available_p (CDI_DOMINATORS))
+	set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+      if (dom_info_available_p (CDI_POST_DOMINATORS))
+	set_immediate_dominator (CDI_POST_DOMINATORS, bb, store_bb);
+
+      /* Create zero test condition.  */
+      stmt = gimple_build_cond (NE_EXPR, val, integer_zero_node,
+				NULL_TREE, NULL_TREE);
+      gimple_seq_add_stmt (&gs, stmt);
+      gsi = gsi_last_bb (bb);
+      gsi_insert_seq_after (&gsi, gs, GSI_SAME_STMT);
+      /* Create new PHI node for vdef of the last masked store:
+	 .MEM_2 = VDEF <.MEM_1>
+	 will be converted to
+	 .MEM.3 = VDEF <.MEM_1>
+	 and new PHI node will be created in join bb
+	 .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      phi = create_phi_node (vdef, join_bb);
+      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+      gimple_set_vdef (last, new_vdef);
+      first_dump = true;
+
+      /* Put all masked stores with the same mask to STORE_BB if possible.  */
+      while (true)
+	{
+	  /* Move masked store to STORE_BB.  */
+	  last_store = last;
+	  gsi = gsi_for_stmt (last);
+	  gsi_to = gsi_start_bb (store_bb);
+	  gsi_move_before (&gsi, &gsi_to);
+	  update_stmt (last);
+	  if (dump_enabled_p ())
+	    {
+	      /* Issue different messages depending on FIRST_DUMP.  */
+	      if (first_dump)
+		{
+		  dump_printf (MSG_NOTE, "Move MASK_STORE to new bb#%d\n",
+			       store_bb->index);
+		  first_dump = false;
+		}
+	      else
+		dump_printf (MSG_NOTE, "Move MASK_STORE to created bb\n");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+	    }
+	  /* Put definition statement of stored value in STORE_BB
+	     if possible.  */
+	  arg3 = gimple_call_arg (last, 3);
+	  if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+	    {
+	      def_stmt = SSA_NAME_DEF_STMT (arg3);
+	      /* Move def_stmt to STORE_BB if it is in the same bb and
+		 it is legal.  */
+	      if (gimple_bb (def_stmt) == bb
+		  && is_valid_sink (def_stmt))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf (MSG_NOTE, "Move stmt to created bb\n");
+		      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, def_stmt, 0);
+		    }
+		  gsi = gsi_for_stmt (def_stmt);
+		  gsi_to = gsi_start_bb (store_bb);
+		  gsi_move_before (&gsi, &gsi_to);
+		  update_stmt (def_stmt);
+		}
+	    }
+	    /* Put other masked stores with the same mask to STORE_BB.  */
+	    if (worklist.is_empty ()
+		|| gimple_call_arg (worklist.last (), 2) != mask
+		|| !is_valid_sink (worklist.last ()))
+	      break;
+	    last = worklist.pop ();
+	}
+      add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+    }
+}
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 222288)
+++ tree-vectorizer.c	(working copy)
@@ -549,13 +549,19 @@
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
+      bool has_mask_store;
 
       loop = get_loop (cfun, i);
       if (!loop)
 	continue;
       loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = false;
+      if (loop_vinfo)
+	has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
       destroy_loop_vec_info (loop_vinfo, true);
       loop->aux = NULL;
+      if (has_mask_store)
+	optimize_mask_stores (loop);
     }
 
   free_stmt_vec_info_vec ();
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 222288)
+++ tree-vectorizer.h	(working copy)
@@ -375,6 +375,9 @@
      loop version without if-conversion.  */
   struct loop *scalar_loop;
 
+  /* Mark loops having masked stores.  */
+  bool has_mask_store;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -412,6 +415,7 @@
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
+#define LOOP_VINFO_HAS_MASK_STORE(L)	   (L)->has_mask_store
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
   ((L)->may_misalign_stmts.length () > 0)
@@ -1044,6 +1048,7 @@
 			       vec<tree> *, slp_tree, int);
 extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
 extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);

  parent reply	other threads:[~2015-05-20 14:00 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-06 14:04 Yuri Rumyantsev
2015-05-08  9:27 ` Richard Biener
2015-05-08 18:43   ` Jeff Law
2015-05-08 19:16     ` Richard Biener
2015-05-20 14:10   ` Yuri Rumyantsev [this message]
2015-05-29 14:28     ` Yuri Rumyantsev
2015-06-09 12:15     ` Richard Biener
2015-06-18 15:41       ` Yuri Rumyantsev
2015-07-07 13:55         ` Yuri Rumyantsev
2015-07-10  5:51         ` Jeff Law
2015-07-20 15:26           ` Yuri Rumyantsev
2015-07-21 13:59             ` Richard Biener
2015-07-23 20:32             ` Jeff Law
2015-07-24  9:04               ` Yuri Rumyantsev
2015-07-24  9:24               ` Richard Biener
2015-07-24 19:26                 ` Jeff Law
2015-07-27  9:04                   ` Richard Biener
2015-08-06 11:07                     ` Yuri Rumyantsev
2015-08-13 11:40                       ` Yuri Rumyantsev
2015-08-13 11:46                         ` Richard Biener
2015-11-02 15:24                           ` Yuri Rumyantsev
2015-11-05 15:49                             ` Yuri Rumyantsev
2015-11-06 12:56                             ` Richard Biener
2015-11-06 13:29                               ` Yuri Rumyantsev
2015-11-10 12:33                                 ` Richard Biener
2015-11-10 12:48                                   ` Ilya Enkovich
2015-11-10 14:46                                     ` Richard Biener
2015-11-10 14:56                                       ` Ilya Enkovich
2015-11-10 17:02                                         ` Mike Stump
2015-11-11  9:18                                         ` Richard Biener
2015-11-11 13:13                                           ` Yuri Rumyantsev
2015-11-12 13:59                                             ` Richard Biener
2015-11-19 15:20                                               ` Yuri Rumyantsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEoMCqQy045OoQu-v0AgWv=i8FPJffSvw7dQXsAYccB-Tc8nLw@mail.gmail.com' \
    --to=ysrumyan@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=izamyatin@gmail.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).