public inbox for gcc-patches@gcc.gnu.org
From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Ilya Enkovich <enkovich.gnu@gmail.com>, Jeff Law <law@redhat.com>,
		gcc-patches <gcc-patches@gcc.gnu.org>,
	Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [PATCH] Simple optimization for MASK_STORE.
Date: Thu, 19 Nov 2015 15:20:00 -0000	[thread overview]
Message-ID: <CAEoMCqQ8xzCBF3tE6U3Y3_ci4v57pPSMU_mrNC3GPV7dh9Rcvw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc3Q+y7+FOypZNeXEM6=uwEcNpmZLk1-BVDGLcjY4F4+EA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8000 bytes --]

Hi Richard,

I'm sending you an updated version of the patch which contains the fixes
you mentioned, plus an additional early exit in register_edge_assert_for()
for a gcond with a vector comparison: it tries to produce an assert for
  if (vect != {0,0,0,0})
but cannot create such a constant.  This is not essential, since it only
applies in a very specialized context.
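For context, this is the shape of loop the optimization targets (the same
shape as the new testcases at the end of the patch); with
-O3 -march=core-avx2 the two guarded stores are vectorized into masked
stores sharing one mask:

```c
int c[256], a1[256], a2[256], a3[256];
int *p1 = a1, *p2 = a2, *p3 = a3;

/* With AVX2 vectorization the stores through p1 and p2 become
   MASK_STORE calls whose common mask is computed from c[i] != 0.  */
void
foo (int n)
{
  int i;
  for (i = 0; i < n; i++)
    if (c[i])
      {
	p1[i] += 1;
	p2[i] = p3[i] + 2;
      }
}
```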

My answers are below.

2015-11-12 16:58 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> What we should do to cope with this problem (structure size increasing)?
>> Should we return to vector comparison version?
>
> Ok, given this constraint I think the cleanest approach is to allow
> integer(!) vector equality(!) compares with scalar result.  This should then
> expand via cmp_optab and not via vec_cmp_optab.

  In fact it is expanded through cbranch_optab, since the only use of
such a comparison is for masked store motion.
>
> On gimple you can then have
>
>  if (mask_vec_1 != {0, 0, .... })
> ...
>
> Note that a fallback expansion (for optabs.c to try) would be
> the suggested view-conversion (aka, subreg) variant using
> a same-sized integer mode.
>
> Target maintainers can then choose what is a better fit for
> their target (and instruction set as register set constraints may apply).
>
> The patch you posted seems to do this but not restrict the compares
> to integer ones (please do that).
>
>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>          {
> -          error ("vector comparison returning a boolean");
> -          debug_generic_expr (op0_type);
> -          debug_generic_expr (op1_type);
> -          return true;
> +         /* Allow vector comparison returning boolean if operand types
> +            are equal and CODE is EQ/NE.  */
> +         if ((code != EQ_EXPR && code != NE_EXPR)
> +             || TREE_CODE (op0_type) != TREE_CODE (op1_type)
> +             || TYPE_VECTOR_SUBPARTS (op0_type)
> +                != TYPE_VECTOR_SUBPARTS (op1_type)
> +             || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
> +                != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))
>
> These are all checked with the useless_type_conversion_p checks done earlier.
>
> As said I'd like to see a
>
>                 || ! VECTOR_INTEGER_TYPE_P (op0_type)

  I added a check on VECTOR_BOOLEAN_TYPE_P (op0_type) instead, since the
type of the mask has changed.
>
> check added so we and targets do not need to worry about using EQ/NE vs. CMP
> and worry about signed zeros and friends.
>
> +           {
> +             error ("type mismatch for vector comparison returning a boolean");
> +             debug_generic_expr (op0_type);
> +             debug_generic_expr (op1_type);
> +             return true;
> +           }
>
>
>
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
>           enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
>           bool invariant_only_p = !single_use0_p;
>
> +         /* Can't combine vector comparison with scalar boolean type of
> +            the result and VEC_COND_EXPR having vector type of comparison.  */
> +         if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +             && INTEGRAL_TYPE_P (type)
> +             && (TREE_CODE (type) == BOOLEAN_TYPE
> +                 || TYPE_PRECISION (type) == 1)
> +             && def_code == VEC_COND_EXPR)
> +           return NULL_TREE;
>
> this hints at larger fallout you paper over here.  So this effectively
> means we're trying combining (vec1 != vec2) != 0 for example
> and that fails miserably?  If so then the solution is to fix whatever
> does not expect this (valid) GENERIC tree.

  I changed it to the following check in combine_cond_expr_cond:
  /* Do not perform combining if types are not compatible.  */
  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
    return NULL_TREE;
>
> +  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
> +    return;
>
> not sure if I like a param more than a target hook ... :/
  I introduced the param instead of a target hook to make this
transformation user-visible, i.e. so it can be switched on and off for
different targets.
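(For the record, with the patch applied the transformation can then be
controlled from the command line; on Haswell-family -march settings the
new tune flag turns it on by default.  Illustrative invocation only — it
requires a compiler with this patch applied:)

```
gcc -O3 -march=core-avx2 --param zero-test-for-mask-store=1 test.c
```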
>
> +      /* Create vector comparison with boolean result.  */
> +      vectype = TREE_TYPE (mask);
> +      zero = build_zero_cst (TREE_TYPE (vectype));
> +      zero = build_vector_from_val (vectype, zero);
>
> build_zero_cst (vectype);

Done.
>
> +      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
>
> you can omit the NULL_TREE operands.

  I did not find an overload of gimple_build_cond that omits them.
>
> +      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);
>
> please omit the assert.

  Done.
>
> +      gimple_set_vdef (last, new_vdef);
>
> do this before you create the PHI.
>
  Done.
> +         /* Put definition statement of stored value in STORE_BB
> +            if possible.  */
> +         arg3 = gimple_call_arg (last, 3);
> +         if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
> +           {
> ...
>
> is this really necessary?  It looks incomplete to me anyway.  I'd rather have
> a late sink pass if this shows necessary.  Btw,..

  I tried to avoid creating multiple new basic blocks for the same mask,
and I also don't want to move the whole semi-hammock guarded by this mask
into a separate block, to keep that block small enough, since x86 chips
prefer short branches.  Note also that icc does almost the same.
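In scalar terms, the transformation is equivalent to the following sketch
(plain C, illustrative only — the real transformation works on the GIMPLE
IFN_MASK_STORE calls and the vector mask):

```c
/* Reference behavior: element-wise masked store.  */
static void
masked_store (int *dst, const int *src, const int *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])
      dst[i] = src[i];
}

/* Scalar stand-in for the vector test mask == {0,...}.  */
static int
mask_is_zero (const int *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])
      return 0;
  return 1;
}

/* Transformed shape: all masked stores with the same mask sit in one
   block guarded by a single all-zero test on the mask, so the whole
   store block is skipped when no lane is active.  */
void
guarded_masked_stores (int *d1, int *d2, const int *src,
		       const int *mask, int n)
{
  if (mask_is_zero (mask, n))	/* the new gcond on the mask */
    return;			/* skip the STORE_BB entirely */
  masked_store (d1, src, mask, n);
  masked_store (d2, src, mask, n);
}
```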
>
> +                it is legal.  */
> +             if (gimple_bb (def_stmt) == bb
> +                 && is_valid_sink (def_stmt, last_store))
>
> with the implementation of is_valid_sink this is effectively
>
>    && (!gimple_vuse (def_stmt)
>           || gimple_vuse (def_stmt) == gimple_vdef (last_store))
  I inlined the corresponding part of is_valid_sink at this place.
>
> I still think this "pass" is quite a hack, esp. as it appears as generic
> code in a GIMPLE pass.  And esp. as this hack seems to be needed
> for Haswell only, not Boradwell or Skylake.

  This is not true: the transformation is performed for Skylake and
Broadwell as well, since both of them belong to the m_HASWELL tuning
family.

>
> Thanks,
> Richard.
>

ChangeLog:
2015-11-19  Yuri Rumyantsev  <ysrumyan@gmail.com>

* config/i386/i386.c (ix86_option_override_internal): Add conditional
initialization of PARAM_ZERO_TEST_FOR_MASK_STORE.
(ix86_expand_branch): Implement vector comparison with boolean result.
* config/i386/i386.h (TARGET_OPTIMIZE_MASK_STORE): New macro.
* config/i386/sse.md (cbranch<mode>4): New define_expand for vector
comparison with eq/ne only.
* config/i386/x86-tune.def (X86_TUNE_OPTIMIZE_MASK_STORE): New tune.
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* params.def (PARAM_ZERO_TEST_FOR_MASK_STORE): New DEFPARAM.
* params.h (ENABLE_ZERO_TEST_FOR_MASK_STORE): New macro.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call to verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of loop_vinfo.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macro.
(optimize_mask_stores): Add prototype.
* tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
type.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move2.c: Likewise.

[-- Attachment #2: patch.6 --]
[-- Type: application/octet-stream, Size: 19415 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 83749d5..5a515f8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5326,6 +5326,12 @@ ix86_option_override_internal (bool main_args_p,
 			 opts->x_param_values,
 			 opts_set->x_param_values);
 
+  if (TARGET_OPTIMIZE_MASK_STORE)
+    maybe_set_param_value (PARAM_ZERO_TEST_FOR_MASK_STORE,
+			   1,
+			   opts->x_param_values,
+			   opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
@@ -21641,6 +21647,38 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   machine_mode mode = GET_MODE (op0);
   rtx tmp;
 
+  /* Handle special case: vector comparison with boolean result, transform
+     it using the ptest instruction.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      rtx lhs;
+      rtx flag;
+      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+      gcc_assert (code == EQ || code == NE);
+      if (!REG_P (op0))
+	op0 = force_reg (mode, op0);
+      if (!REG_P (op1))
+	op1 = force_reg (mode, op1);
+      /* Generate a subtraction since we can't check that one operand is
+	 a zero vector.  */
+      lhs = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (lhs,
+			      gen_rtx_MINUS (mode, op0, op1)));
+      lhs = gen_rtx_SUBREG (p_mode, lhs, 0);
+      tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+			  gen_rtx_UNSPEC (CCmode,
+					  gen_rtvec (2, lhs, lhs),
+					  UNSPEC_PTEST));
+      emit_insn (tmp);
+      flag = gen_rtx_REG (CCZmode, FLAGS_REG);
+      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				  gen_rtx_LABEL_REF (VOIDmode, label),
+				  pc_rtx);
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+      return;
+    }
+
   switch (mode)
     {
     case SFmode:
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ceda472..5133216 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -496,6 +496,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
     ix86_tune_features[X86_TUNE_ADJUST_UNROLL]
 #define TARGET_AVOID_FALSE_DEP_FOR_BMI \
 	ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BMI]
+#define TARGET_OPTIMIZE_MASK_STORE \
+	ix86_tune_features[X86_TUNE_OPTIMIZE_MASK_STORE]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7b517a..e149cb3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18340,6 +18340,25 @@
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512BW")
 
+(define_expand "cbranch<mode>4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+		    (match_operand:V48_AVX2 2 "register_operand")))
+   (set (pc) (if_then_else
+	       (match_operator 0 "bt_comparison_operator"
+		[(reg:CC FLAGS_REG) (const_int 0)])
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "TARGET_AVX2"
+{
+  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index b2d3921..c5e5b63 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -527,6 +527,10 @@ DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE, "avoid_vector_decode",
 DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
 	  m_SANDYBRIDGE | m_HASWELL | m_GENERIC)
 
+/* X86_TUNE_OPTIMIZE_MASK_STORE: Perform a masked store only if its mask
+   is not equal to zero.  */
+DEF_TUNE (X86_TUNE_OPTIMIZE_MASK_STORE, "optimize_mask_store", m_HASWELL)
+
 /*****************************************************************************/
 /* This never worked well before.                                            */
 /*****************************************************************************/
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 698062e..9988be1 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
 
   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (INTEGRAL_TYPE_P (type)
+	  && (TREE_CODE (type) == BOOLEAN_TYPE
+	      || TYPE_PRECISION (type) == 1))
+	{
+	  /* Have vector comparison with scalar boolean result.  */
+	  bool result = true;
+	  gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+	  gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+	  for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+	    {
+	      tree elem0 = VECTOR_CST_ELT (op0, i);
+	      tree elem1 = VECTOR_CST_ELT (op1, i);
+	      tree tmp = fold_relational_const (EQ_EXPR, type, elem0, elem1);
+	      result &= integer_onep (tmp);
+	    }
+	  if (code == NE_EXPR)
+	    result = !result;
+	  return constant_boolean_node (result, type);
+	}
       unsigned count = VECTOR_CST_NELTS (op0);
       tree *elts =  XALLOCAVEC (tree, count);
       gcc_assert (VECTOR_CST_NELTS (op1) == count
diff --git a/gcc/params.def b/gcc/params.def
index 41fd8a8..d228477 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1043,6 +1043,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk.",
           2, 0, 0)
 
+/* Enable insertion of a test on zero mask for masked stores if non-zero.  */
+DEFPARAM (PARAM_ZERO_TEST_FOR_MASK_STORE,
+	  "zero-test-for-mask-store",
+	  "Enable insertion of a test on zero mask for masked stores.",
+	  0, 0, 1)
+
 /* Override CASE_VALUES_THRESHOLD of when to switch from doing switch
    statements via if statements to using a table jump operation.  If the value
    is 0, the default CASE_VALUES_THRESHOLD will be used.  */
diff --git a/gcc/params.h b/gcc/params.h
index 1090d00..2037c73 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -221,6 +221,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ENABLE_ZERO_TEST_FOR_MASK_STORE \
+  PARAM_VALUE (PARAM_ZERO_TEST_FOR_MASK_STORE)
 #define ALLOW_LOAD_DATA_RACES \
   PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES)
 #define ALLOW_STORE_DATA_RACES \
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
new file mode 100755
index 0000000..60bb841
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=core-avx2 -O3 -fdump-tree-vect-details" } */
+
+extern int *p1, *p2, *p3;
+int c[256];
+void foo (int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    if (c[i])
+      {
+	p1[i] += 1;
+	p2[i] = p3[i] + 2;
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move2.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move2.c
new file mode 100755
index 0000000..a383b99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=core-avx2 -O3 -fdump-tree-vect-details" } */
+
+extern int *p1, *p2, *p3;
+int c[256];
+/* All masked load/stores must be put into one new bb.  */
+
+void foo (int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    if (c[i])
+      {
+	p1[i] -= 1;
+	p2[i] = p3[i];
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 1 "vect" } } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
old mode 100644
new mode 100755
index 0c624aa..cfde379
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3408,10 +3408,10 @@ verify_gimple_call (gcall *stmt)
 }
 
 /* Verifies the gimple comparison with the result type TYPE and
-   the operands OP0 and OP1.  */
+   the operands OP0 and OP1, comparison code is CODE.  */
 
 static bool
-verify_gimple_comparison (tree type, tree op0, tree op1)
+verify_gimple_comparison (tree type, tree op0, tree op1, enum tree_code code)
 {
   tree op0_type = TREE_TYPE (op0);
   tree op1_type = TREE_TYPE (op1);
@@ -3448,10 +3448,16 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
 	  || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+	  /* Allow vector comparison returning boolean if operand types
+	     are equal and CODE is EQ/NE.  */
+	  if ((code != EQ_EXPR && code != NE_EXPR)
+	      || !VECTOR_BOOLEAN_TYPE_P (op0_type))
+	    {
+	      error ("type mismatch for vector comparison returning a boolean");
+	      debug_generic_expr (op0_type);
+	      debug_generic_expr (op1_type);
+	      return true;
+	    }
         }
     }
   /* Or a boolean vector type with the same element count
@@ -3832,7 +3838,7 @@ verify_gimple_assign_binary (gassign *stmt)
     case LTGT_EXPR:
       /* Comparisons are also binary, but the result type is not
 	 connected to the operand types.  */
-      return verify_gimple_comparison (lhs_type, rhs1, rhs2);
+      return verify_gimple_comparison (lhs_type, rhs1, rhs2, rhs_code);
 
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
@@ -4541,7 +4547,8 @@ verify_gimple_cond (gcond *stmt)
 
   return verify_gimple_comparison (boolean_type_node,
 				   gimple_cond_lhs (stmt),
-				   gimple_cond_rhs (stmt));
+				   gimple_cond_rhs (stmt),
+				   gimple_cond_code (stmt));
 }
 
 /* Verify the GIMPLE statement STMT.  Returns true if there is an
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
 
   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
 
+  /* Do not perform combining if types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+    return NULL_TREE;
+
   fold_defer_overflow_warnings ();
   t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1);
   if (!t)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c3dbfd3..4a247c8 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6826,3 +6826,169 @@ vect_transform_loop (loop_vec_info loop_vinfo)
       dump_printf (MSG_NOTE, "\n");
     }
 }
+
+/* Helper for optimize_mask_stores: returns true if STMT sinking to end
+   of BB is valid and false otherwise.  */
+
+static bool
+is_valid_sink (gimple *stmt, gimple *last_store)
+{
+  tree vdef;
+  imm_use_iterator imm_it;
+  use_operand_p use_p;
+  basic_block bb = gimple_bb (stmt);
+
+  if (is_gimple_call (stmt)
+      && !gimple_call_internal_p (stmt))
+    /* Do not consider a non-internal call as valid to sink.  */
+    return false;
+
+  if ((vdef = gimple_vdef (stmt)))
+    {
+      /* Check that there are no uses of VDEF in the current bb.  */
+      FOR_EACH_IMM_USE_FAST (use_p, imm_it, vdef)
+	if (gimple_bb (USE_STMT (use_p)) == bb)
+	  return false;
+      return true;
+    }
+  else if (gimple_vuse (stmt) == NULL_TREE)
+    return true;
+  else if (gimple_vuse (stmt) == gimple_vuse (last_store))
+    return true;
+  return false;
+}
+
+/* The code below performs a simple optimization: do not execute a masked
+   store statement if its mask is a zero vector, since loads that follow
+   a masked store can be blocked.  It puts all masked stores with the same
+   mask vector into a new bb guarded by a check for a zero mask.  */
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+  basic_block bb = loop->header;
+  gimple_stmt_iterator gsi;
+  gimple *stmt;
+  auto_vec<gimple *> worklist;
+
+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+    return;
+
+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	worklist.safe_push (stmt);
+    }
+
+  if (worklist.is_empty ())
+    return;
+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple *last, *def_stmt, *last_store;
+      edge e, efalse;
+      tree mask;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      tree arg3;
+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+      tree vectype;
+      tree zero;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      bb = gimple_bb (last);
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      /* Put STORE_BB to likely part.  */
+      efalse->probability = PROB_UNLIKELY;
+      store_bb->frequency = PROB_ALWAYS - EDGE_FREQUENCY (efalse);
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+      if (dom_info_available_p (CDI_DOMINATORS))
+	set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (vectype);
+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
+      gsi = gsi_last_bb (bb);
+      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+      /* Create new PHI node for vdef of the last masked store:
+	 .MEM_2 = VDEF <.MEM_1>
+	 will be converted to
+	 .MEM.3 = VDEF <.MEM_1>
+	 and new PHI node will be created in join bb
+	 .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      gimple_set_vdef (last, new_vdef);
+      phi = create_phi_node (vdef, join_bb);
+      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+      first_dump = true;
+
+      /* Put all masked stores with the same mask to STORE_BB if possible.  */
+      while (true)
+	{
+	  /* Move masked store to STORE_BB.  */
+	  last_store = last;
+	  gsi = gsi_for_stmt (last);
+	  gsi_to = gsi_start_bb (store_bb);
+	  gsi_move_before (&gsi, &gsi_to);
+	  update_stmt (last);
+	  if (dump_enabled_p ())
+	    {
+	      /* Issue different messages depending on FIRST_DUMP.  */
+	      if (first_dump)
+		{
+		  dump_printf (MSG_NOTE, "Move MASK_STORE to new bb#%d\n",
+			       store_bb->index);
+		  first_dump = false;
+		}
+	      else
+		dump_printf (MSG_NOTE, "Move MASK_STORE to created bb\n");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+	    }
+	  /* Put definition statement of stored value in STORE_BB
+	     if possible.  */
+	  arg3 = gimple_call_arg (last, 3);
+	  if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+	    {
+	      def_stmt = SSA_NAME_DEF_STMT (arg3);
+	      /* Move def_stmt to STORE_BB if it is in the same BB and
+		 it is valid sink.  */
+	      if (gimple_bb (def_stmt) == bb
+		  && (!gimple_vuse (def_stmt)
+		      || gimple_vuse (def_stmt) == gimple_vuse (last_store)))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf (MSG_NOTE, "Move stmt to created bb\n");
+		      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, def_stmt, 0);
+		    }
+		  gsi = gsi_for_stmt (def_stmt);
+		  gsi_to = gsi_start_bb (store_bb);
+		  gsi_move_before (&gsi, &gsi_to);
+		  update_stmt (def_stmt);
+		}
+	    }
+	  /* Put other masked stores with the same mask to STORE_BB.  */
+	  if (worklist.is_empty ()
+	      || gimple_call_arg (worklist.last (), 2) != mask
+	      || !is_valid_sink (worklist.last (), last_store))
+	    break;
+	  last = worklist.pop ();
+	}
+      add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+    }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 4bb58b9..55b1956 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2000,6 +2000,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
 	{
 	  unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index b721c56..6732616 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -598,12 +598,18 @@ vectorize_loops (void)
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
+      bool has_mask_store;
 
       loop = get_loop (cfun, i);
       if (!loop)
 	continue;
       loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = false;
+      if (loop_vinfo)
+	has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
       destroy_loop_vec_info (loop_vinfo, true);
+      if (has_mask_store)
+	optimize_mask_stores (loop);
       loop->aux = NULL;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7867c26..040051c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -330,6 +330,9 @@ typedef struct _loop_vec_info : public vec_info {
      loop version without if-conversion.  */
   struct loop *scalar_loop;
 
+  /* Mark loops having masked stores.  */
+  bool has_mask_store;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -367,6 +370,7 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
+#define LOOP_VINFO_HAS_MASK_STORE(L)      (L)->has_mask_store
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
   ((L)->may_misalign_stmts.length () > 0)
@@ -1001,6 +1005,7 @@ extern void vect_get_vec_defs (tree, tree, gimple *, vec<tree> *,
 			       vec<tree> *, slp_tree, int);
 extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
 extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..1605520c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
 						&comp_code, &val))
     return;
 
+  /* Use of vector comparison in gcond is very restricted and used to check
+     that the mask in masked store is zero, so assert for such comparison
+     is not implemented yet.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+
   /* Register ASSERT_EXPRs for name.  */
   register_edge_assert_for_2 (name, e, si, cond_code, cond_op0,
 			      cond_op1, is_else_edge);

Thread overview: 33+ messages
2015-05-06 14:04 Yuri Rumyantsev
2015-05-08  9:27 ` Richard Biener
2015-05-08 18:43   ` Jeff Law
2015-05-08 19:16     ` Richard Biener
2015-05-20 14:10   ` Yuri Rumyantsev
2015-05-29 14:28     ` Yuri Rumyantsev
2015-06-09 12:15     ` Richard Biener
2015-06-18 15:41       ` Yuri Rumyantsev
2015-07-07 13:55         ` Yuri Rumyantsev
2015-07-10  5:51         ` Jeff Law
2015-07-20 15:26           ` Yuri Rumyantsev
2015-07-21 13:59             ` Richard Biener
2015-07-23 20:32             ` Jeff Law
2015-07-24  9:04               ` Yuri Rumyantsev
2015-07-24  9:24               ` Richard Biener
2015-07-24 19:26                 ` Jeff Law
2015-07-27  9:04                   ` Richard Biener
2015-08-06 11:07                     ` Yuri Rumyantsev
2015-08-13 11:40                       ` Yuri Rumyantsev
2015-08-13 11:46                         ` Richard Biener
2015-11-02 15:24                           ` Yuri Rumyantsev
2015-11-05 15:49                             ` Yuri Rumyantsev
2015-11-06 12:56                             ` Richard Biener
2015-11-06 13:29                               ` Yuri Rumyantsev
2015-11-10 12:33                                 ` Richard Biener
2015-11-10 12:48                                   ` Ilya Enkovich
2015-11-10 14:46                                     ` Richard Biener
2015-11-10 14:56                                       ` Ilya Enkovich
2015-11-10 17:02                                         ` Mike Stump
2015-11-11  9:18                                         ` Richard Biener
2015-11-11 13:13                                           ` Yuri Rumyantsev
2015-11-12 13:59                                             ` Richard Biener
2015-11-19 15:20                                               ` Yuri Rumyantsev [this message]
