public inbox for gcc-patches@gcc.gnu.org
* [PATCH PR68542]
@ 2015-11-30 13:24 Yuri Rumyantsev
  2015-12-04 12:18 ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2015-11-30 13:24 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 2309 bytes --]

Hi All,

Here is a patch for the 481.wrf performance regression for avx2; it is a
slightly modified mask store optimization. This transformation allows us to
perform un-predication for a semi-hammock containing masked stores; in other
words, if we have a loop like
for (i=0; i<n; i++)
  if (c[i]) {
    p1[i] += 1;
    p2[i] = p3[i] +2;
  }

then it will be transformed to
   if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
     vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
     vect__12.22_172 = vect__11.19_170 + vect_cst__171;
     MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
     vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
     vect__19.28_184 = vect__18.25_182 + vect_cst__183;
     MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
   }
i.e. it will put all computations related to masked stores into the semi-hammock.
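
At the source level the effect is roughly that of a ptest-based guard around
the masked accesses; a minimal AVX2-intrinsics sketch of the intended shape
(illustrative only; the names and the n % 8 == 0 assumption are not part of
the patch, and this is not the code the compiler generates):

#include <immintrin.h>

void
foo_guarded (int *p1, int *p2, int *p3, int *c, int n)
{
  for (int i = 0; i < n; i += 8)	/* assumes n % 8 == 0 */
    {
      __m256i zero = _mm256_setzero_si256 ();
      __m256i cond = _mm256_loadu_si256 ((__m256i *) &c[i]);
      /* Mask lane is all-ones where c[i] != 0, all-zeros otherwise.  */
      __m256i mask = _mm256_xor_si256 (_mm256_cmpeq_epi32 (cond, zero),
				       _mm256_set1_epi32 (-1));
      /* vptest: skip the masked loads/stores when every mask lane is zero.  */
      if (!_mm256_testz_si256 (mask, mask))
	{
	  __m256i v1 = _mm256_maskload_epi32 (&p1[i], mask);
	  _mm256_maskstore_epi32 (&p1[i], mask,
				  _mm256_add_epi32 (v1, _mm256_set1_epi32 (1)));
	  __m256i v3 = _mm256_maskload_epi32 (&p3[i], mask);
	  _mm256_maskstore_epi32 (&p2[i], mask,
				  _mm256_add_epi32 (v3, _mm256_set1_epi32 (2)));
	}
    }
}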

Bootstrapping and regression testing did not show any new failures.

ChangeLog:
2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Implement integral vector
comparison with boolean result.
* config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
for vector comparison with eq/ne only.
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macros.
(optimize_mask_stores): Add prototype.
* tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
type.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

[-- Attachment #2: PR68542.patch --]
[-- Type: application/octet-stream, Size: 18882 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 83749d5..b256078 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21641,6 +21641,40 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   machine_mode mode = GET_MODE (op0);
   rtx tmp;
 
+  /* Handle special case - vector comparison with boolean result, transform
+     it using ptest instruction.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      rtx lhs;
+      rtx flag;
+      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+      gcc_assert (code == EQ || code == NE);
+      if (!REG_P (op0))
+	op0 = force_reg (mode, op0);
+      if (!REG_P (op1))
+	op1 = force_reg (mode, op1);
+      /* Generate subtraction since we can't check that one operand is
+	 zero vector.  */
+      lhs = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (lhs, gen_rtx_MINUS (mode, op0, op1)));
+      lhs = gen_rtx_SUBREG (p_mode, lhs, 0);
+      tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+			 gen_rtx_UNSPEC (CCmode,
+					 gen_rtvec (2, lhs, lhs),
+					 UNSPEC_PTEST));
+      emit_insn (tmp);
+      flag = gen_rtx_REG (CCZmode, FLAGS_REG);
+      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+      if (code == EQ)
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    gen_rtx_LABEL_REF (VOIDmode, label), pc_rtx);
+      else
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    pc_rtx, gen_rtx_LABEL_REF (VOIDmode, label));
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+      return;
+    }
+
   switch (mode)
     {
     case SFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7b517a..e149cb3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18340,6 +18340,25 @@
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512BW")
 
+(define_expand "cbranch<mode>4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+		    (match_operand:V48_AVX2 2 "register_operand")))
+   (set (pc) (if_then_else
+	       (match_operator 0 "bt_comparison_operator"
+		[(reg:CC FLAGS_REG) (const_int 0)])
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "TARGET_AVX2"
+{
+  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 698062e..9988be1 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
 
   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (INTEGRAL_TYPE_P (type)
+	  && (TREE_CODE (type) == BOOLEAN_TYPE
+	      || TYPE_PRECISION (type) == 1))
+	{
+	  /* Have vector comparison with scalar boolean result.  */
+	  bool result = true;
+	  gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+	  gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+	  for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+	    {
+	      tree elem0 = VECTOR_CST_ELT (op0, i);
+	      tree elem1 = VECTOR_CST_ELT (op1, i);
+	      tree tmp = fold_relational_const (code, type, elem0, elem1);
+	      result &= integer_onep (tmp);
+	    }
+	  if (code == NE_EXPR)
+	    result = !result;
+	  return constant_boolean_node (result, type);
+	}
       unsigned count = VECTOR_CST_NELTS (op0);
       tree *elts =  XALLOCAVEC (tree, count);
       gcc_assert (VECTOR_CST_NELTS (op1) == count
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
new file mode 100644
index 0000000..7d52f70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-march=core-avx2 -O3 -fdump-tree-vect-details" } */
+
+#include <stdlib.h>
+#include "avx2-check.h"
+
+#define N 32
+int *p1, *p2, *p3;
+int c[N];
+int p1ref[N], p2ref[N];
+
+__attribute__((noinline)) void foo (int n)
+{
+  int i;
+  for (i=0; i<n; i++)
+    if (c[i]) {
+      p1[i] += 1;
+      p2[i] = p3[i] +2;
+    }
+}
+
+__attribute__((noinline)) void init ()
+{
+  p1ref[0]=1; p2ref[0]=2;
+  p1ref[1]=3; p2ref[1]=5;
+  p1ref[2]=5; p2ref[2]=8;
+  p1ref[3]=7; p2ref[3]=11;
+  p1ref[4]=9; p2ref[4]=14;
+  p1ref[5]=11; p2ref[5]=17;
+  p1ref[6]=13; p2ref[6]=20;
+  p1ref[7]=15; p2ref[7]=23;
+  p1ref[8]=16; p2ref[8]=8;
+  p1ref[9]=18; p2ref[9]=9;
+  p1ref[10]=20; p2ref[10]=10;
+  p1ref[11]=22; p2ref[11]=11;
+  p1ref[12]=24; p2ref[12]=12;
+  p1ref[13]=26; p2ref[13]=13;
+  p1ref[14]=28; p2ref[14]=14;
+  p1ref[15]=30; p2ref[15]=15;
+  p1ref[16]=33; p2ref[16]=50;
+  p1ref[17]=35; p2ref[17]=53;
+  p1ref[18]=37; p2ref[18]=56;
+  p1ref[19]=39; p2ref[19]=59;
+  p1ref[20]=41; p2ref[20]=62;
+  p1ref[21]=43; p2ref[21]=65;
+  p1ref[22]=45; p2ref[22]=68;
+  p1ref[23]=47; p2ref[23]=71;
+  p1ref[24]=48; p2ref[24]=24;
+  p1ref[25]=50; p2ref[25]=25;
+  p1ref[26]=52; p2ref[26]=26;
+  p1ref[27]=54; p2ref[27]=27;
+  p1ref[28]=56; p2ref[28]=28;
+  p1ref[29]=58; p2ref[29]=29;
+  p1ref[30]=60; p2ref[30]=30;
+  p1ref[31]=62; p2ref[31]=31;
+}
+
+static void
+avx2_test (void)
+{
+  int * P = malloc(N * 3 * sizeof (int));
+  int i;
+
+  p1 = &P[0];
+  p2 = &P[N];
+  p3 = &P[2 * N];
+  for (i=0; i<N; i++) {
+    p1[i] = i + i;
+    p3[i] = i * 3;
+    p2[i] = i;
+    c[i] = (i >> 3) & 1? 0: 1;
+  }
+  init ();
+  foo (N);
+  for (i=0; i<N;i++)
+    if (p1[i] != p1ref[i] || p2[i] != p2ref[i])
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 1 "vect" } } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 0c624aa..97491e1 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3408,10 +3408,10 @@ verify_gimple_call (gcall *stmt)
 }
 
 /* Verifies the gimple comparison with the result type TYPE and
-   the operands OP0 and OP1.  */
+   the operands OP0 and OP1, comparison code is CODE.  */
 
 static bool
-verify_gimple_comparison (tree type, tree op0, tree op1)
+verify_gimple_comparison (tree type, tree op0, tree op1, enum tree_code code)
 {
   tree op0_type = TREE_TYPE (op0);
   tree op1_type = TREE_TYPE (op1);
@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
 	  || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+	  /* Allow vector comparison returning boolean if operand types
+	     are equal and CODE is EQ/NE.  */
+	  if ((code != EQ_EXPR && code != NE_EXPR)
+	      || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
+		   || VECTOR_INTEGER_TYPE_P (op0_type)))
+	    {
+	      error ("type mismatch for vector comparison returning a boolean");
+	      debug_generic_expr (op0_type);
+	      debug_generic_expr (op1_type);
+	      return true;
+	    }
         }
     }
   /* Or a boolean vector type with the same element count
@@ -3832,7 +3839,7 @@ verify_gimple_assign_binary (gassign *stmt)
     case LTGT_EXPR:
       /* Comparisons are also binary, but the result type is not
 	 connected to the operand types.  */
-      return verify_gimple_comparison (lhs_type, rhs1, rhs2);
+      return verify_gimple_comparison (lhs_type, rhs1, rhs2, rhs_code);
 
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
@@ -4541,7 +4548,8 @@ verify_gimple_cond (gcond *stmt)
 
   return verify_gimple_comparison (boolean_type_node,
 				   gimple_cond_lhs (stmt),
-				   gimple_cond_rhs (stmt));
+				   gimple_cond_rhs (stmt),
+				   gimple_cond_code (stmt));
 }
 
 /* Verify the GIMPLE statement STMT.  Returns true if there is an
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
 
   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
 
+  /* Do not perform combining if types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+    return NULL_TREE;
+
   fold_defer_overflow_warnings ();
   t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1);
   if (!t)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c3dbfd3..80f6976 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6826,3 +6826,225 @@ vect_transform_loop (loop_vec_info loop_vinfo)
       dump_printf (MSG_NOTE, "\n");
     }
 }
+
+/* Helper for optimize_mask_stores: returns true if STMT sinking to end
+   of BB is valid and false otherwise.  */
+
+static bool
+is_valid_sink (gimple *stmt, gimple *last_store)
+{
+  tree vdef;
+  imm_use_iterator imm_it;
+  use_operand_p use_p;
+  basic_block bb = gimple_bb (stmt);
+
+  if (is_gimple_call (stmt)
+      && !gimple_call_internal_p (stmt))
+    /* Do not consider non-internal call as valid to sink.  */
+    return false;
+
+  if ((vdef = gimple_vdef (stmt)))
+    {
+      /* Check that there are no store vuses in the current bb.  */
+      FOR_EACH_IMM_USE_FAST (use_p, imm_it, vdef)
+	if (gimple_bb (USE_STMT (use_p)) == bb)
+	  return false;
+      return true;
+    }
+  else if (gimple_vuse (stmt) == NULL_TREE)
+    return true;
+  else if (gimple_vuse (stmt) == gimple_vuse (last_store))
+    return true;
+  return false;
+}
+
+/* The code below is trying to perform simple optimization - revert
+   if-conversion for masked stores, i.e. if the mask of a store is zero
+   do not perform it and all stored value producers also if possible.
+   For example,
+     for (i=0; i<n; i++)
+       if (c[i]) {
+	 p1[i] += 1;
+	 p2[i] = p3[i] +2;
+     }
+   this transformation will produce the following semi-hammock:
+
+   if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
+     vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
+     vect__12.22_172 = vect__11.19_170 + vect_cst__171;
+     MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
+     vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
+     vect__19.28_184 = vect__18.25_182 + vect_cst__183;
+     MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
+   }
+*/
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+  basic_block bb = loop->header;
+  gimple_stmt_iterator gsi;
+  gimple *stmt, *stmt1;
+  auto_vec<gimple *> worklist;
+
+  vect_location = find_loop_location (loop);
+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	worklist.safe_push (stmt);
+    }
+
+  if (worklist.is_empty ())
+    return;
+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple *last, *last_store;
+      edge e, efalse;
+      tree mask;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      /* tree arg3; */
+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+      tree vectype;
+      tree zero;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      /* Create new bb.  */
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      /* Put STORE_BB to likely part.  */
+      efalse->probability = PROB_UNLIKELY;
+      store_bb->frequency = PROB_ALWAYS - EDGE_FREQUENCY (efalse);
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+      if (dom_info_available_p (CDI_DOMINATORS))
+	set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (vectype);
+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
+      gsi = gsi_last_bb (bb);
+      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+      /* Create new PHI node for vdef of the last masked store:
+	 .MEM_2 = VDEF <.MEM_1>
+	 will be converted to
+	 .MEM.3 = VDEF <.MEM_1>
+	 and new PHI node will be created in join bb
+	 .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      gimple_set_vdef (last, new_vdef);
+      phi = create_phi_node (vdef, join_bb);
+      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+      first_dump = true;
+
+      /* Put all masked stores with the same mask to STORE_BB if possible.  */
+      while (true)
+	{
+	  gimple_stmt_iterator gsi_from;
+	  /* Move masked store to STORE_BB.  */
+	  last_store = last;
+	  gsi = gsi_for_stmt (last);
+	  gsi_from = gsi;
+	  /* Shift gsi to previous stmt for further traversal.  */
+	  gsi_prev (&gsi);
+	  gsi_to = gsi_start_bb (store_bb);
+	  gsi_move_before (&gsi_from, &gsi_to);
+	  update_stmt (last);
+	  if (dump_enabled_p ())
+	    {
+	      /* Issue different messages depending on FIRST_DUMP.  */
+	      if (first_dump)
+		{
+		  dump_printf_loc (MSG_NOTE, vect_location,
+				   "Move MASK_STORE to new bb#%d\n",
+			       store_bb->index);
+		  first_dump = false;
+		}
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Move MASK_STORE to created bb\n");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+	    }
+	    /* Move all stored value producers if possible.  */
+	    while (!gsi_end_p (gsi))
+	      {
+		tree lhs;
+		imm_use_iterator imm_iter;
+		use_operand_p use_p;
+		bool res;
+		stmt1 = gsi_stmt (gsi);
+		/* Do not consider statements writing to memory.  */
+		if (gimple_vdef (stmt1))
+		  break;
+		gsi_prev (&gsi);
+		if (is_gimple_call (stmt1))
+		  lhs = gimple_call_lhs (stmt1);
+		else
+		  if (gimple_code (stmt1) == GIMPLE_ASSIGN)
+		    lhs = gimple_assign_lhs (stmt1);
+		  else
+		    break;
+
+		/* LHS of vectorized stmt must be SSA_NAME.  */
+		if (TREE_CODE (lhs) != SSA_NAME)
+		  break;
+
+		/* Skip scalar statements.  */
+		if (!VECTOR_TYPE_P (TREE_TYPE (lhs)))
+		  continue;
+
+		/* Check that LHS does not have uses outside of STORE_BB.  */
+		res = true;
+		FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+		  {
+		    gimple *use_stmt;
+		    use_stmt = USE_STMT (use_p);
+		    if (gimple_bb (use_stmt) != store_bb)
+		      {
+			res = false;
+			break;
+		      }
+		  }
+		if (!res)
+		  break;
+
+		if (gimple_vuse (stmt1)
+		    && gimple_vuse (stmt1) != gimple_vuse (last_store))
+		  break;
+
+		/* Can move STMT1 to STORE_BB.  */
+		if (dump_enabled_p ())
+		  {
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "Move stmt to created bb\n");
+		    dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt1, 0);
+		  }
+		gsi_from = gsi_for_stmt (stmt1);
+		gsi_to = gsi_start_bb (store_bb);
+		gsi_move_before (&gsi_from, &gsi_to);
+		update_stmt (stmt1);
+	      }
+	    /* Put other masked stores with the same mask to STORE_BB.  */
+	    if (worklist.is_empty ()
+		|| gimple_call_arg (worklist.last (), 2) != mask
+		|| !is_valid_sink (worklist.last (), last_store))
+	      break;
+	    last = worklist.pop ();
+	}
+      add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+    }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 4bb58b9..55b1956 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2000,6 +2000,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
 	{
 	  unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index b721c56..6732616 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -598,12 +598,18 @@ vectorize_loops (void)
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
+      bool has_mask_store;
 
       loop = get_loop (cfun, i);
       if (!loop)
 	continue;
       loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = false;
+      if (loop_vinfo)
+	has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
       destroy_loop_vec_info (loop_vinfo, true);
+      if (has_mask_store)
+	optimize_mask_stores (loop);
       loop->aux = NULL;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7867c26..040051c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -330,6 +330,9 @@ typedef struct _loop_vec_info : public vec_info {
      loop version without if-conversion.  */
   struct loop *scalar_loop;
 
+  /* Mark loops having masked stores.  */
+  bool has_mask_store;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -367,6 +370,7 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
+#define LOOP_VINFO_HAS_MASK_STORE(L)      (L)->has_mask_store
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
   ((L)->may_misalign_stmts.length () > 0)
@@ -1001,6 +1005,7 @@ extern void vect_get_vec_defs (tree, tree, gimple *, vec<tree> *,
 			       vec<tree> *, slp_tree, int);
 extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
 extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..1605520c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
 						&comp_code, &val))
     return;
 
+  /* Use of vector comparison in gcond is very restricted and used to check
+     that the mask in masked store is zero, so assert for such comparison
+     is not implemented yet.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+
   /* Register ASSERT_EXPRs for name.  */
   register_edge_assert_for_2 (name, e, si, cond_code, cond_op0,
 			      cond_op1, is_else_edge);


* Re: [PATCH PR68542]
  2015-11-30 13:24 [PATCH PR68542] Yuri Rumyantsev
@ 2015-12-04 12:18 ` Richard Biener
  2015-12-04 15:07   ` Yuri Rumyantsev
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2015-12-04 12:18 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi All,
>
> Here is a patch for the 481.wrf performance regression for avx2; it is a
> slightly modified mask store optimization. This transformation allows us to
> perform un-predication for a semi-hammock containing masked stores; in other
> words, if we have a loop like
> for (i=0; i<n; i++)
>   if (c[i]) {
>     p1[i] += 1;
>     p2[i] = p3[i] +2;
>   }
>
> then it will be transformed to
>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>    }
> i.e. it will put all computations related to masked stores into the semi-hammock.
>
> Bootstrapping and regression testing did not show any new failures.

Can you please split out the middle-end support for vector equality compares?

@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are equal and CODE is EQ/NE.  */
+         if ((code != EQ_EXPR && code != NE_EXPR)
+             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
+                  || VECTOR_INTEGER_TYPE_P (op0_type)))
+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }
         }
     }

please merge the conditions with a &&
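
(For illustration, one way the merged check might read - a sketch only, not
necessarily the form that was finally committed:)

      if ((TREE_CODE (op0_type) == VECTOR_TYPE
	   || TREE_CODE (op1_type) == VECTOR_TYPE)
	  && ((code != EQ_EXPR && code != NE_EXPR)
	      || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
		   || VECTOR_INTEGER_TYPE_P (op0_type))))
	{
	  error ("type mismatch for vector comparison returning a boolean");
	  debug_generic_expr (op0_type);
	  debug_generic_expr (op1_type);
	  return true;
	}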

@@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
tree type, tree op0, tree op1)

   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (INTEGRAL_TYPE_P (type)
+         && (TREE_CODE (type) == BOOLEAN_TYPE
+             || TYPE_PRECISION (type) == 1))
+       {
+         /* Have vector comparison with scalar boolean result.  */
+         bool result = true;
+         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+           {
+             tree elem0 = VECTOR_CST_ELT (op0, i);
+             tree elem1 = VECTOR_CST_ELT (op1, i);
+             tree tmp = fold_relational_const (code, type, elem0, elem1);
+             result &= integer_onep (tmp);
+         if (code == NE_EXPR)
+           result = !result;
+         return constant_boolean_node (result, type);

... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
to change the
guarding condition to just

   if (! VECTOR_TYPE_P (type))

and assert the boolean/precision.  Please also merge the asserts into
one with &&

diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
tree_code code, tree type,

   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);

+  /* Do not perform combining it types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+    return NULL_TREE;
+

again, how does this happen?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..1605520c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
gimple_stmt_iterator si,
                                                &comp_code, &val))
     return;

+  /* Use of vector comparison in gcond is very restricted and used to check
+     that the mask in masked store is zero, so assert for such comparison
+     is not implemented yet.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+

VECTOR_TYPE_P

I believe the comment should simply say that VRP doesn't track ranges for
vector types.

In the previous review I suggested you should make sure that RTL expansion
ends up using a well-defined optab for these compares.  To make sure
this happens across targets I suggest you make these comparisons available
via the GCC vector extension.  Thus allow

typedef int v4si __attribute__((vector_size(16)));

int foo (v4si a, v4si b)
{
  if (a == b)
    return 4;
}

and != and also using floating point vectors.
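
(A sketch of those additional forms - the names bar/baz and the v4sf typedef
are illustrative, not taken from the patch:)

typedef float v4sf __attribute__((vector_size(16)));

int bar (v4si a, v4si b)
{
  if (a != b)
    return 4;
  return 0;
}

int baz (v4sf a, v4sf b)
{
  if (a == b)
    return 4;
  return 0;
}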

Otherwise it's hard to see the impact of this change.  Obvious choices
are the eq/ne optabs for FP compares and [u]cmp optabs for integer
compares.

A half-way implementation like your VRP comment suggests (only
==/!= zero against integer vectors is implemented?!) this doesn't sound
good without also limiting the feature this way in the verifier.

Btw, the regression with WRF is >50% on AMD Bulldozer (which only
has AVX, not AVX2).

Thanks,
Richard.

> ChangeLog:
> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
> comparison with boolean result.
> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
> for vector comparison with eq/ne only.
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vect-loop.c (is_valid_sink): New function.
> (optimize_mask_stores): Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
> vectorized loops having masked stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> corresponding macros.
> (optimize_mask_stores): Add prototype.
> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
> type.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.


* Re: [PATCH PR68542]
  2015-12-04 12:18 ` Richard Biener
@ 2015-12-04 15:07   ` Yuri Rumyantsev
  2015-12-07 10:57     ` Yuri Rumyantsev
  2015-12-10 13:36     ` Richard Biener
  0 siblings, 2 replies; 20+ messages in thread
From: Yuri Rumyantsev @ 2015-12-04 15:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

Hi Richard.

Thanks a lot for your review.
Below are my answers.

You asked why I inserted an additional check into
++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
tree_code code, tree type,

   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);

+  /* Do not perform combining it types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+    return NULL_TREE;
+

again, how does this happen?

This is because without it I've got an assert in fold_convert_loc:
      gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
 && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));

since it tries to convert a vector of bool to a scalar bool.
Here is essential part of call-stack:

#0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
    at ../../gcc/diagnostic.c:1259
#1  0x0000000001743ada in fancy_abort (
    file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
    function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
tree_node*)::__FUNCTION__> "fold_convert_loc") at
../../gcc/diagnostic.c:1332
#2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
    arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
#3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
    type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
    op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
#4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
    type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
    op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
#5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
    code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
    op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
    cond_first_p=1) at ../../gcc/fold-const.c:6465
#6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
    type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
    at ../../gcc/fold-const.c:9211
#7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
    code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
    op1=0x7ffff1a48780, invariant_only=true)
    at ../../gcc/tree-ssa-forwprop.c:382


Secondly, I did not catch your idea to implement the GCC vector extension
for vector comparison with a boolean result, since such an extension depends
entirely on the comparison context: in your example the result type of the
comparison depends on how it is used - for an if-comparison it is scalar,
but for c = (a == b) the result type is a vector. I don't think this is
reasonable for the current release.
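
For illustration of that context dependence (a sketch with made-up names;
the scalar-result form shown in the comment is the one under discussion and
is not accepted by the front end today):

typedef int v4si __attribute__((vector_size(16)));

v4si
elemwise (v4si a, v4si b)
{
  /* Existing vector-extension semantics: element-wise comparison,
     each lane is 0 or -1, so the result type is a vector.  */
  return a == b;
}

/* The if-form would instead need a single scalar boolean for the
   whole comparison:

     int whole (v4si a, v4si b)
     {
       if (a == b)
	 return 1;
       return 0;
     }
*/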

And finally, about AMD performance: I checked that this transformation
works for the "-march=bdver4" option, and the regression for 481.wrf should
disappear there as well.

Thanks.
Yuri.

2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi All,
>>
>> Here is a patch for 481.wrf preformance regression for avx2 which is
>> sligthly modified mask store optimization. This transformation allows
>> perform unpredication for semi-hammock containing masked stores, other
>> words if we have a loop like
>> for (i=0; i<n; i++)
>>   if (c[i]) {
>>     p1[i] += 1;
>>     p2[i] = p3[i] +2;
>>   }
>>
>> then it will be transformed to
>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>    }
>> i.e. it will put all computations related to masked stores to semi-hammock.
>>
>> Bootstrapping and regression testing did not show any new failures.
>
> Can you please split out the middle-end support for vector equality compares?
>
> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>          {
> -          error ("vector comparison returning a boolean");
> -          debug_generic_expr (op0_type);
> -          debug_generic_expr (op1_type);
> -          return true;
> +         /* Allow vector comparison returning boolean if operand types
> +            are equal and CODE is EQ/NE.  */
> +         if ((code != EQ_EXPR && code != NE_EXPR)
> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
> +           {
> +             error ("type mismatch for vector comparison returning a boolean");
> +             debug_generic_expr (op0_type);
> +             debug_generic_expr (op1_type);
> +             return true;
> +           }
>          }
>      }
>
> please merge the conditions with a &&
>
> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
> tree type, tree op0, tree op1)
>
>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>      {
> +      if (INTEGRAL_TYPE_P (type)
> +         && (TREE_CODE (type) == BOOLEAN_TYPE
> +             || TYPE_PRECISION (type) == 1))
> +       {
> +         /* Have vector comparison with scalar boolean result.  */
> +         bool result = true;
> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
> +           {
> +             tree elem0 = VECTOR_CST_ELT (op0, i);
> +             tree elem1 = VECTOR_CST_ELT (op1, i);
> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
> +             result &= integer_onep (tmp);
> +         if (code == NE_EXPR)
> +           result = !result;
> +         return constant_boolean_node (result, type);
>
> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
> to change the
> guarding condition to just
>
>    if (! VECTOR_TYPE_P (type))
>
> and assert the boolean/precision.  Please also merge the asserts into
> one with &&
>
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index b82ae3c..73ee3be 100644
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
> tree_code code, tree type,
>
>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>
> +  /* Do not perform combining it types are not compatible.  */
> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
> +    return NULL_TREE;
> +
>
> again, how does this happen?
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index e67048e..1605520c 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
> gimple_stmt_iterator si,
>                                                 &comp_code, &val))
>      return;
>
> +  /* Use of vector comparison in gcond is very restricted and used to check
> +     that the mask in masked store is zero, so assert for such comparison
> +     is not implemented yet.  */
> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
> +    return;
> +
>
> VECTOR_TYPE_P
>
> I believe the comment should simply say that VRP doesn't track ranges for
> vector types.
>
> In the previous review I suggested you should make sure that RTL expansion
> ends up using a well-defined optab for these compares.  To make sure
> this happens across targets I suggest you make these comparisons available
> via the GCC vector extension.  Thus allow
>
> typedef int v4si __attribute__((vector_size(16)));
>
> int foo (v4si a, v4si b)
> {
>   if (a == b)
>     return 4;
> }
>
> and != and also using floating point vectors.
>
> Otherwise it's hard to see the impact of this change.  Obvious choices
> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
> compares.
>
> A half-way implementation like your VRP comment suggests (only
> ==/!= zero against integer vectors is implemented?!) this doesn't sound
> good without also limiting the feature this way in the verifier.
>
> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
> has AVX, not AVX2).
>
> Thanks,
> Richard.
>
>> ChangeLog:
>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> PR middle-end/68542
>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>> comparison with boolean result.
>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>> for vector comparion with eq/ne only.
>> * fold-const.c (fold_relational_const): Add handling of vector
>> comparison with boolean result.
>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>> comparison of vector operands with boolean result for EQ/NE only.
>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>> (verify_gimple_cond): Likewise.
>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>> combining for non-compatible vector types.
>> * tree-vect-loop.c (is_valid_sink): New function.
>> (optimize_mask_stores): Likewise.
>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>> has_mask_store field of vect_info.
>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>> vectorized loops having masked stores.
>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>> correspondent macros.
>> (optimize_mask_stores): Add prototype.
>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>> type.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.


* Re: [PATCH PR68542]
  2015-12-04 15:07   ` Yuri Rumyantsev
@ 2015-12-07 10:57     ` Yuri Rumyantsev
  2015-12-08 12:34       ` Yuri Rumyantsev
  2015-12-10 13:36     ` Richard Biener
  1 sibling, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2015-12-07 10:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 11030 bytes --]

Richard!

Here is middle-end part of patch with changes proposed by you.

Is it OK for trunk?

Thanks.
Yuri.

ChangeLog:
2015-12-07  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
vector types.



2015-12-04 18:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Hi Richard.
>
> Thanks a lot for your review.
> Below are my answers.
>
> You asked why I inserted additional check to
> ++ b/gcc/tree-ssa-forwprop.c
> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
> tree_code code, tree type,
>
>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>
> +  /* Do not perform combining it types are not compatible.  */
> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
> +    return NULL_TREE;
> +
>
> again, how does this happen?
>
> This is because without it I've got assert in fold_convert_loc
>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>
> since it tries to convert vector of bool to scalar bool.
> Here is essential part of call-stack:
>
> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>     at ../../gcc/diagnostic.c:1259
> #1  0x0000000001743ada in fancy_abort (
>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
> tree_node*)::__FUNCTION__> "fold_convert_loc") at
> ../../gcc/diagnostic.c:1332
> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>     cond_first_p=1) at ../../gcc/fold-const.c:6465
> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>     at ../../gcc/fold-const.c:9211
> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>     op1=0x7ffff1a48780, invariant_only=true)
>     at ../../gcc/tree-ssa-forwprop.c:382
>
>
> Secondly, I did not catch your idea to implement GCC Vector Extension
> for vector comparison with bool result since
> such extension completely depends on comparison context, e.g. for your
> example, result type of comparison depends on using - for
> if-comparison it is scalar, but for c = (a==b) - result type is
> vector. I don't think that this is reasonable for current release.
>
> And finally about AMD performance. I checked that this transformation
> works for "-march=bdver4" option and regression for 481.wrf must
> disappear too.
>
> Thanks.
> Yuri.
>
> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>> sligthly modified mask store optimization. This transformation allows
>>> perform unpredication for semi-hammock containing masked stores, other
>>> words if we have a loop like
>>> for (i=0; i<n; i++)
>>>   if (c[i]) {
>>>     p1[i] += 1;
>>>     p2[i] = p3[i] +2;
>>>   }
>>>
>>> then it will be transformed to
>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>    }
>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>
>> Can you please split out the middle-end support for vector equality compares?
>>
>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>          {
>> -          error ("vector comparison returning a boolean");
>> -          debug_generic_expr (op0_type);
>> -          debug_generic_expr (op1_type);
>> -          return true;
>> +         /* Allow vector comparison returning boolean if operand types
>> +            are equal and CODE is EQ/NE.  */
>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>> +           {
>> +             error ("type mismatch for vector comparison returning a boolean");
>> +             debug_generic_expr (op0_type);
>> +             debug_generic_expr (op1_type);
>> +             return true;
>> +           }
>>          }
>>      }
>>
>> please merge the conditions with a &&
>>
>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>> tree type, tree op0, tree op1)
>>
>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>      {
>> +      if (INTEGRAL_TYPE_P (type)
>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>> +             || TYPE_PRECISION (type) == 1))
>> +       {
>> +         /* Have vector comparison with scalar boolean result.  */
>> +         bool result = true;
>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>> +           {
>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>> +             result &= integer_onep (tmp);
>> +         if (code == NE_EXPR)
>> +           result = !result;
>> +         return constant_boolean_node (result, type);
>>
>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>> to change the
>> guarding condition to just
>>
>>    if (! VECTOR_TYPE_P (type))
>>
>> and assert the boolean/precision.  Please also merge the asserts into
>> one with &&
>>
>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>> index b82ae3c..73ee3be 100644
>> --- a/gcc/tree-ssa-forwprop.c
>> +++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining it types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> +    return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>> index e67048e..1605520c 100644
>> --- a/gcc/tree-vrp.c
>> +++ b/gcc/tree-vrp.c
>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>> gimple_stmt_iterator si,
>>                                                 &comp_code, &val))
>>      return;
>>
>> +  /* Use of vector comparison in gcond is very restricted and used to check
>> +     that the mask in masked store is zero, so assert for such comparison
>> +     is not implemented yet.  */
>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>> +    return;
>> +
>>
>> VECTOR_TYPE_P
>>
>> I believe the comment should simply say that VRP doesn't track ranges for
>> vector types.
>>
>> In the previous review I suggested you should make sure that RTL expansion
>> ends up using a well-defined optab for these compares.  To make sure
>> this happens across targets I suggest you make these comparisons available
>> via the GCC vector extension.  Thus allow
>>
>> typedef int v4si __attribute__((vector_size(16)));
>>
>> int foo (v4si a, v4si b)
>> {
>>   if (a == b)
>>     return 4;
>> }
>>
>> and != and also using floating point vectors.
>>
>> Otherwise it's hard to see the impact of this change.  Obvious choices
>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>> compares.
>>
>> A half-way implementation like your VRP comment suggests (only
>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>> good without also limiting the feature this way in the verifier.
>>
>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>> has AVX, not AVX2).
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog:
>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> PR middle-end/68542
>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>> comparison with boolean result.
>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>> for vector comparion with eq/ne only.
>>> * fold-const.c (fold_relational_const): Add handling of vector
>>> comparison with boolean result.
>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>> comparison of vector operands with boolean result for EQ/NE only.
>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>> (verify_gimple_cond): Likewise.
>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>> combining for non-compatible vector types.
>>> * tree-vect-loop.c (is_valid_sink): New function.
>>> (optimize_mask_stores): Likewise.
>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>> has_mask_store field of vect_info.
>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>> vectorized loops having masked stores.
>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>> correspondent macros.
>>> (optimize_mask_stores): Add prototype.
>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>> type.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

[-- Attachment #2: PR68542.middle-end.patch --]
[-- Type: application/octet-stream, Size: 4378 bytes --]

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 698062e..7125621 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -13888,6 +13888,23 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
 
   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (!VECTOR_TYPE_P (type))
+	{
+	  /* Have vector comparison with scalar boolean result.  */
+	  bool result = true;
+	  gcc_assert ((code == EQ_EXPR || code == NE_EXPR)
+		      && VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+	  for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+	    {
+	      tree elem0 = VECTOR_CST_ELT (op0, i);
+	      tree elem1 = VECTOR_CST_ELT (op1, i);
+	      tree tmp = fold_relational_const (code, type, elem0, elem1);
+	      result &= integer_onep (tmp);
+	    }
+	  if (code == NE_EXPR)
+	    result = !result;
+	  return constant_boolean_node (result, type);
+	}
       unsigned count = VECTOR_CST_NELTS (op0);
       tree *elts =  XALLOCAVEC (tree, count);
       gcc_assert (VECTOR_CST_NELTS (op1) == count
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 0c624aa..770b35b 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3408,10 +3408,10 @@ verify_gimple_call (gcall *stmt)
 }
 
 /* Verifies the gimple comparison with the result type TYPE and
-   the operands OP0 and OP1.  */
+   the operands OP0 and OP1, comparison code is CODE.  */
 
 static bool
-verify_gimple_comparison (tree type, tree op0, tree op1)
+verify_gimple_comparison (tree type, tree op0, tree op1, enum tree_code code)
 {
   tree op0_type = TREE_TYPE (op0);
   tree op1_type = TREE_TYPE (op1);
@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
 	  || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+	  /* Allow vector comparison returning boolean if operand types
+	     are boolean or integral and CODE is EQ/NE.  */
+	  if (code != EQ_EXPR && code != NE_EXPR
+	      && !VECTOR_BOOLEAN_TYPE_P (op0_type)
+	      && !VECTOR_INTEGER_TYPE_P (op0_type))
+	    {
+	      error ("type mismatch for vector comparison returning a boolean");
+	      debug_generic_expr (op0_type);
+	      debug_generic_expr (op1_type);
+	      return true;
+	    }
         }
     }
   /* Or a boolean vector type with the same element count
@@ -3832,7 +3839,7 @@ verify_gimple_assign_binary (gassign *stmt)
     case LTGT_EXPR:
       /* Comparisons are also binary, but the result type is not
 	 connected to the operand types.  */
-      return verify_gimple_comparison (lhs_type, rhs1, rhs2);
+      return verify_gimple_comparison (lhs_type, rhs1, rhs2, rhs_code);
 
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
@@ -4541,7 +4548,8 @@ verify_gimple_cond (gcond *stmt)
 
   return verify_gimple_comparison (boolean_type_node,
 				   gimple_cond_lhs (stmt),
-				   gimple_cond_rhs (stmt));
+				   gimple_cond_rhs (stmt),
+				   gimple_cond_code (stmt));
 }
 
 /* Verify the GIMPLE statement STMT.  Returns true if there is an
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
 
   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
 
+  /* Do not perform combining if types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+    return NULL_TREE;
+
   fold_defer_overflow_warnings ();
   t = fold_binary_loc (gimple_location (stmt), code, type, op0, op1);
   if (!t)
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..e8cdb1d 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,10 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
 						&comp_code, &val))
     return;
 
+  /* VRP doesn't track ranges for vector types.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+
   /* Register ASSERT_EXPRs for name.  */
   register_edge_assert_for_2 (name, e, si, cond_code, cond_op0,
 			      cond_op1, is_else_edge);


* Re: [PATCH PR68542]
  2015-12-07 10:57     ` Yuri Rumyantsev
@ 2015-12-08 12:34       ` Yuri Rumyantsev
  0 siblings, 0 replies; 20+ messages in thread
From: Yuri Rumyantsev @ 2015-12-08 12:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 12343 bytes --]

Hi Richard,

Here is the second part of patch.

Is it OK for trunk?

I assume that it should also fix the huge degradation on 481.wrf for -march=bdver4.

ChangeLog:
2015-12-08  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Implement integral vector
comparison with boolean result.
* config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
for vector comparison with eq/ne only.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

2015-12-07 13:57 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard!
>
> Here is middle-end part of patch with changes proposed by you.
>
> Is it OK for trunk?
>
> Thanks.
> Yuri.
>
> ChangeLog:
> 2015-12-07  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
>
>
> 2015-12-04 18:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Hi Richard.
>>
>> Thanks a lot for your review.
>> Below are my answers.
>>
>> You asked why I inserted additional check to
>> ++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining it types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> +    return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> This is because without it I've got assert in fold_convert_loc
>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>
>> since it tries to convert vector of bool to scalar bool.
>> Here is essential part of call-stack:
>>
>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>     at ../../gcc/diagnostic.c:1259
>> #1  0x0000000001743ada in fancy_abort (
>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>> ../../gcc/diagnostic.c:1332
>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>     at ../../gcc/fold-const.c:9211
>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>     op1=0x7ffff1a48780, invariant_only=true)
>>     at ../../gcc/tree-ssa-forwprop.c:382
>>
>>
>> Secondly, I did not catch your idea to implement GCC Vector Extension
>> for vector comparison with bool result since
>> such extension completely depends on comparison context, e.g. for your
>> example, result type of comparison depends on using - for
>> if-comparison it is scalar, but for c = (a==b) - result type is
>> vector. I don't think that this is reasonable for current release.
>>
>> And finally about AMD performance. I checked that this transformation
>> works for "-march=bdver4" option and regression for 481.wrf must
>> disappear too.
>>
>> Thanks.
>> Yuri.
>>
>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>>> sligthly modified mask store optimization. This transformation allows
>>>> perform unpredication for semi-hammock containing masked stores, other
>>>> words if we have a loop like
>>>> for (i=0; i<n; i++)
>>>>   if (c[i]) {
>>>>     p1[i] += 1;
>>>>     p2[i] = p3[i] +2;
>>>>   }
>>>>
>>>> then it will be transformed to
>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>    }
>>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>>
>>>> Bootstrapping and regression testing did not show any new failures.
>>>
>>> Can you please split out the middle-end support for vector equality compares?
>>>
>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>          {
>>> -          error ("vector comparison returning a boolean");
>>> -          debug_generic_expr (op0_type);
>>> -          debug_generic_expr (op1_type);
>>> -          return true;
>>> +         /* Allow vector comparison returning boolean if operand types
>>> +            are equal and CODE is EQ/NE.  */
>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>> +           {
>>> +             error ("type mismatch for vector comparison returning a boolean");
>>> +             debug_generic_expr (op0_type);
>>> +             debug_generic_expr (op1_type);
>>> +             return true;
>>> +           }
>>>          }
>>>      }
>>>
>>> please merge the conditions with a &&
>>>
>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>> tree type, tree op0, tree op1)
>>>
>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>      {
>>> +      if (INTEGRAL_TYPE_P (type)
>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>> +             || TYPE_PRECISION (type) == 1))
>>> +       {
>>> +         /* Have vector comparison with scalar boolean result.  */
>>> +         bool result = true;
>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>> +           {
>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>> +             result &= integer_onep (tmp);
>>> +         if (code == NE_EXPR)
>>> +           result = !result;
>>> +         return constant_boolean_node (result, type);
>>>
>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>> to change the
>>> guarding condition to just
>>>
>>>    if (! VECTOR_TYPE_P (type))
>>>
>>> and assert the boolean/precision.  Please also merge the asserts into
>>> one with &&
>>>
>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>> index b82ae3c..73ee3be 100644
>>> --- a/gcc/tree-ssa-forwprop.c
>>> +++ b/gcc/tree-ssa-forwprop.c
>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>> tree_code code, tree type,
>>>
>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>
>>> +  /* Do not perform combining it types are not compatible.  */
>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>> +    return NULL_TREE;
>>> +
>>>
>>> again, how does this happen?
>>>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index e67048e..1605520c 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>> gimple_stmt_iterator si,
>>>                                                 &comp_code, &val))
>>>      return;
>>>
>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>> +     that the mask in masked store is zero, so assert for such comparison
>>> +     is not implemented yet.  */
>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>> +    return;
>>> +
>>>
>>> VECTOR_TYPE_P
>>>
>>> I believe the comment should simply say that VRP doesn't track ranges for
>>> vector types.
>>>
>>> In the previous review I suggested you should make sure that RTL expansion
>>> ends up using a well-defined optab for these compares.  To make sure
>>> this happens across targets I suggest you make these comparisons available
>>> via the GCC vector extension.  Thus allow
>>>
>>> typedef int v4si __attribute__((vector_size(16)));
>>>
>>> int foo (v4si a, v4si b)
>>> {
>>>   if (a == b)
>>>     return 4;
>>> }
>>>
>>> and != and also using floating point vectors.
>>>
>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>> compares.
>>>
>>> A half-way implementation like your VRP comment suggests (only
>>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>>> good without also limiting the feature this way in the verifier.
>>>
>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>> has AVX, not AVX2).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog:
>>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> PR middle-end/68542
>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>> comparison with boolean result.
>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>>> for vector comparion with eq/ne only.
>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>> comparison with boolean result.
>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>> (verify_gimple_cond): Likewise.
>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>> combining for non-compatible vector types.
>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>> (optimize_mask_stores): Likewise.
>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>> has_mask_store field of vect_info.
>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>>> vectorized loops having masked stores.
>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>> correspondent macros.
>>>> (optimize_mask_stores): Add prototype.
>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>> type.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

[-- Attachment #2: PR68542.patch2 --]
[-- Type: application/octet-stream, Size: 14298 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 83749d5..b256078 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21641,6 +21641,40 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   machine_mode mode = GET_MODE (op0);
   rtx tmp;
 
+  /* Handle special case - vector comparison with boolean result, transform
+     it using ptest instruction.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      rtx lhs;
+      rtx flag;
+      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+      gcc_assert (code == EQ || code == NE);
+      if (!REG_P (op0))
+	op0 = force_reg (mode, op0);
+      if (!REG_P (op1))
+	op1 = force_reg (mode, op1);
+      /* Generate subtraction since we can't check that one operand is
+	 zero vector.  */
+      lhs = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (lhs, gen_rtx_MINUS (mode, op0, op1)));
+      lhs = gen_rtx_SUBREG (p_mode, lhs, 0);
+      tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+			 gen_rtx_UNSPEC (CCmode,
+					 gen_rtvec (2, lhs, lhs),
+					 UNSPEC_PTEST));
+      emit_insn (tmp);
+      flag = gen_rtx_REG (CCZmode, FLAGS_REG);
+      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+      if (code == EQ)
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    gen_rtx_LABEL_REF (VOIDmode, label), pc_rtx);
+      else
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    pc_rtx, gen_rtx_LABEL_REF (VOIDmode, label));
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+      return;
+    }
+
   switch (mode)
     {
     case SFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7b517a..e149cb3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18340,6 +18340,25 @@
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512BW")
 
+(define_expand "cbranch<mode>4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+		    (match_operand:V48_AVX2 2 "register_operand")))
+   (set (pc) (if_then_else
+	       (match_operator 0 "bt_comparison_operator"
+		[(reg:CC FLAGS_REG) (const_int 0)])
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "TARGET_AVX2"
+{
+  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
new file mode 100644
index 0000000..7d52f70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-march=core-avx2 -O3 -fdump-tree-vect-details" } */
+
+#include <stdlib.h>
+#include "avx2-check.h"
+
+#define N 32
+int *p1, *p2, *p3;
+int c[N];
+int p1ref[N], p2ref[N];
+
+__attribute__((noinline)) void foo (int n)
+{
+  int i;
+  for (i=0; i<n; i++)
+    if (c[i]) {
+      p1[i] += 1;
+      p2[i] = p3[i] +2;
+    }
+}
+
+__attribute__((noinline)) void init ()
+{
+  p1ref[0]=1; p2ref[0]=2;
+  p1ref[1]=3; p2ref[1]=5;
+  p1ref[2]=5; p2ref[2]=8;
+  p1ref[3]=7; p2ref[3]=11;
+  p1ref[4]=9; p2ref[4]=14;
+  p1ref[5]=11; p2ref[5]=17;
+  p1ref[6]=13; p2ref[6]=20;
+  p1ref[7]=15; p2ref[7]=23;
+  p1ref[8]=16; p2ref[8]=8;
+  p1ref[9]=18; p2ref[9]=9;
+  p1ref[10]=20; p2ref[10]=10;
+  p1ref[11]=22; p2ref[11]=11;
+  p1ref[12]=24; p2ref[12]=12;
+  p1ref[13]=26; p2ref[13]=13;
+  p1ref[14]=28; p2ref[14]=14;
+  p1ref[15]=30; p2ref[15]=15;
+  p1ref[16]=33; p2ref[16]=50;
+  p1ref[17]=35; p2ref[17]=53;
+  p1ref[18]=37; p2ref[18]=56;
+  p1ref[19]=39; p2ref[19]=59;
+  p1ref[20]=41; p2ref[20]=62;
+  p1ref[21]=43; p2ref[21]=65;
+  p1ref[22]=45; p2ref[22]=68;
+  p1ref[23]=47; p2ref[23]=71;
+  p1ref[24]=48; p2ref[24]=24;
+  p1ref[25]=50; p2ref[25]=25;
+  p1ref[26]=52; p2ref[26]=26;
+  p1ref[27]=54; p2ref[27]=27;
+  p1ref[28]=56; p2ref[28]=28;
+  p1ref[29]=58; p2ref[29]=29;
+  p1ref[30]=60; p2ref[30]=30;
+  p1ref[31]=62; p2ref[31]=31;
+}
+
+static void
+avx2_test (void)
+{
+  int * P = malloc(N * 3 * sizeof (int));
+  int i;
+
+  p1 = &P[0];
+  p2 = &P[N];
+  p3 = &P[2 * N];
+  for (i=0; i<N; i++) {
+    p1[i] = i + i;
+    p3[i] = i * 3;
+    p2[i] = i;
+    c[i] = (i >> 3) & 1? 0: 1;
+  }
+  init ();
+  foo (N);
+  for (i=0; i<N;i++)
+    if (p1[i] != p1ref[i] || p2[i] != p2ref[i])
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c3dbfd3..80f6976 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6826,3 +6826,225 @@ vect_transform_loop (loop_vec_info loop_vinfo)
       dump_printf (MSG_NOTE, "\n");
     }
 }
+
+/* Helper for optimize_mask_stores: returns true if sinking STMT to the
+   end of BB is valid and false otherwise.  */
+
+static bool
+is_valid_sink (gimple *stmt, gimple *last_store)
+{
+  tree vdef;
+  imm_use_iterator imm_it;
+  use_operand_p use_p;
+  basic_block bb = gimple_bb (stmt);
+
+  if (is_gimple_call (stmt)
+      && !gimple_call_internal_p (stmt))
+    /* Do not consider non-internal call as valid to sink.  */
+    return false;
+
+  if ((vdef = gimple_vdef (stmt)))
+    {
+      /* Check that there are no store vuses in the current bb.  */
+      FOR_EACH_IMM_USE_FAST (use_p, imm_it, vdef)
+	if (gimple_bb (USE_STMT (use_p)) == bb)
+	  return false;
+      return true;
+    }
+  else if (gimple_vuse (stmt) == NULL_TREE)
+    return true;
+  else if (gimple_vuse (stmt) == gimple_vuse (last_store))
+    return true;
+  return false;
+}
+
+/* The code below is trying to perform a simple optimization - revert
+   if-conversion for masked stores, i.e. if the mask of a store is zero,
+   do not perform the store and, if possible, the stored value producers.
+   For example,
+     for (i=0; i<n; i++)
+       if (c[i]) {
+	 p1[i] += 1;
+	 p2[i] = p3[i] +2;
+     }
+   this transformation will produce the following semi-hammock:
+
+   if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
+     vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
+     vect__12.22_172 = vect__11.19_170 + vect_cst__171;
+     MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
+     vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
+     vect__19.28_184 = vect__18.25_182 + vect_cst__183;
+     MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
+   }
+*/
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+  basic_block bb = loop->header;
+  gimple_stmt_iterator gsi;
+  gimple *stmt, *stmt1;
+  auto_vec<gimple *> worklist;
+
+  vect_location = find_loop_location (loop);
+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	worklist.safe_push (stmt);
+    }
+
+  if (worklist.is_empty ())
+    return;
+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple *last, *last_store;
+      edge e, efalse;
+      tree mask;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      /* tree arg3; */
+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+      tree vectype;
+      tree zero;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      /* Create new bb.  */
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      /* Put STORE_BB to likely part.  */
+      efalse->probability = PROB_UNLIKELY;
+      store_bb->frequency = PROB_ALWAYS - EDGE_FREQUENCY (efalse);
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+      if (dom_info_available_p (CDI_DOMINATORS))
+	set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (vectype);
+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
+      gsi = gsi_last_bb (bb);
+      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+      /* Create new PHI node for vdef of the last masked store:
+	 .MEM_2 = VDEF <.MEM_1>
+	 will be converted to
+	 .MEM.3 = VDEF <.MEM_1>
+	 and new PHI node will be created in join bb
+	 .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      gimple_set_vdef (last, new_vdef);
+      phi = create_phi_node (vdef, join_bb);
+      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+      first_dump = true;
+
+      /* Put all masked stores with the same mask to STORE_BB if possible.  */
+      while (true)
+	{
+	  gimple_stmt_iterator gsi_from;
+	  /* Move masked store to STORE_BB.  */
+	  last_store = last;
+	  gsi = gsi_for_stmt (last);
+	  gsi_from = gsi;
+	  /* Shift gsi to previous stmt for further traversal.  */
+	  gsi_prev (&gsi);
+	  gsi_to = gsi_start_bb (store_bb);
+	  gsi_move_before (&gsi_from, &gsi_to);
+	  update_stmt (last);
+	  if (dump_enabled_p ())
+	    {
+	      /* Issue different messages depending on FIRST_DUMP.  */
+	      if (first_dump)
+		{
+		  dump_printf_loc (MSG_NOTE, vect_location,
+				   "Move MASK_STORE to new bb#%d\n",
+			       store_bb->index);
+		  first_dump = false;
+		}
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Move MASK_STORE to created bb\n");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+	    }
+	    /* Move all stored value producers if possible.  */
+	    while (!gsi_end_p (gsi))
+	      {
+		tree lhs;
+		imm_use_iterator imm_iter;
+		use_operand_p use_p;
+		bool res;
+		stmt1 = gsi_stmt (gsi);
+		/* Do not consider statements writing to memory.  */
+		if (gimple_vdef (stmt1))
+		  break;
+		gsi_prev (&gsi);
+		if (is_gimple_call (stmt1))
+		  lhs = gimple_call_lhs (stmt1);
+		else
+		  if (gimple_code (stmt1) == GIMPLE_ASSIGN)
+		    lhs = gimple_assign_lhs (stmt1);
+		  else
+		    break;
+
+		/* LHS of vectorized stmt must be SSA_NAME.  */
+		if (TREE_CODE (lhs) != SSA_NAME)
+		  break;
+
+		/* Skip scalar statements.  */
+		if (!VECTOR_TYPE_P (TREE_TYPE (lhs)))
+		  continue;
+
+		/* Check that LHS does not have uses outside of STORE_BB.  */
+		res = true;
+		FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+		  {
+		    gimple *use_stmt;
+		    use_stmt = USE_STMT (use_p);
+		    if (gimple_bb (use_stmt) != store_bb)
+		      {
+			res = false;
+			break;
+		      }
+		  }
+		if (!res)
+		  break;
+
+		if (gimple_vuse (stmt1)
+		    && gimple_vuse (stmt1) != gimple_vuse (last_store))
+		  break;
+
+		/* Can move STMT1 to STORE_BB.  */
+		if (dump_enabled_p ())
+		  {
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "Move stmt to created bb\n");
+		    dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt1, 0);
+		  }
+		gsi_from = gsi_for_stmt (stmt1);
+		gsi_to = gsi_start_bb (store_bb);
+		gsi_move_before (&gsi_from, &gsi_to);
+		update_stmt (stmt1);
+	      }
+	    /* Put other masked stores with the same mask to STORE_BB.  */
+	    if (worklist.is_empty ()
+		|| gimple_call_arg (worklist.last (), 2) != mask
+		|| !is_valid_sink (worklist.last (), last_store))
+	      break;
+	    last = worklist.pop ();
+	}
+      add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+    }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 4bb58b9..55b1956 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2000,6 +2000,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
 	{
 	  unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index b721c56..6732616 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -598,12 +598,18 @@ vectorize_loops (void)
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
+      bool has_mask_store;
 
       loop = get_loop (cfun, i);
       if (!loop)
 	continue;
       loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = false;
+      if (loop_vinfo)
+	has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
       destroy_loop_vec_info (loop_vinfo, true);
+      if (has_mask_store)
+	optimize_mask_stores (loop);
       loop->aux = NULL;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7867c26..040051c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -330,6 +330,9 @@ typedef struct _loop_vec_info : public vec_info {
      loop version without if-conversion.  */
   struct loop *scalar_loop;
 
+  /* Mark loops having masked stores.  */
+  bool has_mask_store;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -367,6 +370,7 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
+#define LOOP_VINFO_HAS_MASK_STORE(L)      (L)->has_mask_store
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
   ((L)->may_misalign_stmts.length () > 0)
@@ -1001,6 +1005,7 @@ extern void vect_get_vec_defs (tree, tree, gimple *, vec<tree> *,
 			       vec<tree> *, slp_tree, int);
 extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
 extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2015-12-04 15:07   ` Yuri Rumyantsev
  2015-12-07 10:57     ` Yuri Rumyantsev
@ 2015-12-10 13:36     ` Richard Biener
  2015-12-11 14:03       ` Yuri Rumyantsev
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Biener @ 2015-12-10 13:36 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard.
>
> Thanks a lot for your review.
> Below are my answers.
>
> You asked why I inserted additional check to
> ++ b/gcc/tree-ssa-forwprop.c
> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
> tree_code code, tree type,
>
>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>
> +  /* Do not perform combining it types are not compatible.  */
> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
> +    return NULL_TREE;
> +
>
> again, how does this happen?
>
> This is because without it I've got assert in fold_convert_loc
>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>
> since it tries to convert vector of bool to scalar bool.
> Here is essential part of call-stack:
>
> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>     at ../../gcc/diagnostic.c:1259
> #1  0x0000000001743ada in fancy_abort (
>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
> tree_node*)::__FUNCTION__> "fold_convert_loc") at
> ../../gcc/diagnostic.c:1332
> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>     cond_first_p=1) at ../../gcc/fold-const.c:6465
> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>     at ../../gcc/fold-const.c:9211
> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>     op1=0x7ffff1a48780, invariant_only=true)
>     at ../../gcc/tree-ssa-forwprop.c:382

Ok, but that only shows that

      /* Convert A ? 1 : 0 to simply A.  */
      if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
                                 : (integer_onep (op1)
                                    && !VECTOR_TYPE_P (type)))
          && integer_zerop (op2)
          /* If we try to convert OP0 to our type, the
             call to fold will try to move the conversion inside
             a COND, which will recurse.  In that case, the COND_EXPR
             is probably the best choice, so leave it alone.  */
          && type == TREE_TYPE (arg0))
        return pedantic_non_lvalue_loc (loc, arg0);

      /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
         over COND_EXPR in cases such as floating point comparisons.  */
      if (integer_zerop (op1)
          && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
                                    : (integer_onep (op2)
                                       && !VECTOR_TYPE_P (type)))
          && truth_value_p (TREE_CODE (arg0)))
        return pedantic_non_lvalue_loc (loc,
                                    fold_convert_loc (loc, type,
                                              invert_truthvalue_loc (loc,
                                                                     arg0)));

are wrong?  I can't say for sure without a testcase.

That said, papering over this in tree-ssa-forwprop.c is not the
correct thing to do.

> Secondly, I did not catch your idea to implement GCC Vector Extension
> for vector comparison with bool result since
> such extension completely depends on comparison context, e.g. for your
> example, result type of comparison depends on using - for
> if-comparison it is scalar, but for c = (a==b) - result type is
> vector. I don't think that this is reasonable for current release.

The idea was to be able to write testcases exercising different EQ/NE vector
compares.  But yes, if that's non-trivial then it's not appropriate for stage3.

Can you add a testcase for the forwprop issue and try to fix the offending
bogus folders instead?

Thanks,
Richard.

> And finally about AMD performance. I checked that this transformation
> works for "-march=bdver4" option and regression for 481.wrf must
> disappear too.
>
> Thanks.
> Yuri.
>
> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>> sligthly modified mask store optimization. This transformation allows
>>> perform unpredication for semi-hammock containing masked stores, other
>>> words if we have a loop like
>>> for (i=0; i<n; i++)
>>>   if (c[i]) {
>>>     p1[i] += 1;
>>>     p2[i] = p3[i] +2;
>>>   }
>>>
>>> then it will be transformed to
>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>    }
>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>
>> Can you please split out the middle-end support for vector equality compares?
>>
>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>          {
>> -          error ("vector comparison returning a boolean");
>> -          debug_generic_expr (op0_type);
>> -          debug_generic_expr (op1_type);
>> -          return true;
>> +         /* Allow vector comparison returning boolean if operand types
>> +            are equal and CODE is EQ/NE.  */
>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>> +           {
>> +             error ("type mismatch for vector comparison returning a boolean");
>> +             debug_generic_expr (op0_type);
>> +             debug_generic_expr (op1_type);
>> +             return true;
>> +           }
>>          }
>>      }
>>
>> please merge the conditions with a &&
>>
>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>> tree type, tree op0, tree op1)
>>
>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>      {
>> +      if (INTEGRAL_TYPE_P (type)
>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>> +             || TYPE_PRECISION (type) == 1))
>> +       {
>> +         /* Have vector comparison with scalar boolean result.  */
>> +         bool result = true;
>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>> +           {
>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>> +             result &= integer_onep (tmp);
>> +         if (code == NE_EXPR)
>> +           result = !result;
>> +         return constant_boolean_node (result, type);
>>
>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>> to change the
>> guarding condition to just
>>
>>    if (! VECTOR_TYPE_P (type))
>>
>> and assert the boolean/precision.  Please also merge the asserts into
>> one with &&
>>
>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>> index b82ae3c..73ee3be 100644
>> --- a/gcc/tree-ssa-forwprop.c
>> +++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining it types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> +    return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>> index e67048e..1605520c 100644
>> --- a/gcc/tree-vrp.c
>> +++ b/gcc/tree-vrp.c
>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>> gimple_stmt_iterator si,
>>                                                 &comp_code, &val))
>>      return;
>>
>> +  /* Use of vector comparison in gcond is very restricted and used to check
>> +     that the mask in masked store is zero, so assert for such comparison
>> +     is not implemented yet.  */
>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>> +    return;
>> +
>>
>> VECTOR_TYPE_P
>>
>> I believe the comment should simply say that VRP doesn't track ranges for
>> vector types.
>>
>> In the previous review I suggested you should make sure that RTL expansion
>> ends up using a well-defined optab for these compares.  To make sure
>> this happens across targets I suggest you make these comparisons available
>> via the GCC vector extension.  Thus allow
>>
>> typedef int v4si __attribute__((vector_size(16)));
>>
>> int foo (v4si a, v4si b)
>> {
>>   if (a == b)
>>     return 4;
>> }
>>
>> and != and also using floating point vectors.
>>
>> Otherwise it's hard to see the impact of this change.  Obvious choices
>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>> compares.
>>
>> A half-way implementation like your VRP comment suggests (only
>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>> good without also limiting the feature this way in the verifier.
>>
>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>> has AVX, not AVX2).
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog:
>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> PR middle-end/68542
>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>> comparison with boolean result.
>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>> for vector comparion with eq/ne only.
>>> * fold-const.c (fold_relational_const): Add handling of vector
>>> comparison with boolean result.
>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>> comparison of vector operands with boolean result for EQ/NE only.
>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>> (verify_gimple_cond): Likewise.
>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>> combining for non-compatible vector types.
>>> * tree-vect-loop.c (is_valid_sink): New function.
>>> (optimize_mask_stores): Likewise.
>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>> has_mask_store field of vect_info.
>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>> vectorized loops having masked stores.
>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>> correspondent macros.
>>> (optimize_mask_stores): Add prototype.
>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>> type.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2015-12-10 13:36     ` Richard Biener
@ 2015-12-11 14:03       ` Yuri Rumyantsev
  2015-12-16 13:37         ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2015-12-11 14:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 13594 bytes --]

Richard.
Thanks for your review.
I re-designed the fix for the assert by adding additional checks for vector
comparison with boolean result to fold_binary_op_with_conditional_arg
and removing the early exit from combine_cond_expr_cond.
Unfortunately, I am not able to provide you with a test case since it is
part of the back-end patch which I sent earlier (12-08).

Bootstrapping and regression testing did not show any new failures.
Is it OK for trunk?

ChangeLog:
2015-12-11  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* fold-const.c (fold_binary_op_with_conditional_arg): Add checks on
vector comparison with boolean result to avoid ICE.
(fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
vector types.
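
For reference, the scalar-boolean semantics the fold_relational_const hunk
implements can be written in plain C as below (illustration only, not part
of the patch; the type and function names are made up): EQ_EXPR is true
only if every lane compares equal, and NE_EXPR is its negation.

typedef int v8si __attribute__ ((vector_size (32)));

/* All-lanes-equal semantics of the scalar-boolean EQ_EXPR on vectors;
   NE_EXPR is simply the negation of this result.  */
static _Bool
v8si_all_equal (v8si a, v8si b)
{
  for (int i = 0; i < 8; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}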

2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard.
>>
>> Thanks a lot for your review.
>> Below are my answers.
>>
>> You asked why I inserted additional check to
>> ++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining it types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> +    return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> This is because without it I've got assert in fold_convert_loc
>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>
>> since it tries to convert vector of bool to scalar bool.
>> Here is essential part of call-stack:
>>
>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>     at ../../gcc/diagnostic.c:1259
>> #1  0x0000000001743ada in fancy_abort (
>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>> ../../gcc/diagnostic.c:1332
>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>     at ../../gcc/fold-const.c:9211
>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>     op1=0x7ffff1a48780, invariant_only=true)
>>     at ../../gcc/tree-ssa-forwprop.c:382
>
> Ok, but that only shows that
>
>       /* Convert A ? 1 : 0 to simply A.  */
>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>                                  : (integer_onep (op1)
>                                     && !VECTOR_TYPE_P (type)))
>           && integer_zerop (op2)
>           /* If we try to convert OP0 to our type, the
>              call to fold will try to move the conversion inside
>              a COND, which will recurse.  In that case, the COND_EXPR
>              is probably the best choice, so leave it alone.  */
>           && type == TREE_TYPE (arg0))
>         return pedantic_non_lvalue_loc (loc, arg0);
>
>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>          over COND_EXPR in cases such as floating point comparisons.  */
>       if (integer_zerop (op1)
>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>                                     : (integer_onep (op2)
>                                        && !VECTOR_TYPE_P (type)))
>           && truth_value_p (TREE_CODE (arg0)))
>         return pedantic_non_lvalue_loc (loc,
>                                     fold_convert_loc (loc, type,
>                                               invert_truthvalue_loc (loc,
>                                                                      arg0)));
>
> are wrong?  I can't say for sure without a testcase.
>
> That said, papering over this in tree-ssa-forwprop.c is not the
> correct thing to do.
>
>> Secondly, I did not catch your idea to implement GCC Vector Extension
>> for vector comparison with bool result since
>> such extension completely depends on comparison context, e.g. for your
>> example, result type of comparison depends on using - for
>> if-comparison it is scalar, but for c = (a==b) - result type is
>> vector. I don't think that this is reasonable for current release.
>
> The idea was to be able to write testcases exercising different EQ/NE vector
> compares.  But yes, if that's non-trivial the it's not appropriate for stage3.
>
> Can you add a testcase for the forwprop issue and try to fix the offending
> bogus folders instead?
>
> Thanks,
> Richard.
>
>> And finally about AMD performance. I checked that this transformation
>> works for "-march=bdver4" option and regression for 481.wrf must
>> disappear too.
>>
>> Thanks.
>> Yuri.
>>
>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>>> sligthly modified mask store optimization. This transformation allows
>>>> perform unpredication for semi-hammock containing masked stores, other
>>>> words if we have a loop like
>>>> for (i=0; i<n; i++)
>>>>   if (c[i]) {
>>>>     p1[i] += 1;
>>>>     p2[i] = p3[i] +2;
>>>>   }
>>>>
>>>> then it will be transformed to
>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>    }
>>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>>
>>>> Bootstrapping and regression testing did not show any new failures.
>>>
>>> Can you please split out the middle-end support for vector equality compares?
>>>
>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>          {
>>> -          error ("vector comparison returning a boolean");
>>> -          debug_generic_expr (op0_type);
>>> -          debug_generic_expr (op1_type);
>>> -          return true;
>>> +         /* Allow vector comparison returning boolean if operand types
>>> +            are equal and CODE is EQ/NE.  */
>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>> +           {
>>> +             error ("type mismatch for vector comparison returning a boolean");
>>> +             debug_generic_expr (op0_type);
>>> +             debug_generic_expr (op1_type);
>>> +             return true;
>>> +           }
>>>          }
>>>      }
>>>
>>> please merge the conditions with a &&
>>>
>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>> tree type, tree op0, tree op1)
>>>
>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>      {
>>> +      if (INTEGRAL_TYPE_P (type)
>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>> +             || TYPE_PRECISION (type) == 1))
>>> +       {
>>> +         /* Have vector comparison with scalar boolean result.  */
>>> +         bool result = true;
>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>> +           {
>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>> +             result &= integer_onep (tmp);
>>> +         if (code == NE_EXPR)
>>> +           result = !result;
>>> +         return constant_boolean_node (result, type);
>>>
>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>> to change the
>>> guarding condition to just
>>>
>>>    if (! VECTOR_TYPE_P (type))
>>>
>>> and assert the boolean/precision.  Please also merge the asserts into
>>> one with &&
>>>
>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>> index b82ae3c..73ee3be 100644
>>> --- a/gcc/tree-ssa-forwprop.c
>>> +++ b/gcc/tree-ssa-forwprop.c
>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>> tree_code code, tree type,
>>>
>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>
>>> +  /* Do not perform combining it types are not compatible.  */
>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>> +    return NULL_TREE;
>>> +
>>>
>>> again, how does this happen?
>>>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index e67048e..1605520c 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>> gimple_stmt_iterator si,
>>>                                                 &comp_code, &val))
>>>      return;
>>>
>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>> +     that the mask in masked store is zero, so assert for such comparison
>>> +     is not implemented yet.  */
>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>> +    return;
>>> +
>>>
>>> VECTOR_TYPE_P
>>>
>>> I believe the comment should simply say that VRP doesn't track ranges for
>>> vector types.
>>>
>>> In the previous review I suggested you should make sure that RTL expansion
>>> ends up using a well-defined optab for these compares.  To make sure
>>> this happens across targets I suggest you make these comparisons available
>>> via the GCC vector extension.  Thus allow
>>>
>>> typedef int v4si __attribute__((vector_size(16)));
>>>
>>> int foo (v4si a, v4si b)
>>> {
>>>   if (a == b)
>>>     return 4;
>>> }
>>>
>>> and != and also using floating point vectors.
>>>
>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>> compares.
>>>
>>> A half-way implementation like your VRP comment suggests (only
>>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>>> good without also limiting the feature this way in the verifier.
>>>
>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>> has AVX, not AVX2).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog:
>>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> PR middle-end/68542
>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>> comparison with boolean result.
>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>>> for vector comparion with eq/ne only.
>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>> comparison with boolean result.
>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>> (verify_gimple_cond): Likewise.
>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>> combining for non-compatible vector types.
>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>> (optimize_mask_stores): Likewise.
>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>> has_mask_store field of vect_info.
>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>>> vectorized loops having masked stores.
>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>> correspondent macros.
>>>> (optimize_mask_stores): Add prototype.
>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>> type.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

[-- Attachment #2: PR68542.middle-end.patch2 --]
[-- Type: application/octet-stream, Size: 4998 bytes --]

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 698062e..1a9ed5f3 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -6420,15 +6420,23 @@ fold_binary_op_with_conditional_arg (location_t loc,
       if (VOID_TYPE_P (TREE_TYPE (false_value)))
 	rhs = false_value;
     }
-  else
+  else if (TREE_CODE (type) == VECTOR_TYPE)
     {
       tree testtype = TREE_TYPE (cond);
       test = cond;
       true_value = constant_boolean_node (true, testtype);
       false_value = constant_boolean_node (false, testtype);
     }
+  else
+    {
+      test = cond;
+      cond_type = type;
+      true_value = boolean_true_node;
+      false_value = boolean_false_node;
+    }
 
-  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
+  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE
+      && TREE_CODE (type) == VECTOR_TYPE)
     cond_code = VEC_COND_EXPR;
 
   /* This transformation is only worthwhile if we don't have to wrap ARG
@@ -11445,7 +11453,8 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type,
       if (integer_zerop (op1)
 	  && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
 				    : (integer_onep (op2)
-				       && !VECTOR_TYPE_P (type)))
+				       && !VECTOR_TYPE_P (type)
+				       && !VECTOR_TYPE_P (TREE_TYPE (arg0))))
 	  && truth_value_p (TREE_CODE (arg0)))
 	return pedantic_non_lvalue_loc (loc,
 				    fold_convert_loc (loc, type,
@@ -13888,6 +13897,23 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
 
   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (!VECTOR_TYPE_P (type))
+	{
+	  /* Have vector comparison with scalar boolean result.  */
+	  bool result = true;
+	  gcc_assert ((code == EQ_EXPR || code == NE_EXPR)
+		      && VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+	  for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+	    {
+	      tree elem0 = VECTOR_CST_ELT (op0, i);
+	      tree elem1 = VECTOR_CST_ELT (op1, i);
+	      tree tmp = fold_relational_const (code, type, elem0, elem1);
+	      result &= integer_onep (tmp);
+	    }
+	  if (code == NE_EXPR)
+	    result = !result;
+	  return constant_boolean_node (result, type);
+	}
       unsigned count = VECTOR_CST_NELTS (op0);
       tree *elts =  XALLOCAVEC (tree, count);
       gcc_assert (VECTOR_CST_NELTS (op1) == count
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 0c624aa..770b35b 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3408,10 +3408,10 @@ verify_gimple_call (gcall *stmt)
 }
 
 /* Verifies the gimple comparison with the result type TYPE and
-   the operands OP0 and OP1.  */
+   the operands OP0 and OP1, comparison code is CODE.  */
 
 static bool
-verify_gimple_comparison (tree type, tree op0, tree op1)
+verify_gimple_comparison (tree type, tree op0, tree op1, enum tree_code code)
 {
   tree op0_type = TREE_TYPE (op0);
   tree op1_type = TREE_TYPE (op1);
@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
 	  || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+	  /* Allow vector comparison returning boolean if operand types
+	     are boolean or integral and CODE is EQ/NE.  */
+	  if (code != EQ_EXPR && code != NE_EXPR
+	      && !VECTOR_BOOLEAN_TYPE_P (op0_type)
+	      && !VECTOR_INTEGER_TYPE_P (op0_type))
+	    {
+	      error ("type mismatch for vector comparison returning a boolean");
+	      debug_generic_expr (op0_type);
+	      debug_generic_expr (op1_type);
+	      return true;
+	    }
         }
     }
   /* Or a boolean vector type with the same element count
@@ -3832,7 +3839,7 @@ verify_gimple_assign_binary (gassign *stmt)
     case LTGT_EXPR:
       /* Comparisons are also binary, but the result type is not
 	 connected to the operand types.  */
-      return verify_gimple_comparison (lhs_type, rhs1, rhs2);
+      return verify_gimple_comparison (lhs_type, rhs1, rhs2, rhs_code);
 
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
@@ -4541,7 +4548,8 @@ verify_gimple_cond (gcond *stmt)
 
   return verify_gimple_comparison (boolean_type_node,
 				   gimple_cond_lhs (stmt),
-				   gimple_cond_rhs (stmt));
+				   gimple_cond_rhs (stmt),
+				   gimple_cond_code (stmt));
 }
 
 /* Verify the GIMPLE statement STMT.  Returns true if there is an
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..e8cdb1d 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,10 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
 						&comp_code, &val))
     return;
 
+  /* VRP doesn't track ranges for vector types.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+
   /* Register ASSERT_EXPRs for name.  */
   register_edge_assert_for_2 (name, e, si, cond_code, cond_op0,
 			      cond_op1, is_else_edge);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2015-12-11 14:03       ` Yuri Rumyantsev
@ 2015-12-16 13:37         ` Richard Biener
  2015-12-18 10:20           ` Yuri Rumyantsev
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2015-12-16 13:37 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard.
> Thanks for your review.
> I re-designed fix for assert by adding additional checks for vector
> comparison with boolean result to fold_binary_op_with_conditional_arg
> and remove early exit to combine_cond_expr_cond.
> Unfortunately, I am not able to provide you with test-case since it is
> in my second patch related to back-end patch which I sent earlier
> (12-08).
>
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?

+  else if (TREE_CODE (type) == VECTOR_TYPE)
     {
       tree testtype = TREE_TYPE (cond);
       test = cond;
       true_value = constant_boolean_node (true, testtype);
       false_value = constant_boolean_node (false, testtype);
     }
+  else
+    {
+      test = cond;
+      cond_type = type;
+      true_value = boolean_true_node;
+      false_value = boolean_false_node;
+    }

So this is, say, vec1 != vec2 with a scalar rather than a vector result,
i.e. we have a scalar result and thus, say, scalar + (vec1 != vec2).
Rather than doing the above (I don't see how it would not eventually
generate wrong code) I believe we should simply detect the case of
mixing vector and scalar types and bail out.  At least without some
comments your patch makes the function even more difficult to
understand than it already is.
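
Something along these lines would do (untested sketch only, reusing the
existing locals of fold_binary_op_with_conditional_arg):

  else if (TREE_CODE (type) == VECTOR_TYPE
           || TREE_CODE (TREE_TYPE (cond)) != VECTOR_TYPE)
    {
      tree testtype = TREE_TYPE (cond);
      test = cond;
      true_value = constant_boolean_node (true, testtype);
      false_value = constant_boolean_node (false, testtype);
    }
  else
    /* Mixing a vector condition with a scalar result type - punt.  */
    return NULL_TREE;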

@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are boolean or integral and CODE is EQ/NE.  */
+         if (code != EQ_EXPR && code != NE_EXPR
+             && !VECTOR_BOOLEAN_TYPE_P (op0_type)
+             && !VECTOR_INTEGER_TYPE_P (op0_type))
+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }
         }
     }
   /* Or a boolean vector type with the same element count

As said before, please merge the cascaded if()s.  Better wording for
the error is "unsupported operation or type for vector comparison
returning a boolean".
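
i.e. roughly (untested sketch, merging the two conditions and keeping the
existing error reporting):

      if ((TREE_CODE (op0_type) == VECTOR_TYPE
           || TREE_CODE (op1_type) == VECTOR_TYPE)
          && code != EQ_EXPR && code != NE_EXPR
          && !VECTOR_BOOLEAN_TYPE_P (op0_type)
          && !VECTOR_INTEGER_TYPE_P (op0_type))
        {
          error ("unsupported operation or type for vector comparison"
                 " returning a boolean");
          debug_generic_expr (op0_type);
          debug_generic_expr (op1_type);
          return true;
        }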

Otherwise the patch looks sensible to me, though it shows that overloading
EQ/NE_EXPR for a scalar result with vector operands might have some more
unexpected fallout (which is why I originally preferred the view-convert
to a large integer type variant).

Thanks,
Richard.


> ChangeLog:
> 2015-12-11  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks oh
> vector comparison with boolean result to avoid ICE.
> (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
> 2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard.
>>>
>>> Thanks a lot for your review.
>>> Below are my answers.
>>>
>>> You asked why I inserted additional check to
>>> ++ b/gcc/tree-ssa-forwprop.c
>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>> tree_code code, tree type,
>>>
>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>
>>> +  /* Do not perform combining it types are not compatible.  */
>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>> +    return NULL_TREE;
>>> +
>>>
>>> again, how does this happen?
>>>
>>> This is because without it I've got assert in fold_convert_loc
>>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>
>>> since it tries to convert vector of bool to scalar bool.
>>> Here is essential part of call-stack:
>>>
>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>>     at ../../gcc/diagnostic.c:1259
>>> #1  0x0000000001743ada in fancy_abort (
>>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>>> ../../gcc/diagnostic.c:1332
>>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>>     at ../../gcc/fold-const.c:9211
>>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>     op1=0x7ffff1a48780, invariant_only=true)
>>>     at ../../gcc/tree-ssa-forwprop.c:382
>>
>> Ok, but that only shows that
>>
>>       /* Convert A ? 1 : 0 to simply A.  */
>>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>>                                  : (integer_onep (op1)
>>                                     && !VECTOR_TYPE_P (type)))
>>           && integer_zerop (op2)
>>           /* If we try to convert OP0 to our type, the
>>              call to fold will try to move the conversion inside
>>              a COND, which will recurse.  In that case, the COND_EXPR
>>              is probably the best choice, so leave it alone.  */
>>           && type == TREE_TYPE (arg0))
>>         return pedantic_non_lvalue_loc (loc, arg0);
>>
>>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>>          over COND_EXPR in cases such as floating point comparisons.  */
>>       if (integer_zerop (op1)
>>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>>                                     : (integer_onep (op2)
>>                                        && !VECTOR_TYPE_P (type)))
>>           && truth_value_p (TREE_CODE (arg0)))
>>         return pedantic_non_lvalue_loc (loc,
>>                                     fold_convert_loc (loc, type,
>>                                               invert_truthvalue_loc (loc,
>>                                                                      arg0)));
>>
>> are wrong?  I can't say for sure without a testcase.
>>
>> That said, papering over this in tree-ssa-forwprop.c is not the
>> correct thing to do.
>>
>>> Secondly, I did not catch your idea to implement GCC Vector Extension
>>> for vector comparison with bool result since
>>> such extension completely depends on comparison context, e.g. for your
>>> example, result type of comparison depends on using - for
>>> if-comparison it is scalar, but for c = (a==b) - result type is
>>> vector. I don't think that this is reasonable for current release.
>>
>> The idea was to be able to write testcases exercising different EQ/NE vector
>> compares.  But yes, if that's non-trivial the it's not appropriate for stage3.
>>
>> Can you add a testcase for the forwprop issue and try to fix the offending
>> bogus folders instead?
>>
>> Thanks,
>> Richard.
>>
>>> And finally about AMD performance. I checked that this transformation
>>> works for "-march=bdver4" option and regression for 481.wrf must
>>> disappear too.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>>>> sligthly modified mask store optimization. This transformation allows
>>>>> perform unpredication for semi-hammock containing masked stores, other
>>>>> words if we have a loop like
>>>>> for (i=0; i<n; i++)
>>>>>   if (c[i]) {
>>>>>     p1[i] += 1;
>>>>>     p2[i] = p3[i] +2;
>>>>>   }
>>>>>
>>>>> then it will be transformed to
>>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>>    }
>>>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>>>
>>>>> Bootstrapping and regression testing did not show any new failures.
>>>>
>>>> Can you please split out the middle-end support for vector equality compares?
>>>>
>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>>          {
>>>> -          error ("vector comparison returning a boolean");
>>>> -          debug_generic_expr (op0_type);
>>>> -          debug_generic_expr (op1_type);
>>>> -          return true;
>>>> +         /* Allow vector comparison returning boolean if operand types
>>>> +            are equal and CODE is EQ/NE.  */
>>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>>> +           {
>>>> +             error ("type mismatch for vector comparison returning a boolean");
>>>> +             debug_generic_expr (op0_type);
>>>> +             debug_generic_expr (op1_type);
>>>> +             return true;
>>>> +           }
>>>>          }
>>>>      }
>>>>
>>>> please merge the conditions with a &&
>>>>
>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>>> tree type, tree op0, tree op1)
>>>>
>>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>      {
>>>> +      if (INTEGRAL_TYPE_P (type)
>>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>>> +             || TYPE_PRECISION (type) == 1))
>>>> +       {
>>>> +         /* Have vector comparison with scalar boolean result.  */
>>>> +         bool result = true;
>>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>>> +           {
>>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>>> +             result &= integer_onep (tmp);
>>>> +         if (code == NE_EXPR)
>>>> +           result = !result;
>>>> +         return constant_boolean_node (result, type);
>>>>
>>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>>> to change the
>>>> guarding condition to just
>>>>
>>>>    if (! VECTOR_TYPE_P (type))
>>>>
>>>> and assert the boolean/precision.  Please also merge the asserts into
>>>> one with &&
>>>>
>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>> index b82ae3c..73ee3be 100644
>>>> --- a/gcc/tree-ssa-forwprop.c
>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>>> tree_code code, tree type,
>>>>
>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>
>>>> +  /* Do not perform combining it types are not compatible.  */
>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>> +    return NULL_TREE;
>>>> +
>>>>
>>>> again, how does this happen?
>>>>
>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>> index e67048e..1605520c 100644
>>>> --- a/gcc/tree-vrp.c
>>>> +++ b/gcc/tree-vrp.c
>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>>> gimple_stmt_iterator si,
>>>>                                                 &comp_code, &val))
>>>>      return;
>>>>
>>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>>> +     that the mask in masked store is zero, so assert for such comparison
>>>> +     is not implemented yet.  */
>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>> +    return;
>>>> +
>>>>
>>>> VECTOR_TYPE_P
>>>>
>>>> I believe the comment should simply say that VRP doesn't track ranges for
>>>> vector types.
>>>>
>>>> In the previous review I suggested you should make sure that RTL expansion
>>>> ends up using a well-defined optab for these compares.  To make sure
>>>> this happens across targets I suggest you make these comparisons available
>>>> via the GCC vector extension.  Thus allow
>>>>
>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>
>>>> int foo (v4si a, v4si b)
>>>> {
>>>>   if (a == b)
>>>>     return 4;
>>>> }
>>>>
>>>> and != and also using floating point vectors.
>>>>
>>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>>> compares.
>>>>
>>>> A half-way implementation like your VRP comment suggests (only
>>>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>>>> good without also limiting the feature this way in the verifier.
>>>>
>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>>> has AVX, not AVX2).
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> ChangeLog:
>>>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> PR middle-end/68542
>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>>> comparison with boolean result.
>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>>>> for vector comparion with eq/ne only.
>>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>>> comparison with boolean result.
>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>> (verify_gimple_cond): Likewise.
>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>>> combining for non-compatible vector types.
>>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>>> (optimize_mask_stores): Likewise.
>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>>> has_mask_store field of vect_info.
>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>>>> vectorized loops having masked stores.
>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>>> correspondent macros.
>>>>> (optimize_mask_stores): Add prototype.
>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>>> type.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2015-12-16 13:37         ` Richard Biener
@ 2015-12-18 10:20           ` Yuri Rumyantsev
  2016-01-11 10:06             ` Yuri Rumyantsev
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2015-12-18 10:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 17616 bytes --]

Hi Richard,

Here is the updated patch for the middle-end part of the whole patch,
which I hope fixes all your remarks.

Regression testing and bootstrapping did not show any new failures.
Is it OK for trunk?

Yuri.

ChangeLog:
2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* fold-const.c (fold_binary_op_with_conditional_arg): Bail out for the case
of mixing vector and scalar types.
(fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
vector types.

2015-12-16 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard.
>> Thanks for your review.
>> I re-designed fix for assert by adding additional checks for vector
>> comparison with boolean result to fold_binary_op_with_conditional_arg
>> and remove early exit to combine_cond_expr_cond.
>> Unfortunately, I am not able to provide you with test-case since it is
>> in my second patch related to back-end patch which I sent earlier
>> (12-08).
>>
>> Bootstrapping and regression testing did not show any new failures.
>> Is it OK for trunk?
>
> +  else if (TREE_CODE (type) == VECTOR_TYPE)
>      {
>        tree testtype = TREE_TYPE (cond);
>        test = cond;
>        true_value = constant_boolean_node (true, testtype);
>        false_value = constant_boolean_node (false, testtype);
>      }
> +  else
> +    {
> +      test = cond;
> +      cond_type = type;
> +      true_value = boolean_true_node;
> +      false_value = boolean_false_node;
> +    }
>
> So this is, say, vec1 != vec2 with scalar vs. vector result.  If we have
> scalar result and thus, say, scalar + vec1 != vec2.  I believe rather
> than doing the above (not seeing how this not would generate wrong
> code eventually) we should simply detect the case of mixing vector
> and scalar types and bail out.  At least without some comments
> your patch makes the function even more difficult to understand than
> it is already.
>
> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>          {
> -          error ("vector comparison returning a boolean");
> -          debug_generic_expr (op0_type);
> -          debug_generic_expr (op1_type);
> -          return true;
> +         /* Allow vector comparison returning boolean if operand types
> +            are boolean or integral and CODE is EQ/NE.  */
> +         if (code != EQ_EXPR && code != NE_EXPR
> +             && !VECTOR_BOOLEAN_TYPE_P (op0_type)
> +             && !VECTOR_INTEGER_TYPE_P (op0_type))
> +           {
> +             error ("type mismatch for vector comparison returning a boolean");
> +             debug_generic_expr (op0_type);
> +             debug_generic_expr (op1_type);
> +             return true;
> +           }
>          }
>      }
>    /* Or a boolean vector type with the same element count
>
> as said before please merge the cascaded if()s.  Better wording for
> the error is "unsupported operation or type for vector comparison
> returning a boolean"
>
> Otherwise the patch looks sensible to me though it shows that overloading of
> EQ/NE_EXPR for scalar result and vector operands might have some more unexpected
> fallout (which is why I originally prefered the view-convert to large
> integer type variant).
>
> Thanks,
> Richard.
>
>
>> ChangeLog:
>> 2015-12-11  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> PR middle-end/68542
>> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks oh
>> vector comparison with boolean result to avoid ICE.
>> (fold_relational_const): Add handling of vector
>> comparison with boolean result.
>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>> comparison of vector operands with boolean result for EQ/NE only.
>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>> (verify_gimple_cond): Likewise.
>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>> combining for non-compatible vector types.
>> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
>> vector types.
>>
>> 2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard.
>>>>
>>>> Thanks a lot for your review.
>>>> Below are my answers.
>>>>
>>>> You asked why I inserted additional check to
>>>> ++ b/gcc/tree-ssa-forwprop.c
>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>>> tree_code code, tree type,
>>>>
>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>
>>>> +  /* Do not perform combining it types are not compatible.  */
>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>> +    return NULL_TREE;
>>>> +
>>>>
>>>> again, how does this happen?
>>>>
>>>> This is because without it I've got assert in fold_convert_loc
>>>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>>
>>>> since it tries to convert vector of bool to scalar bool.
>>>> Here is essential part of call-stack:
>>>>
>>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>>>     at ../../gcc/diagnostic.c:1259
>>>> #1  0x0000000001743ada in fancy_abort (
>>>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>>>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>>>> ../../gcc/diagnostic.c:1332
>>>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>>>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>>>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>>>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>>>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>>>     at ../../gcc/fold-const.c:9211
>>>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>     op1=0x7ffff1a48780, invariant_only=true)
>>>>     at ../../gcc/tree-ssa-forwprop.c:382
>>>
>>> Ok, but that only shows that
>>>
>>>       /* Convert A ? 1 : 0 to simply A.  */
>>>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>>>                                  : (integer_onep (op1)
>>>                                     && !VECTOR_TYPE_P (type)))
>>>           && integer_zerop (op2)
>>>           /* If we try to convert OP0 to our type, the
>>>              call to fold will try to move the conversion inside
>>>              a COND, which will recurse.  In that case, the COND_EXPR
>>>              is probably the best choice, so leave it alone.  */
>>>           && type == TREE_TYPE (arg0))
>>>         return pedantic_non_lvalue_loc (loc, arg0);
>>>
>>>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>>>          over COND_EXPR in cases such as floating point comparisons.  */
>>>       if (integer_zerop (op1)
>>>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>>>                                     : (integer_onep (op2)
>>>                                        && !VECTOR_TYPE_P (type)))
>>>           && truth_value_p (TREE_CODE (arg0)))
>>>         return pedantic_non_lvalue_loc (loc,
>>>                                     fold_convert_loc (loc, type,
>>>                                               invert_truthvalue_loc (loc,
>>>                                                                      arg0)));
>>>
>>> are wrong?  I can't say for sure without a testcase.
>>>
>>> That said, papering over this in tree-ssa-forwprop.c is not the
>>> correct thing to do.
>>>
>>>> Secondly, I did not catch your idea to implement GCC Vector Extension
>>>> for vector comparison with bool result since
>>>> such extension completely depends on comparison context, e.g. for your
>>>> example, result type of comparison depends on using - for
>>>> if-comparison it is scalar, but for c = (a==b) - result type is
>>>> vector. I don't think that this is reasonable for current release.
>>>
>>> The idea was to be able to write testcases exercising different EQ/NE vector
>>> compares.  But yes, if that's non-trivial the it's not appropriate for stage3.
>>>
>>> Can you add a testcase for the forwprop issue and try to fix the offending
>>> bogus folders instead?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> And finally about AMD performance. I checked that this transformation
>>>> works for "-march=bdver4" option and regression for 481.wrf must
>>>> disappear too.
>>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>>>>> sligthly modified mask store optimization. This transformation allows
>>>>>> perform unpredication for semi-hammock containing masked stores, other
>>>>>> words if we have a loop like
>>>>>> for (i=0; i<n; i++)
>>>>>>   if (c[i]) {
>>>>>>     p1[i] += 1;
>>>>>>     p2[i] = p3[i] +2;
>>>>>>   }
>>>>>>
>>>>>> then it will be transformed to
>>>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>>>    }
>>>>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>>>>
>>>>>> Bootstrapping and regression testing did not show any new failures.
>>>>>
>>>>> Can you please split out the middle-end support for vector equality compares?
>>>>>
>>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>>>          {
>>>>> -          error ("vector comparison returning a boolean");
>>>>> -          debug_generic_expr (op0_type);
>>>>> -          debug_generic_expr (op1_type);
>>>>> -          return true;
>>>>> +         /* Allow vector comparison returning boolean if operand types
>>>>> +            are equal and CODE is EQ/NE.  */
>>>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>>>> +           {
>>>>> +             error ("type mismatch for vector comparison returning a boolean");
>>>>> +             debug_generic_expr (op0_type);
>>>>> +             debug_generic_expr (op1_type);
>>>>> +             return true;
>>>>> +           }
>>>>>          }
>>>>>      }
>>>>>
>>>>> please merge the conditions with a &&
>>>>>
>>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>>>> tree type, tree op0, tree op1)
>>>>>
>>>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>>      {
>>>>> +      if (INTEGRAL_TYPE_P (type)
>>>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>>>> +             || TYPE_PRECISION (type) == 1))
>>>>> +       {
>>>>> +         /* Have vector comparison with scalar boolean result.  */
>>>>> +         bool result = true;
>>>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>>>> +           {
>>>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>>>> +             result &= integer_onep (tmp);
>>>>> +         if (code == NE_EXPR)
>>>>> +           result = !result;
>>>>> +         return constant_boolean_node (result, type);
>>>>>
>>>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>>>> to change the
>>>>> guarding condition to just
>>>>>
>>>>>    if (! VECTOR_TYPE_P (type))
>>>>>
>>>>> and assert the boolean/precision.  Please also merge the asserts into
>>>>> one with &&
>>>>>
>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>> index b82ae3c..73ee3be 100644
>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>>>> tree_code code, tree type,
>>>>>
>>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>>
>>>>> +  /* Do not perform combining it types are not compatible.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>>> +    return NULL_TREE;
>>>>> +
>>>>>
>>>>> again, how does this happen?
>>>>>
>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>> index e67048e..1605520c 100644
>>>>> --- a/gcc/tree-vrp.c
>>>>> +++ b/gcc/tree-vrp.c
>>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>>>> gimple_stmt_iterator si,
>>>>>                                                 &comp_code, &val))
>>>>>      return;
>>>>>
>>>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>>>> +     that the mask in masked store is zero, so assert for such comparison
>>>>> +     is not implemented yet.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>> +    return;
>>>>> +
>>>>>
>>>>> VECTOR_TYPE_P
>>>>>
>>>>> I believe the comment should simply say that VRP doesn't track ranges for
>>>>> vector types.
>>>>>
>>>>> In the previous review I suggested you should make sure that RTL expansion
>>>>> ends up using a well-defined optab for these compares.  To make sure
>>>>> this happens across targets I suggest you make these comparisons available
>>>>> via the GCC vector extension.  Thus allow
>>>>>
>>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>>
>>>>> int foo (v4si a, v4si b)
>>>>> {
>>>>>   if (a == b)
>>>>>     return 4;
>>>>> }
>>>>>
>>>>> and != and also using floating point vectors.
>>>>>
>>>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>>>> compares.
>>>>>
>>>>> A half-way implementation like your VRP comment suggests (only
>>>>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>>>>> good without also limiting the feature this way in the verifier.
>>>>>
>>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>>>> has AVX, not AVX2).
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog:
>>>>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> PR middle-end/68542
>>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>>>> comparison with boolean result.
>>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>>>>> for vector comparion with eq/ne only.
>>>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>>>> comparison with boolean result.
>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>> (verify_gimple_cond): Likewise.
>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>>>> combining for non-compatible vector types.
>>>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>>>> (optimize_mask_stores): Likewise.
>>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>>>> has_mask_store field of vect_info.
>>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>>>>> vectorized loops having masked stores.
>>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>>>> correspondent macros.
>>>>>> (optimize_mask_stores): Add prototype.
>>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>>>> type.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

[-- Attachment #2: PR68542.middle-end.patch3 --]
[-- Type: application/octet-stream, Size: 4435 bytes --]

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 553a9c3..a95b537 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -6419,13 +6419,17 @@ fold_binary_op_with_conditional_arg (location_t loc,
       if (VOID_TYPE_P (TREE_TYPE (false_value)))
 	rhs = false_value;
     }
-  else
+  else if (!(TREE_CODE (type) != VECTOR_TYPE
+	     && TREE_CODE (TREE_TYPE (cond)) == VECTOR_TYPE))
     {
       tree testtype = TREE_TYPE (cond);
       test = cond;
       true_value = constant_boolean_node (true, testtype);
       false_value = constant_boolean_node (false, testtype);
     }
+  else
+    /* Detect the case of mixing vector and scalar types - bail out.  */
+    return NULL_TREE;
 
   if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
     cond_code = VEC_COND_EXPR;
@@ -13882,6 +13886,23 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
 
   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (!VECTOR_TYPE_P (type))
+	{
+	  /* Have vector comparison with scalar boolean result.  */
+	  bool result = true;
+	  gcc_assert ((code == EQ_EXPR || code == NE_EXPR)
+		      && VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+	  for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+	    {
+	      tree elem0 = VECTOR_CST_ELT (op0, i);
+	      tree elem1 = VECTOR_CST_ELT (op1, i);
+	      tree tmp = fold_relational_const (code, type, elem0, elem1);
+	      result &= integer_onep (tmp);
+	    }
+	  if (code == NE_EXPR)
+	    result = !result;
+	  return constant_boolean_node (result, type);
+	}
       unsigned count = VECTOR_CST_NELTS (op0);
       tree *elts =  XALLOCAVEC (tree, count);
       gcc_assert (VECTOR_CST_NELTS (op1) == count
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 0c624aa..3e10e76 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3408,10 +3408,10 @@ verify_gimple_call (gcall *stmt)
 }
 
 /* Verifies the gimple comparison with the result type TYPE and
-   the operands OP0 and OP1.  */
+   the operands OP0 and OP1, comparison code is CODE.  */
 
 static bool
-verify_gimple_comparison (tree type, tree op0, tree op1)
+verify_gimple_comparison (tree type, tree op0, tree op1, enum tree_code code)
 {
   tree op0_type = TREE_TYPE (op0);
   tree op1_type = TREE_TYPE (op1);
@@ -3445,13 +3445,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       && (TREE_CODE (type) == BOOLEAN_TYPE
 	  || TYPE_PRECISION (type) == 1))
     {
-      if (TREE_CODE (op0_type) == VECTOR_TYPE
-	  || TREE_CODE (op1_type) == VECTOR_TYPE)
-        {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+      if ((TREE_CODE (op0_type) == VECTOR_TYPE
+	   || TREE_CODE (op1_type) == VECTOR_TYPE)
+	  && code != EQ_EXPR && code != NE_EXPR
+	  && !VECTOR_BOOLEAN_TYPE_P (op0_type)
+	  && !VECTOR_INTEGER_TYPE_P (op0_type))
+	{
+	  error ("unsupported operation or type for vector comparison"
+		 " returning a boolean");
+	  debug_generic_expr (op0_type);
+	  debug_generic_expr (op1_type);
+	  return true;
         }
     }
   /* Or a boolean vector type with the same element count
@@ -3832,7 +3836,7 @@ verify_gimple_assign_binary (gassign *stmt)
     case LTGT_EXPR:
       /* Comparisons are also binary, but the result type is not
 	 connected to the operand types.  */
-      return verify_gimple_comparison (lhs_type, rhs1, rhs2);
+      return verify_gimple_comparison (lhs_type, rhs1, rhs2, rhs_code);
 
     case WIDEN_MULT_EXPR:
       if (TREE_CODE (lhs_type) != INTEGER_TYPE)
@@ -4541,7 +4545,8 @@ verify_gimple_cond (gcond *stmt)
 
   return verify_gimple_comparison (boolean_type_node,
 				   gimple_cond_lhs (stmt),
-				   gimple_cond_rhs (stmt));
+				   gimple_cond_rhs (stmt),
+				   gimple_cond_code (stmt));
 }
 
 /* Verify the GIMPLE statement STMT.  Returns true if there is an
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index acbb70b..208a752 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
 						&comp_code, &val))
     return;
 
+  /* VRP doesn't track ranges for vector types.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+
   /* Register ASSERT_EXPRs for name.  */
   register_edge_assert_for_2 (name, e, si, cond_code, cond_op0,
 			      cond_op1, is_else_edge);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2015-12-18 10:20           ` Yuri Rumyantsev
@ 2016-01-11 10:06             ` Yuri Rumyantsev
  2016-01-18 12:44               ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2016-01-11 10:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

Hi Richard,

Did you have any chance to look at the updated patch?

Thanks.
Yuri.

2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Hi Richard,
>
> Here is updated patch for middle-end part of the whole patch which
> fixes all your remarks I hope.
>
> Regression testing and bootstrapping did not show any new failures.
> Is it OK for trunk?
>
> Yuri.
>
> ChangeLog:
> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
> of mixind vector and scalar types.
> (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
> 2015-12-16 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard.
>>> Thanks for your review.
>>> I re-designed fix for assert by adding additional checks for vector
>>> comparison with boolean result to fold_binary_op_with_conditional_arg
>>> and remove early exit to combine_cond_expr_cond.
>>> Unfortunately, I am not able to provide you with test-case since it is
>>> in my second patch related to back-end patch which I sent earlier
>>> (12-08).
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>> Is it OK for trunk?
>>
>> +  else if (TREE_CODE (type) == VECTOR_TYPE)
>>      {
>>        tree testtype = TREE_TYPE (cond);
>>        test = cond;
>>        true_value = constant_boolean_node (true, testtype);
>>        false_value = constant_boolean_node (false, testtype);
>>      }
>> +  else
>> +    {
>> +      test = cond;
>> +      cond_type = type;
>> +      true_value = boolean_true_node;
>> +      false_value = boolean_false_node;
>> +    }
>>
>> So this is, say, vec1 != vec2 with scalar vs. vector result.  If we have
>> scalar result and thus, say, scalar + vec1 != vec2.  I believe rather
>> than doing the above (not seeing how this not would generate wrong
>> code eventually) we should simply detect the case of mixing vector
>> and scalar types and bail out.  At least without some comments
>> your patch makes the function even more difficult to understand than
>> it is already.
>>
>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>          {
>> -          error ("vector comparison returning a boolean");
>> -          debug_generic_expr (op0_type);
>> -          debug_generic_expr (op1_type);
>> -          return true;
>> +         /* Allow vector comparison returning boolean if operand types
>> +            are boolean or integral and CODE is EQ/NE.  */
>> +         if (code != EQ_EXPR && code != NE_EXPR
>> +             && !VECTOR_BOOLEAN_TYPE_P (op0_type)
>> +             && !VECTOR_INTEGER_TYPE_P (op0_type))
>> +           {
>> +             error ("type mismatch for vector comparison returning a boolean");
>> +             debug_generic_expr (op0_type);
>> +             debug_generic_expr (op1_type);
>> +             return true;
>> +           }
>>          }
>>      }
>>    /* Or a boolean vector type with the same element count
>>
>> as said before please merge the cascaded if()s.  Better wording for
>> the error is "unsupported operation or type for vector comparison
>> returning a boolean"
>>
>> Otherwise the patch looks sensible to me though it shows that overloading of
>> EQ/NE_EXPR for scalar result and vector operands might have some more unexpected
>> fallout (which is why I originally prefered the view-convert to large
>> integer type variant).
>>
>> Thanks,
>> Richard.
>>
>>
>>> ChangeLog:
>>> 2015-12-11  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> PR middle-end/68542
>>> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks oh
>>> vector comparison with boolean result to avoid ICE.
>>> (fold_relational_const): Add handling of vector
>>> comparison with boolean result.
>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>> comparison of vector operands with boolean result for EQ/NE only.
>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>> (verify_gimple_cond): Likewise.
>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>> combining for non-compatible vector types.
>>> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
>>> vector types.
>>>
>>> 2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard.
>>>>>
>>>>> Thanks a lot for your review.
>>>>> Below are my answers.
>>>>>
>>>>> You asked why I inserted additional check to
>>>>> ++ b/gcc/tree-ssa-forwprop.c
>>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>>>> tree_code code, tree type,
>>>>>
>>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>>
>>>>> +  /* Do not perform combining it types are not compatible.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>>> +    return NULL_TREE;
>>>>> +
>>>>>
>>>>> again, how does this happen?
>>>>>
>>>>> This is because without it I've got assert in fold_convert_loc
>>>>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>>>
>>>>> since it tries to convert vector of bool to scalar bool.
>>>>> Here is essential part of call-stack:
>>>>>
>>>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>>>>     at ../../gcc/diagnostic.c:1259
>>>>> #1  0x0000000001743ada in fancy_abort (
>>>>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>>>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>>>>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>>>>> ../../gcc/diagnostic.c:1332
>>>>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>>>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>>>>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>>>>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>>>>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>>>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>>>>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>>>>     at ../../gcc/fold-const.c:9211
>>>>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>>     op1=0x7ffff1a48780, invariant_only=true)
>>>>>     at ../../gcc/tree-ssa-forwprop.c:382
>>>>
>>>> Ok, but that only shows that
>>>>
>>>>       /* Convert A ? 1 : 0 to simply A.  */
>>>>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>>>>                                  : (integer_onep (op1)
>>>>                                     && !VECTOR_TYPE_P (type)))
>>>>           && integer_zerop (op2)
>>>>           /* If we try to convert OP0 to our type, the
>>>>              call to fold will try to move the conversion inside
>>>>              a COND, which will recurse.  In that case, the COND_EXPR
>>>>              is probably the best choice, so leave it alone.  */
>>>>           && type == TREE_TYPE (arg0))
>>>>         return pedantic_non_lvalue_loc (loc, arg0);
>>>>
>>>>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>>>>          over COND_EXPR in cases such as floating point comparisons.  */
>>>>       if (integer_zerop (op1)
>>>>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>>>>                                     : (integer_onep (op2)
>>>>                                        && !VECTOR_TYPE_P (type)))
>>>>           && truth_value_p (TREE_CODE (arg0)))
>>>>         return pedantic_non_lvalue_loc (loc,
>>>>                                     fold_convert_loc (loc, type,
>>>>                                               invert_truthvalue_loc (loc,
>>>>                                                                      arg0)));
>>>>
>>>> are wrong?  I can't say for sure without a testcase.
>>>>
>>>> That said, papering over this in tree-ssa-forwprop.c is not the
>>>> correct thing to do.
>>>>
>>>>> Secondly, I did not catch your idea to implement GCC Vector Extension
>>>>> for vector comparison with bool result since
>>>>> such extension completely depends on comparison context, e.g. for your
>>>>> example, result type of comparison depends on using - for
>>>>> if-comparison it is scalar, but for c = (a==b) - result type is
>>>>> vector. I don't think that this is reasonable for current release.
>>>>
>>>> The idea was to be able to write testcases exercising different EQ/NE vector
>>>> compares.  But yes, if that's non-trivial the it's not appropriate for stage3.
>>>>
>>>> Can you add a testcase for the forwprop issue and try to fix the offending
>>>> bogus folders instead?
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> And finally about AMD performance. I checked that this transformation
>>>>> works for "-march=bdver4" option and regression for 481.wrf must
>>>>> disappear too.
>>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Here is a patch for 481.wrf preformance regression for avx2 which is
>>>>>>> sligthly modified mask store optimization. This transformation allows
>>>>>>> perform unpredication for semi-hammock containing masked stores, other
>>>>>>> words if we have a loop like
>>>>>>> for (i=0; i<n; i++)
>>>>>>>   if (c[i]) {
>>>>>>>     p1[i] += 1;
>>>>>>>     p2[i] = p3[i] +2;
>>>>>>>   }
>>>>>>>
>>>>>>> then it will be transformed to
>>>>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>>>>    }
>>>>>>> i.e. it will put all computations related to masked stores to semi-hammock.
>>>>>>>
>>>>>>> Bootstrapping and regression testing did not show any new failures.
>>>>>>
>>>>>> Can you please split out the middle-end support for vector equality compares?
>>>>>>
>>>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>>>>          {
>>>>>> -          error ("vector comparison returning a boolean");
>>>>>> -          debug_generic_expr (op0_type);
>>>>>> -          debug_generic_expr (op1_type);
>>>>>> -          return true;
>>>>>> +         /* Allow vector comparison returning boolean if operand types
>>>>>> +            are equal and CODE is EQ/NE.  */
>>>>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>>>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>>>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>>>>> +           {
>>>>>> +             error ("type mismatch for vector comparison returning a boolean");
>>>>>> +             debug_generic_expr (op0_type);
>>>>>> +             debug_generic_expr (op1_type);
>>>>>> +             return true;
>>>>>> +           }
>>>>>>          }
>>>>>>      }
>>>>>>
>>>>>> please merge the conditions with a &&
>>>>>>
>>>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>>>>> tree type, tree op0, tree op1)
>>>>>>
>>>>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>>>      {
>>>>>> +      if (INTEGRAL_TYPE_P (type)
>>>>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>>>>> +             || TYPE_PRECISION (type) == 1))
>>>>>> +       {
>>>>>> +         /* Have vector comparison with scalar boolean result.  */
>>>>>> +         bool result = true;
>>>>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>>>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>>>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>>>>> +           {
>>>>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>>>>> +             result &= integer_onep (tmp);
>>>>>> +         if (code == NE_EXPR)
>>>>>> +           result = !result;
>>>>>> +         return constant_boolean_node (result, type);
>>>>>>
>>>>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>>>>> to change the
>>>>>> guarding condition to just
>>>>>>
>>>>>>    if (! VECTOR_TYPE_P (type))
>>>>>>
>>>>>> and assert the boolean/precision.  Please also merge the asserts into
>>>>>> one with &&
>>>>>>
>>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>>> index b82ae3c..73ee3be 100644
>>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>>>>> tree_code code, tree type,
>>>>>>
>>>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>>>
>>>>>> +  /* Do not perform combining it types are not compatible.  */
>>>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>>>> +    return NULL_TREE;
>>>>>> +
>>>>>>
>>>>>> again, how does this happen?
>>>>>>
>>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>>> index e67048e..1605520c 100644
>>>>>> --- a/gcc/tree-vrp.c
>>>>>> +++ b/gcc/tree-vrp.c
>>>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>>>>> gimple_stmt_iterator si,
>>>>>>                                                 &comp_code, &val))
>>>>>>      return;
>>>>>>
>>>>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>>>>> +     that the mask in masked store is zero, so assert for such comparison
>>>>>> +     is not implemented yet.  */
>>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>>> +    return;
>>>>>> +
>>>>>>
>>>>>> VECTOR_TYPE_P
>>>>>>
>>>>>> I believe the comment should simply say that VRP doesn't track ranges for
>>>>>> vector types.
>>>>>>
>>>>>> In the previous review I suggested you should make sure that RTL expansion
>>>>>> ends up using a well-defined optab for these compares.  To make sure
>>>>>> this happens across targets I suggest you make these comparisons available
>>>>>> via the GCC vector extension.  Thus allow
>>>>>>
>>>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>>>
>>>>>> int foo (v4si a, v4si b)
>>>>>> {
>>>>>>   if (a == b)
>>>>>>     return 4;
>>>>>> }
>>>>>>
>>>>>> and != and also using floating point vectors.
>>>>>>
>>>>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>>>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>>>>> compares.
>>>>>>
>>>>>> A half-way implementation like your VRP comment suggests (only
>>>>>> ==/!= zero against integer vectors is implemented?!) this doesn't sound
>>>>>> good without also limiting the feature this way in the verifier.
>>>>>>
>>>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>>>>> has AVX, not AVX2).
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> ChangeLog:
>>>>>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> PR middle-end/68542
>>>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>>>>> comparison with boolean result.
>>>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>>>>>> for vector comparion with eq/ne only.
>>>>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>>>>> comparison with boolean result.
>>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>>> (verify_gimple_cond): Likewise.
>>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>>>>> combining for non-compatible vector types.
>>>>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>>>>> (optimize_mask_stores): Likewise.
>>>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>>>>> has_mask_store field of vect_info.
>>>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>>>>>> vectorized loops having masked stores.
>>>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>>>>> correspondent macros.
>>>>>>> (optimize_mask_stores): Add prototype.
>>>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>>>>> type.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-11 10:06             ` Yuri Rumyantsev
@ 2016-01-18 12:44               ` Richard Biener
  2016-01-18 14:02                 ` Yuri Rumyantsev
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2016-01-18 12:44 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> Did you have any chance to look at the updated patch?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index acbb70b..208a752 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
gimple_stmt_iterator si,
                                                &comp_code, &val))
     return;

+  /* VRP doesn't track ranges for vector types.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+

please instead fix extract_code_and_val_from_cond_with_ops with

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c      (revision 232506)
+++ gcc/tree-vrp.c      (working copy)
@@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
   if (invert)
     comp_code = invert_tree_comparison (comp_code, 0);

-  /* VRP does not handle float types.  */
-  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
+  /* VRP only handles integral and pointer types.  */
+  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
+      && ! POINTER_TYPE_P (TREE_TYPE (val)))
     return false;

   /* Do not register always-false predicates.

Ok with that change.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Hi Richard,
>>
>> Here is updated patch for middle-end part of the whole patch which
>> fixes all your remarks I hope.
>>
>> Regression testing and bootstrapping did not show any new failures.
>> Is it OK for trunk?
>>
>> Yuri.
>>
>> ChangeLog:
>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> PR middle-end/68542
>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>> of mixind vector and scalar types.
>> (fold_relational_const): Add handling of vector
>> comparison with boolean result.
>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>> comparison of vector operands with boolean result for EQ/NE only.
>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>> (verify_gimple_cond): Likewise.
>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-18 12:44               ` Richard Biener
@ 2016-01-18 14:02                 ` Yuri Rumyantsev
  2016-01-18 14:07                   ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2016-01-18 14:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

Thanks Richard.

I changed the check on type as you proposed.

What about the second back-end part of patch (it has been sent 08.12.15).

Thanks.
Yuri.

2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> Did you have any chance to look at the updated patch?
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index acbb70b..208a752 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
> gimple_stmt_iterator si,
>                                                 &comp_code, &val))
>      return;
>
> +  /* VRP doesn't track ranges for vector types.  */
> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
> +    return;
> +
>
> please instead fix extract_code_and_val_from_cond_with_ops with
>
> Index: gcc/tree-vrp.c
> ===================================================================
> --- gcc/tree-vrp.c      (revision 232506)
> +++ gcc/tree-vrp.c      (working copy)
> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>    if (invert)
>      comp_code = invert_tree_comparison (comp_code, 0);
>
> -  /* VRP does not handle float types.  */
> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
> +  /* VRP only handles integral and pointer types.  */
> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>      return false;
>
>    /* Do not register always-false predicates.
>
> Ok with that change.
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>> Hi Richard,
>>>
>>> Here is updated patch for middle-end part of the whole patch which
>>> fixes all your remarks I hope.
>>>
>>> Regression testing and bootstrapping did not show any new failures.
>>> Is it OK for trunk?
>>>
>>> Yuri.
>>>
>>> ChangeLog:
>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> PR middle-end/68542
>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>> of mixind vector and scalar types.
>>> (fold_relational_const): Add handling of vector
>>> comparison with boolean result.
>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>> comparison of vector operands with boolean result for EQ/NE only.
>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>> (verify_gimple_cond): Likewise.
>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-18 14:02                 ` Yuri Rumyantsev
@ 2016-01-18 14:07                   ` Richard Biener
  2016-01-18 14:50                     ` Yuri Rumyantsev
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2016-01-18 14:07 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Thanks Richard.
>
> I changed the check on type as you proposed.
>
> What about the second back-end part of patch (it has been sent 08.12.15).

Can't see it in my inbox - can you reply to the mail with a ping?

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard,
>>>
>>> Did you have any chance to look at the updated patch?
>>
>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>> index acbb70b..208a752 100644
>> --- a/gcc/tree-vrp.c
>> +++ b/gcc/tree-vrp.c
>> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
>> gimple_stmt_iterator si,
>>                                                 &comp_code, &val))
>>      return;
>>
>> +  /* VRP doesn't track ranges for vector types.  */
>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>> +    return;
>> +
>>
>> please instead fix extract_code_and_val_from_cond_with_ops with
>>
>> Index: gcc/tree-vrp.c
>> ===================================================================
>> --- gcc/tree-vrp.c      (revision 232506)
>> +++ gcc/tree-vrp.c      (working copy)
>> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>>    if (invert)
>>      comp_code = invert_tree_comparison (comp_code, 0);
>>
>> -  /* VRP does not handle float types.  */
>> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
>> +  /* VRP only handles integral and pointer types.  */
>> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
>> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>>      return false;
>>
>>    /* Do not register always-false predicates.
>>
>> Ok with that change.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>> Hi Richard,
>>>>
>>>> Here is updated patch for middle-end part of the whole patch which
>>>> fixes all your remarks I hope.
>>>>
>>>> Regression testing and bootstrapping did not show any new failures.
>>>> Is it OK for trunk?
>>>>
>>>> Yuri.
>>>>
>>>> ChangeLog:
>>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> PR middle-end/68542
>>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>>> of mixind vector and scalar types.
>>>> (fold_relational_const): Add handling of vector
>>>> comparison with boolean result.
>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>> (verify_gimple_cond): Likewise.
>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-18 14:07                   ` Richard Biener
@ 2016-01-18 14:50                     ` Yuri Rumyantsev
  2016-01-20 12:25                       ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2016-01-18 14:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 3935 bytes --]

Richard,

Here is the second part of the patch, which moves masked stores and
all statements related to them into a new basic block guarded by a
test on zero mask. A new test is also added.

Is it OK for trunk?

Thanks.
Yuri.

2016-01-18  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Implement integral vector
comparison with boolean result.
* config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
for vector comparion with eq/ne only.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-mask-store-move-1.c: New test.

2016-01-18 17:07 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard.
>>
>> I changed the check on type as you proposed.
>>
>> What about the second back-end part of patch (it has been sent 08.12.15).
>
> Can't see it in my inbox - can you reply to the mail with a ping?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> Did you have any chance to look at the updated patch?
>>>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index acbb70b..208a752 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>>> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
>>> gimple_stmt_iterator si,
>>>                                                 &comp_code, &val))
>>>      return;
>>>
>>> +  /* VRP doesn't track ranges for vector types.  */
>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>> +    return;
>>> +
>>>
>>> please instead fix extract_code_and_val_from_cond_with_ops with
>>>
>>> Index: gcc/tree-vrp.c
>>> ===================================================================
>>> --- gcc/tree-vrp.c      (revision 232506)
>>> +++ gcc/tree-vrp.c      (working copy)
>>> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>>>    if (invert)
>>>      comp_code = invert_tree_comparison (comp_code, 0);
>>>
>>> -  /* VRP does not handle float types.  */
>>> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
>>> +  /* VRP only handles integral and pointer types.  */
>>> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
>>> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>>>      return false;
>>>
>>>    /* Do not register always-false predicates.
>>>
>>> Ok with that change.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>> Hi Richard,
>>>>>
>>>>> Here is updated patch for middle-end part of the whole patch which
>>>>> fixes all your remarks I hope.
>>>>>
>>>>> Regression testing and bootstrapping did not show any new failures.
>>>>> Is it OK for trunk?
>>>>>
>>>>> Yuri.
>>>>>
>>>>> ChangeLog:
>>>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> PR middle-end/68542
>>>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>>>> of mixind vector and scalar types.
>>>>> (fold_relational_const): Add handling of vector
>>>>> comparison with boolean result.
>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>> (verify_gimple_cond): Likewise.
>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

[-- Attachment #2: PR68542.patch --]
[-- Type: application/octet-stream, Size: 14281 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 733d0ab..5a07d7c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21669,6 +21669,40 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   machine_mode mode = GET_MODE (op0);
   rtx tmp;
 
+  /* Handle special case - vector comparison with boolean result, transform
+     it using ptest instruction.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      rtx lhs;
+      rtx flag;
+      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+      gcc_assert (code == EQ || code == NE);
+      if (!REG_P (op0))
+	op0 = force_reg (mode, op0);
+      if (!REG_P (op1))
+	op1 = force_reg (mode, op1);
+      /* Generate subtraction since we can't check that one operand is
+	 zero vector.  */
+      lhs = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (lhs, gen_rtx_MINUS (mode, op0, op1)));
+      lhs = gen_rtx_SUBREG (p_mode, lhs, 0);
+      tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+			 gen_rtx_UNSPEC (CCmode,
+					 gen_rtvec (2, lhs, lhs),
+					 UNSPEC_PTEST));
+      emit_insn (tmp);
+      flag = gen_rtx_REG (CCZmode, FLAGS_REG);
+      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+      if (code == EQ)
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    gen_rtx_LABEL_REF (VOIDmode, label), pc_rtx);
+      else
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    pc_rtx, gen_rtx_LABEL_REF (VOIDmode, label));
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+      return;
+    }
+
   switch (mode)
     {
     case SFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6740edf..ae0c54e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18350,6 +18350,25 @@
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512BW")
 
+(define_expand "cbranch<mode>4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+		    (match_operand:V48_AVX2 2 "register_operand")))
+   (set (pc) (if_then_else
+	       (match_operator 0 "bt_comparison_operator"
+		[(reg:CC FLAGS_REG) (const_int 0)])
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "TARGET_AVX2"
+{
+  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
new file mode 100644
index 0000000..89fac5a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -0,0 +1,79 @@
+/* { dg-additional-options "-O3 -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mavx2" { target avx_runtime } } */
+
+#include "tree-vect.h"
+#define N 32
+int *p1, *p2, *p3;
+int c[N];
+int p1ref[N], p2ref[N];
+
+__attribute__((noinline, noclone)) void foo (int n)
+{
+  int i;
+  for (i=0; i<n; i++)
+    if (c[i]) {
+      p1[i] += 1;
+      p2[i] = p3[i] +2;
+    }
+}
+
+void init ()
+{
+  p1ref[0]=1; p2ref[0]=2;
+  p1ref[1]=3; p2ref[1]=5;
+  p1ref[2]=5; p2ref[2]=8;
+  p1ref[3]=7; p2ref[3]=11;
+  p1ref[4]=9; p2ref[4]=14;
+  p1ref[5]=11; p2ref[5]=17;
+  p1ref[6]=13; p2ref[6]=20;
+  p1ref[7]=15; p2ref[7]=23;
+  p1ref[8]=16; p2ref[8]=8;
+  p1ref[9]=18; p2ref[9]=9;
+  p1ref[10]=20; p2ref[10]=10;
+  p1ref[11]=22; p2ref[11]=11;
+  p1ref[12]=24; p2ref[12]=12;
+  p1ref[13]=26; p2ref[13]=13;
+  p1ref[14]=28; p2ref[14]=14;
+  p1ref[15]=30; p2ref[15]=15;
+  p1ref[16]=33; p2ref[16]=50;
+  p1ref[17]=35; p2ref[17]=53;
+  p1ref[18]=37; p2ref[18]=56;
+  p1ref[19]=39; p2ref[19]=59;
+  p1ref[20]=41; p2ref[20]=62;
+  p1ref[21]=43; p2ref[21]=65;
+  p1ref[22]=45; p2ref[22]=68;
+  p1ref[23]=47; p2ref[23]=71;
+  p1ref[24]=48; p2ref[24]=24;
+  p1ref[25]=50; p2ref[25]=25;
+  p1ref[26]=52; p2ref[26]=26;
+  p1ref[27]=54; p2ref[27]=27;
+  p1ref[28]=56; p2ref[28]=28;
+  p1ref[29]=58; p2ref[29]=29;
+  p1ref[30]=60; p2ref[30]=30;
+  p1ref[31]=62; p2ref[31]=31;
+}
+
+int main()
+{
+  int * P = malloc(N * 3 * sizeof (int));
+  int i;
+
+  check_vect ();
+  p1 = &P[0];
+  p2 = &P[N];
+  p3 = &P[2 * N];
+  for (i=0; i<N; i++) {
+    p1[i] = i + i;
+    p3[i] = i * 3;
+    p2[i] = i;
+    c[i] = (i >> 3) & 1? 0: 1;
+  }
+  init ();
+  foo (N);
+  for (i=0; i<N;i++)
+    if (p1[i] != p1ref[i] || p2[i] != p2ref[i])
+      abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 77ad760..709746b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6927,3 +6927,225 @@ vect_transform_loop (loop_vec_info loop_vinfo)
       dump_printf (MSG_NOTE, "\n");
     }
 }
+
+/* Helper for optimize_mask_stores: returns true if STMT sinking to end
+   of BB is valid and false otherwise.  */
+
+static bool
+is_valid_sink (gimple *stmt, gimple *last_store)
+{
+  tree vdef;
+  imm_use_iterator imm_it;
+  use_operand_p use_p;
+  basic_block bb = gimple_bb (stmt);
+
+  if (is_gimple_call (stmt)
+      && !gimple_call_internal_p (stmt))
+    /* Do not consider non-internal call as valid to sink.  */
+    return false;
+
+  if ((vdef = gimple_vdef (stmt)))
+    {
+      /* Check that there are no store vuses in the current bb.  */
+      FOR_EACH_IMM_USE_FAST (use_p, imm_it, vdef)
+	if (gimple_bb (USE_STMT (use_p)) == bb)
+	  return false;
+      return true;
+    }
+  else if (gimple_vuse (stmt) == NULL_TREE)
+    return true;
+  else if (gimple_vuse (stmt) == gimple_vuse (last_store))
+    return true;
+  return false;
+}
+
+/* The code below is trying to perform a simple optimization - reverting
+   if-conversion for masked stores, i.e. if the mask of a store is zero,
+   do not perform the store and, if possible, do not compute the stored
+   values either.
+   For example,
+     for (i=0; i<n; i++)
+       if (c[i]) {
+	 p1[i] += 1;
+	 p2[i] = p3[i] +2;
+     }
+   this transformation will produce the following semi-hammock:
+
+   if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
+     vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
+     vect__12.22_172 = vect__11.19_170 + vect_cst__171;
+     MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
+     vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
+     vect__19.28_184 = vect__18.25_182 + vect_cst__183;
+     MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
+   }
+*/
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+  basic_block bb = loop->header;
+  gimple_stmt_iterator gsi;
+  gimple *stmt, *stmt1;
+  auto_vec<gimple *> worklist;
+
+  vect_location = find_loop_location (loop);
+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	worklist.safe_push (stmt);
+    }
+
+  if (worklist.is_empty ())
+    return;
+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple *last, *last_store;
+      edge e, efalse;
+      tree mask;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      /* tree arg3; */
+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+      tree vectype;
+      tree zero;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      /* Create new bb.  */
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      /* Put STORE_BB to likely part.  */
+      efalse->probability = PROB_UNLIKELY;
+      store_bb->frequency = PROB_ALWAYS - EDGE_FREQUENCY (efalse);
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+      if (dom_info_available_p (CDI_DOMINATORS))
+	set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (vectype);
+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
+      gsi = gsi_last_bb (bb);
+      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+      /* Create new PHI node for vdef of the last masked store:
+	 .MEM_2 = VDEF <.MEM_1>
+	 will be converted to
+	 .MEM.3 = VDEF <.MEM_1>
+	 and new PHI node will be created in join bb
+	 .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      gimple_set_vdef (last, new_vdef);
+      phi = create_phi_node (vdef, join_bb);
+      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+      first_dump = true;
+
+      /* Put all masked stores with the same mask to STORE_BB if possible.  */
+      while (true)
+	{
+	  gimple_stmt_iterator gsi_from;
+	  /* Move masked store to STORE_BB.  */
+	  last_store = last;
+	  gsi = gsi_for_stmt (last);
+	  gsi_from = gsi;
+	  /* Shift gsi to previous stmt for further traversal.  */
+	  gsi_prev (&gsi);
+	  gsi_to = gsi_start_bb (store_bb);
+	  gsi_move_before (&gsi_from, &gsi_to);
+	  update_stmt (last);
+	  if (dump_enabled_p ())
+	    {
+	      /* Issue different messages depending on FIRST_DUMP.  */
+	      if (first_dump)
+		{
+		  dump_printf_loc (MSG_NOTE, vect_location,
+				   "Move MASK_STORE to new bb#%d\n",
+			       store_bb->index);
+		  first_dump = false;
+		}
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Move MASK_STORE to created bb\n");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+	    }
+	    /* Move all stored value producers if possible.  */
+	    while (!gsi_end_p (gsi))
+	      {
+		tree lhs;
+		imm_use_iterator imm_iter;
+		use_operand_p use_p;
+		bool res;
+		stmt1 = gsi_stmt (gsi);
+		/* Do not consider statements writing to memory.  */
+		if (gimple_vdef (stmt1))
+		  break;
+		gsi_prev (&gsi);
+		if (is_gimple_call (stmt1))
+		  lhs = gimple_call_lhs (stmt1);
+		else
+		  if (gimple_code (stmt1) == GIMPLE_ASSIGN)
+		    lhs = gimple_assign_lhs (stmt1);
+		  else
+		    break;
+
+		/* LHS of vectorized stmt must be SSA_NAME.  */
+		if (TREE_CODE (lhs) != SSA_NAME)
+		  break;
+
+		/* Skip scalar statements.  */
+		if (!VECTOR_TYPE_P (TREE_TYPE (lhs)))
+		  continue;
+
+		/* Check that LHS does not have uses outside of STORE_BB.  */
+		res = true;
+		FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+		  {
+		    gimple *use_stmt;
+		    use_stmt = USE_STMT (use_p);
+		    if (gimple_bb (use_stmt) != store_bb)
+		      {
+			res = false;
+			break;
+		      }
+		  }
+		if (!res)
+		  break;
+
+		if (gimple_vuse (stmt1)
+		    && gimple_vuse (stmt1) != gimple_vuse (last_store))
+		  break;
+
+		/* Can move STMT1 to STORE_BB.  */
+		if (dump_enabled_p ())
+		  {
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "Move stmt to created bb\n");
+		    dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt1, 0);
+		  }
+		gsi_from = gsi_for_stmt (stmt1);
+		gsi_to = gsi_start_bb (store_bb);
+		gsi_move_before (&gsi_from, &gsi_to);
+		update_stmt (stmt1);
+	      }
+	    /* Put other masked stores with the same mask to STORE_BB.  */
+	    if (worklist.is_empty ()
+		|| gimple_call_arg (worklist.last (), 2) != mask
+		|| !is_valid_sink (worklist.last (), last_store))
+	      break;
+	    last = worklist.pop ();
+	}
+      add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+    }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7c6fa73..645257b 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2019,6 +2019,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
 	{
 	  unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index c496c4b..3d8fbac 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -604,12 +604,18 @@ vectorize_loops (void)
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
+      bool has_mask_store;
 
       loop = get_loop (cfun, i);
       if (!loop)
 	continue;
       loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = false;
+      if (loop_vinfo)
+	has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
       destroy_loop_vec_info (loop_vinfo, true);
+      if (has_mask_store)
+	optimize_mask_stores (loop);
       loop->aux = NULL;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ac68750..9b09f25 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -333,6 +333,9 @@ typedef struct _loop_vec_info : public vec_info {
      loop version without if-conversion.  */
   struct loop *scalar_loop;
 
+  /* Mark loops having masked stores.  */
+  bool has_mask_store;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -368,6 +371,7 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
+#define LOOP_VINFO_HAS_MASK_STORE(L)       (L)->has_mask_store
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
 
@@ -1010,6 +1014,7 @@ extern void vect_get_vec_defs (tree, tree, gimple *, vec<tree> *,
 			       vec<tree> *, slp_tree, int);
 extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
 extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-18 14:50                     ` Yuri Rumyantsev
@ 2016-01-20 12:25                       ` Richard Biener
  2016-01-22 14:29                         ` Yuri Rumyantsev
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2016-01-20 12:25 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Mon, Jan 18, 2016 at 3:50 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Here is the second part of the patch, which moves masked stores and
> all statements related to them into a new basic block guarded by a
> test on zero mask. A new test is also added.
>
> Is it OK for trunk?

+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);

you fail to iterate over all BBs of the loop here.  Please follow
other uses in the
vectorizer.
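
Something along the lines of the following sketch (untested, reusing
the names from the patch and the usual get_loop_body ()/loop->num_nodes
walk; the updated patch later in the thread does essentially this):

  basic_block *bbs = get_loop_body (loop);
  unsigned nbbs = loop->num_nodes;

  /* Pick up masked stores from every basic block of the loop, not just
     from a single block.  */
  for (unsigned i = 0; i < nbbs; i++)
    for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
         !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);
        if (is_gimple_call (stmt)
            && gimple_call_internal_p (stmt)
            && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
          worklist.safe_push (stmt);
      }
  free (bbs);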

+  while (!worklist.is_empty ())
+    {
+      gimple *last, *last_store;
+      edge e, efalse;
+      tree mask;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      /* tree arg3; */

remove

+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+      tree vectype;
+      tree zero;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      /* Create new bb.  */

bb should be initialized from gimple_bb (last), not loop->header

+      e = split_block (bb, last);
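
I.e. (sketch):

  bb = gimple_bb (last);
  e = split_block (bb, last);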

+               gsi_from = gsi_for_stmt (stmt1);
+               gsi_to = gsi_start_bb (store_bb);
+               gsi_move_before (&gsi_from, &gsi_to);
+               update_stmt (stmt1);

I think the update_stmt is redundant and you should be able to
keep two gsis throughout the loop, from and to, no?
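
Roughly (untested sketch; moving a statement does not change its
operands, so the update_stmt call should not be needed):

  gsi_to = gsi_start_bb (store_bb);
  ...
  gsi_from = gsi_for_stmt (stmt1);
  gsi_move_before (&gsi_from, &gsi_to);
  /* Keep GSI_TO at the just-moved statement so the next statement is
     inserted before it, preserving the original order.  */
  gsi_prev (&gsi_to);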

+           /* Put other masked stores with the same mask to STORE_BB.  */
+           if (worklist.is_empty ()
+               || gimple_call_arg (worklist.last (), 2) != mask
+               || !is_valid_sink (worklist.last (), last_store))

as I understand the code, the last check is redundant with the
invariant you track, provided you verify that the stmt you broke out of
the inner loop on is actually equal to worklist.last () and you add a
flag to track whether you visited a load (vuse) in the sinking loop
that you didn't end up sinking.
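
In other words, after the sinking loop STMT1 is the statement the walk
stopped at, so the check could become (sketch; this is what the updated
patch later in the thread ends up doing):

  /* Put other masked stores with the same mask to STORE_BB.  */
  if (worklist.is_empty ()
      || gimple_call_arg (worklist.last (), 2) != mask
      || worklist.last () != stmt1)
    break;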

+             /* Issue different messages depending on FIRST_DUMP.  */
+             if (first_dump)
+               {
+                 dump_printf_loc (MSG_NOTE, vect_location,
+                                  "Move MASK_STORE to new bb#%d\n",
+                              store_bb->index);
+                 first_dump = false;
+               }
+             else
+               dump_printf_loc (MSG_NOTE, vect_location,
+                                "Move MASK_STORE to created bb\n");

just add a separate dump when you create the BB, "Created new bb#%d for ..."
to avoid this.
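
E.g. (sketch):

  if (dump_enabled_p ())
    dump_printf_loc (MSG_NOTE, vect_location,
                     "Created new bb#%d to sink mask stores.",
                     store_bb->index);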

Note that I can't comment on the x86 backend part so that will need to
be reviewed by somebody
else.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2016-01-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
> comparison with boolean result.
> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
> for vector comparion with eq/ne only.
> * tree-vect-loop.c (is_valid_sink): New function.
> (optimize_mask_stores): Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
> vectorized loops having masked stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
> (optimize_mask_stores): Add prototype.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>
> 2016-01-18 17:07 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks Richard.
>>>
>>> I changed the check on type as you proposed.
>>>
>>> What about the second back-end part of patch (it has been sent 08.12.15).
>>
>> Can't see it in my inbox - can you reply to the mail with a ping?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> Did you have any chance to look at the updated patch?
>>>>
>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>> index acbb70b..208a752 100644
>>>> --- a/gcc/tree-vrp.c
>>>> +++ b/gcc/tree-vrp.c
>>>> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
>>>> gimple_stmt_iterator si,
>>>>                                                 &comp_code, &val))
>>>>      return;
>>>>
>>>> +  /* VRP doesn't track ranges for vector types.  */
>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>> +    return;
>>>> +
>>>>
>>>> please instead fix extract_code_and_val_from_cond_with_ops with
>>>>
>>>> Index: gcc/tree-vrp.c
>>>> ===================================================================
>>>> --- gcc/tree-vrp.c      (revision 232506)
>>>> +++ gcc/tree-vrp.c      (working copy)
>>>> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>>>>    if (invert)
>>>>      comp_code = invert_tree_comparison (comp_code, 0);
>>>>
>>>> -  /* VRP does not handle float types.  */
>>>> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
>>>> +  /* VRP only handles integral and pointer types.  */
>>>> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
>>>> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>>>>      return false;
>>>>
>>>>    /* Do not register always-false predicates.
>>>>
>>>> Ok with that change.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>> Hi Richard,
>>>>>>
>>>>>> Here is updated patch for middle-end part of the whole patch which
>>>>>> fixes all your remarks I hope.
>>>>>>
>>>>>> Regression testing and bootstrapping did not show any new failures.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>> Yuri.
>>>>>>
>>>>>> ChangeLog:
>>>>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> PR middle-end/68542
>>>>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>>>>> of mixind vector and scalar types.
>>>>>> (fold_relational_const): Add handling of vector
>>>>>> comparison with boolean result.
>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>> (verify_gimple_cond): Likewise.
>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-20 12:25                       ` Richard Biener
@ 2016-01-22 14:29                         ` Yuri Rumyantsev
  2016-01-22 14:50                           ` H.J. Lu
  2016-01-28 13:26                           ` Richard Biener
  0 siblings, 2 replies; 20+ messages in thread
From: Yuri Rumyantsev @ 2016-01-22 14:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 7996 bytes --]

Richard,

I fixed all the remarks you pointed out in the vectorizer part of the
patch. Could you take a look at the modified patch?

Uros,

Could you please review the i386 part of the patch related to the
support of conditional branches with vector comparison?

Bootstrap and regression testing did not show any new failures.
Is it OK for trunk?

Thanks.
Yuri.

ChangeLog:

2016-01-22  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Add support for conditional
brnach with vector comparison.
* config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
for vector comparion with eq/ne only.
(optimize_mask_stores): New function.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
vectorized loops having masked stores after vec_info destroy.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-mask-store-move-1.c: New test.
* testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

2016-01-20 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Jan 18, 2016 at 3:50 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Here is the second part of the patch, which moves masked stores and
>> all statements related to them into a new basic block guarded by a
>> test on zero mask. A new test is also added.
>>
>> Is it OK for trunk?
>
> +  /* Pick up all masked stores in loop if any.  */
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      stmt = gsi_stmt (gsi);
>
> you fail to iterate over all BBs of the loop here.  Please follow
> other uses in the
> vectorizer.
>
> +  while (!worklist.is_empty ())
> +    {
> +      gimple *last, *last_store;
> +      edge e, efalse;
> +      tree mask;
> +      basic_block store_bb, join_bb;
> +      gimple_stmt_iterator gsi_to;
> +      /* tree arg3; */
>
> remove
>
> +      tree vdef, new_vdef;
> +      gphi *phi;
> +      bool first_dump;
> +      tree vectype;
> +      tree zero;
> +
> +      last = worklist.pop ();
> +      mask = gimple_call_arg (last, 2);
> +      /* Create new bb.  */
>
> bb should be initialized from gimple_bb (last), not loop->header
>
> +      e = split_block (bb, last);
>
> +               gsi_from = gsi_for_stmt (stmt1);
> +               gsi_to = gsi_start_bb (store_bb);
> +               gsi_move_before (&gsi_from, &gsi_to);
> +               update_stmt (stmt1);
>
> I think the update_stmt is redundant and you should be able to
> keep two gsis throughout the loop, from and to, no?
>
> +           /* Put other masked stores with the same mask to STORE_BB.  */
> +           if (worklist.is_empty ()
> +               || gimple_call_arg (worklist.last (), 2) != mask
> +               || !is_valid_sink (worklist.last (), last_store))
>
> as I understand the code, the last check is redundant with the
> invariant you track, provided you verify that the stmt you broke out of
> the inner loop on is actually equal to worklist.last () and you add a
> flag to track whether you visited a load (vuse) in the sinking loop
> that you didn't end up sinking.
>
> +             /* Issue different messages depending on FIRST_DUMP.  */
> +             if (first_dump)
> +               {
> +                 dump_printf_loc (MSG_NOTE, vect_location,
> +                                  "Move MASK_STORE to new bb#%d\n",
> +                              store_bb->index);
> +                 first_dump = false;
> +               }
> +             else
> +               dump_printf_loc (MSG_NOTE, vect_location,
> +                                "Move MASK_STORE to created bb\n");
>
> just add a separate dump when you create the BB, "Created new bb#%d for ..."
> to avoid this.
>
> Note that I can't comment on the x86 backend part so that will need to
> be reviewed by somebody
> else.
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-01-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> PR middle-end/68542
>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>> comparison with boolean result.
>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>> for vector comparion with eq/ne only.
>> * tree-vect-loop.c (is_valid_sink): New function.
>> (optimize_mask_stores): Likewise.
>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>> has_mask_store field of vect_info.
>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>> vectorized loops having masked stores.
>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>> correspondent macros.
>> (optimize_mask_stores): Add prototype.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>>
>> 2016-01-18 17:07 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Thanks Richard.
>>>>
>>>> I changed the check on type as you proposed.
>>>>
>>>> What about the second back-end part of patch (it has been sent 08.12.15).
>>>
>>> Can't see it in my inbox - can you reply to the mail with a ping?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> Did you have any chance to look at the updated patch?
>>>>>
>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>> index acbb70b..208a752 100644
>>>>> --- a/gcc/tree-vrp.c
>>>>> +++ b/gcc/tree-vrp.c
>>>>> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
>>>>> gimple_stmt_iterator si,
>>>>>                                                 &comp_code, &val))
>>>>>      return;
>>>>>
>>>>> +  /* VRP doesn't track ranges for vector types.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>> +    return;
>>>>> +
>>>>>
>>>>> please instead fix extract_code_and_val_from_cond_with_ops with
>>>>>
>>>>> Index: gcc/tree-vrp.c
>>>>> ===================================================================
>>>>> --- gcc/tree-vrp.c      (revision 232506)
>>>>> +++ gcc/tree-vrp.c      (working copy)
>>>>> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>>>>>    if (invert)
>>>>>      comp_code = invert_tree_comparison (comp_code, 0);
>>>>>
>>>>> -  /* VRP does not handle float types.  */
>>>>> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
>>>>> +  /* VRP only handles integral and pointer types.  */
>>>>> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
>>>>> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>>>>>      return false;
>>>>>
>>>>>    /* Do not register always-false predicates.
>>>>>
>>>>> Ok with that change.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Here is updated patch for middle-end part of the whole patch which
>>>>>>> fixes all your remarks I hope.
>>>>>>>
>>>>>>> Regression testing and bootstrapping did not show any new failures.
>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> PR middle-end/68542
>>>>>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>>>>>> of mixind vector and scalar types.
>>>>>>> (fold_relational_const): Add handling of vector
>>>>>>> comparison with boolean result.
>>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>>> (verify_gimple_cond): Likewise.
>>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

[-- Attachment #2: PR68542.patch.2 --]
[-- Type: application/octet-stream, Size: 14104 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 733d0ab..5a07d7c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21669,6 +21669,40 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   machine_mode mode = GET_MODE (op0);
   rtx tmp;
 
+  /* Handle special case - vector comparison with boolean result, transform
+     it using ptest instruction.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      rtx lhs;
+      rtx flag;
+      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+      gcc_assert (code == EQ || code == NE);
+      if (!REG_P (op0))
+	op0 = force_reg (mode, op0);
+      if (!REG_P (op1))
+	op1 = force_reg (mode, op1);
+      /* Generate subtraction since we can't check that one operand is
+	 zero vector.  */
+      lhs = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (lhs, gen_rtx_MINUS (mode, op0, op1)));
+      lhs = gen_rtx_SUBREG (p_mode, lhs, 0);
+      tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+			 gen_rtx_UNSPEC (CCmode,
+					 gen_rtvec (2, lhs, lhs),
+					 UNSPEC_PTEST));
+      emit_insn (tmp);
+      flag = gen_rtx_REG (CCZmode, FLAGS_REG);
+      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+      if (code == EQ)
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    gen_rtx_LABEL_REF (VOIDmode, label), pc_rtx);
+      else
+	tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+				    pc_rtx, gen_rtx_LABEL_REF (VOIDmode, label));
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+      return;
+    }
+
   switch (mode)
     {
     case SFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6740edf..ae0c54e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18350,6 +18350,25 @@
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512BW")
 
+(define_expand "cbranch<mode>4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+		    (match_operand:V48_AVX2 2 "register_operand")))
+   (set (pc) (if_then_else
+	       (match_operator 0 "bt_comparison_operator"
+		[(reg:CC FLAGS_REG) (const_int 0)])
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "TARGET_AVX2"
+{
+  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
new file mode 100644
index 0000000..4ea4c28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
+
+#define N 256
+int p1[N], p2[N], p3[N];
+int c[N];
+void foo (int n)
+{
+  int i;
+  for (i=0; i<n; i++)
+    if (c[i]) {
+      p1[i] += 1;
+      p2[i] = p3[i] +2;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
new file mode 100644
index 0000000..f3fc9fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
@@ -0,0 +1,78 @@
+/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+#define N 32
+int *p1, *p2, *p3;
+int c[N];
+int p1ref[N], p2ref[N];
+
+__attribute__((noinline, noclone)) void foo (int n)
+{
+  int i;
+  for (i=0; i<n; i++)
+    if (c[i]) {
+      p1[i] += 1;
+      p2[i] = p3[i] +2;
+    }
+}
+
+void init ()
+{
+  p1ref[0]=1; p2ref[0]=2;
+  p1ref[1]=3; p2ref[1]=5;
+  p1ref[2]=5; p2ref[2]=8;
+  p1ref[3]=7; p2ref[3]=11;
+  p1ref[4]=9; p2ref[4]=14;
+  p1ref[5]=11; p2ref[5]=17;
+  p1ref[6]=13; p2ref[6]=20;
+  p1ref[7]=15; p2ref[7]=23;
+  p1ref[8]=16; p2ref[8]=8;
+  p1ref[9]=18; p2ref[9]=9;
+  p1ref[10]=20; p2ref[10]=10;
+  p1ref[11]=22; p2ref[11]=11;
+  p1ref[12]=24; p2ref[12]=12;
+  p1ref[13]=26; p2ref[13]=13;
+  p1ref[14]=28; p2ref[14]=14;
+  p1ref[15]=30; p2ref[15]=15;
+  p1ref[16]=33; p2ref[16]=50;
+  p1ref[17]=35; p2ref[17]=53;
+  p1ref[18]=37; p2ref[18]=56;
+  p1ref[19]=39; p2ref[19]=59;
+  p1ref[20]=41; p2ref[20]=62;
+  p1ref[21]=43; p2ref[21]=65;
+  p1ref[22]=45; p2ref[22]=68;
+  p1ref[23]=47; p2ref[23]=71;
+  p1ref[24]=48; p2ref[24]=24;
+  p1ref[25]=50; p2ref[25]=25;
+  p1ref[26]=52; p2ref[26]=26;
+  p1ref[27]=54; p2ref[27]=27;
+  p1ref[28]=56; p2ref[28]=28;
+  p1ref[29]=58; p2ref[29]=29;
+  p1ref[30]=60; p2ref[30]=30;
+  p1ref[31]=62; p2ref[31]=31;
+}
+
+static void
+avx2_test (void)
+{
+  int * P = malloc(N * 3 * sizeof (int));
+  int i;
+
+  p1 = &P[0];
+  p2 = &P[N];
+  p3 = &P[2 * N];
+  for (i=0; i<N; i++) {
+    p1[i] = i + i;
+    p3[i] = i * 3;
+    p2[i] = i;
+    c[i] = (i >> 3) & 1? 0: 1;
+  }
+  init ();
+  foo (N);
+  for (i=0; i<N;i++)
+    if (p1[i] != p1ref[i] || p2[i] != p2ref[i])
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 77ad760..2573c6d 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6927,3 +6927,196 @@ vect_transform_loop (loop_vec_info loop_vinfo)
       dump_printf (MSG_NOTE, "\n");
     }
 }
+
+/* The code below is trying to perform a simple optimization - reverting
+   if-conversion for masked stores, i.e. if the mask of a store is zero,
+   do not perform the store and, if possible, do not compute the stored
+   values either.
+   For example,
+     for (i=0; i<n; i++)
+       if (c[i]) {
+	 p1[i] += 1;
+	 p2[i] = p3[i] +2;
+     }
+   this transformation will produce the following semi-hammock:
+
+   if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
+     vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
+     vect__12.22_172 = vect__11.19_170 + vect_cst__171;
+     MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
+     vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
+     vect__19.28_184 = vect__18.25_182 + vect_cst__183;
+     MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
+   }
+*/
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+  basic_block *bbs = get_loop_body (loop);
+  unsigned nbbs = loop->num_nodes;
+  unsigned i;
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple *stmt, *stmt1 = NULL;
+  auto_vec<gimple *> worklist;
+
+  vect_location = find_loop_location (loop);
+  /* Pick up all masked stores in loop if any.  */
+  for (i = 0; i < nbbs; i++)
+    {
+      bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  stmt = gsi_stmt (gsi);
+	  if (is_gimple_call (stmt)
+	      && gimple_call_internal_p (stmt)
+	      && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	    worklist.safe_push (stmt);
+	}
+    }
+
+  if (worklist.is_empty ())
+    return;
+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple *last, *last_store;
+      edge e, efalse;
+      tree mask;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      tree vdef, new_vdef;
+      gphi *phi;
+      tree vectype;
+      tree zero;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
+      bb = gimple_bb (last);
+      /* Create new bb.  */
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      /* Put STORE_BB to likely part.  */
+      efalse->probability = PROB_UNLIKELY;
+      store_bb->frequency = PROB_ALWAYS - EDGE_FREQUENCY (efalse);
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+      if (dom_info_available_p (CDI_DOMINATORS))
+	set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Create new block %d to sink mask stores.",
+			 store_bb->index);
+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (vectype);
+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
+      gsi = gsi_last_bb (bb);
+      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+      /* Create new PHI node for vdef of the last masked store:
+	 .MEM_2 = VDEF <.MEM_1>
+	 will be converted to
+	 .MEM.3 = VDEF <.MEM_1>
+	 and new PHI node will be created in join bb
+	 .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      gimple_set_vdef (last, new_vdef);
+      phi = create_phi_node (vdef, join_bb);
+      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+
+      /* Put all masked stores with the same mask to STORE_BB if possible.  */
+      while (true)
+	{
+	  gimple_stmt_iterator gsi_from;
+	  /* Move masked store to STORE_BB.  */
+	  last_store = last;
+	  gsi = gsi_for_stmt (last);
+	  gsi_from = gsi;
+	  /* Shift GSI to the previous stmt for further traversal.  */
+	  gsi_prev (&gsi);
+	  gsi_to = gsi_start_bb (store_bb);
+	  gsi_move_before (&gsi_from, &gsi_to);
+	  /* Setup GSI_TO to the non-empty block start.  */
+	  gsi_to = gsi_start_bb (store_bb);
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "Move stmt to created bb\n");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+	    }
+	    /* Move all stored value producers if possible.  */
+	    while (!gsi_end_p (gsi))
+	      {
+		tree lhs;
+		imm_use_iterator imm_iter;
+		use_operand_p use_p;
+		bool res;
+		stmt1 = gsi_stmt (gsi);
+		/* Do not consider statements writing to memory.  */
+		if (gimple_vdef (stmt1))
+		  break;
+		gsi_from = gsi;
+		gsi_prev (&gsi);
+		if (is_gimple_call (stmt1))
+		  lhs = gimple_call_lhs (stmt1);
+		else
+		  if (gimple_code (stmt1) == GIMPLE_ASSIGN)
+		    lhs = gimple_assign_lhs (stmt1);
+		  else
+		    break;
+
+		/* LHS of vectorized stmt must be SSA_NAME.  */
+		if (TREE_CODE (lhs) != SSA_NAME)
+		  break;
+
+		/* Skip scalar statements.  */
+		if (!VECTOR_TYPE_P (TREE_TYPE (lhs)))
+		  continue;
+
+		/* Check that LHS does not have uses outside of STORE_BB.  */
+		res = true;
+		FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+		  {
+		    gimple *use_stmt;
+		    use_stmt = USE_STMT (use_p);
+		    if (gimple_bb (use_stmt) != store_bb)
+		      {
+			res = false;
+			break;
+		      }
+		  }
+		if (!res)
+		  break;
+
+		if (gimple_vuse (stmt1)
+		    && gimple_vuse (stmt1) != gimple_vuse (last_store))
+		  break;
+
+		/* Can move STMT1 to STORE_BB.  */
+		if (dump_enabled_p ())
+		  {
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "Move stmt to created bb\n");
+		    dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt1, 0);
+		  }
+		gsi_move_before (&gsi_from, &gsi_to);
+		/* Shift GSI_TO for further insertion. */
+		gsi_prev (&gsi_to);
+	      }
+	    /* Put other masked stores with the same mask to STORE_BB.  */
+	    if (worklist.is_empty ()
+		|| gimple_call_arg (worklist.last (), 2) != mask
+		|| worklist.last () != stmt1)
+	      break;
+	    last = worklist.pop ();
+	}
+      add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+    }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7c6fa73..645257b 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2019,6 +2019,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
 	{
 	  unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index c496c4b..3d8fbac 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -604,12 +604,18 @@ vectorize_loops (void)
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
+      bool has_mask_store;
 
       loop = get_loop (cfun, i);
       if (!loop)
 	continue;
       loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = false;
+      if (loop_vinfo)
+	has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
       destroy_loop_vec_info (loop_vinfo, true);
+      if (has_mask_store)
+	optimize_mask_stores (loop);
       loop->aux = NULL;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ac68750..b68ecd8 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -333,6 +333,9 @@ typedef struct _loop_vec_info : public vec_info {
      loop version without if-conversion.  */
   struct loop *scalar_loop;
 
+  /* Mark loops having masked stores.  */
+  bool has_mask_store;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -368,6 +371,7 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
+#define LOOP_VINFO_HAS_MASK_STORE(L)       (L)->has_mask_store
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
 
@@ -1010,6 +1014,7 @@ extern void vect_get_vec_defs (tree, tree, gimple *, vec<tree> *,
 			       vec<tree> *, slp_tree, int);
 extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
 extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop*);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-22 14:29                         ` Yuri Rumyantsev
@ 2016-01-22 14:50                           ` H.J. Lu
  2016-01-28 13:26                           ` Richard Biener
  1 sibling, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2016-01-22 14:50 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: Richard Biener, gcc-patches, Igor Zamyatin, Kirill Yukhin

On Fri, Jan 22, 2016 at 6:29 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I fixed all remarks pointed by you in vectorizer part of patch. Could
> you take a look on modified patch.
>
> Uros,
>
> Could you please review i386 part of patch related to support of
> conditional branches with vector comparison.
>
> Bootstrap and regression testing did not show any new failures.
> Is it OK for trunk?
>
> Thanks.
> Yuri.
>
> ChangeLog:
>
> 2016-01-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Add support for conditional
> brnach with vector comparison.
  ^^^^^^^^^ Typo.
> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
> for vector comparion with eq/ne only.
> (optimize_mask_stores): New function.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
> vectorized loops having masked stores after vec_info destroy.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
> (optimize_mask_stores): Add prototype.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
> * testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-22 14:29                         ` Yuri Rumyantsev
  2016-01-22 14:50                           ` H.J. Lu
@ 2016-01-28 13:26                           ` Richard Biener
  2016-01-28 13:37                             ` Yuri Rumyantsev
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Biener @ 2016-01-28 13:26 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

On Fri, Jan 22, 2016 at 3:29 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I fixed all remarks pointed by you in vectorizer part of patch. Could
> you take a look on modified patch.

+               if (is_gimple_call (stmt1))
+                 lhs = gimple_call_lhs (stmt1);
+               else
+                 if (gimple_code (stmt1) == GIMPLE_ASSIGN)
+                   lhs = gimple_assign_lhs (stmt1);
+                 else
+                   break;

use

  lhs = gimple_get_lhs (stmt1);
  if (!lhs)
    break;

You forgot to free bbs; I suggest doing it right after seeding the worklist.
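Something like the following is what I mean (untested sketch; it assumes the
worklist is seeded from a get_loop_body array, as in your patch, and that
worklist is an auto_vec of gimple *):

  basic_block *bbs = get_loop_body (loop);
  unsigned nbbs = loop->num_nodes;

  /* Pick up all masked stores in the loop.  */
  for (unsigned i = 0; i < nbbs; i++)
    for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
         !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);
        if (is_gimple_call (stmt)
            && gimple_call_internal_p (stmt)
            && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
          worklist.safe_push (stmt);
      }

  /* BBS was only needed to seed the worklist; free it here rather than
     at the end of the function.  */
  free (bbs);

  if (worklist.is_empty ())
    return;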

Ok with those changes if the backend changes are approved.

Sorry for the delay in reviewing this.

Thanks,
Richard.

> Uros,
>
> Could you please review i386 part of patch related to support of
> conditional branches with vector comparison.
>
> Bootstrap and regression testing did not show any new failures.
> Is it OK for trunk?
>
> Thanks.
> Yuri.
>
> ChangeLog:
>
> 2016-01-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Add support for conditional
> brnach with vector comparison.
> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
> for vector comparion with eq/ne only.
> (optimize_mask_stores): New function.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
> vectorized loops having masked stores after vec_info destroy.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
> (optimize_mask_stores): Add prototype.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
> * testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>
> 2016-01-20 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Jan 18, 2016 at 3:50 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is the second part of patch which really preforms mask stores and
>>> all statements related to it to new basic block guarded by test on
>>> zero mask. Hew test is also added.
>>>
>>> Is it OK for trunk?
>>
>> +  /* Pick up all masked stores in loop if any.  */
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +    {
>> +      stmt = gsi_stmt (gsi);
>>
>> you fail to iterate over all BBs of the loop here.  Please follow
>> other uses in the
>> vectorizer.
>>
>> +  while (!worklist.is_empty ())
>> +    {
>> +      gimple *last, *last_store;
>> +      edge e, efalse;
>> +      tree mask;
>> +      basic_block store_bb, join_bb;
>> +      gimple_stmt_iterator gsi_to;
>> +      /* tree arg3; */
>>
>> remove
>>
>> +      tree vdef, new_vdef;
>> +      gphi *phi;
>> +      bool first_dump;
>> +      tree vectype;
>> +      tree zero;
>> +
>> +      last = worklist.pop ();
>> +      mask = gimple_call_arg (last, 2);
>> +      /* Create new bb.  */
>>
>> bb should be initialized from gimple_bb (last), not loop->header
>>
>> +      e = split_block (bb, last);
>>
>> +               gsi_from = gsi_for_stmt (stmt1);
>> +               gsi_to = gsi_start_bb (store_bb);
>> +               gsi_move_before (&gsi_from, &gsi_to);
>> +               update_stmt (stmt1);
>>
>> I think the update_stmt is redundant and you should be able to
>> keep two gsis throughout the loop, from and to, no?
>>
>> +           /* Put other masked stores with the same mask to STORE_BB.  */
>> +           if (worklist.is_empty ()
>> +               || gimple_call_arg (worklist.last (), 2) != mask
>> +               || !is_valid_sink (worklist.last (), last_store))
>>
>> as I understand the code the last check is redundant with the invariant
>> you track if you verify the stmt you breaked from the inner loop is
>> actually equal to worklist.last () and you add a flag to track whether
>> you did visit a load (vuse) in the sinking loop you didn't end up sinking.
>>
>> +             /* Issue different messages depending on FIRST_DUMP.  */
>> +             if (first_dump)
>> +               {
>> +                 dump_printf_loc (MSG_NOTE, vect_location,
>> +                                  "Move MASK_STORE to new bb#%d\n",
>> +                              store_bb->index);
>> +                 first_dump = false;
>> +               }
>> +             else
>> +               dump_printf_loc (MSG_NOTE, vect_location,
>> +                                "Move MASK_STORE to created bb\n");
>>
>> just add a separate dump when you create the BB, "Created new bb#%d for ..."
>> to avoid this.
>>
>> Note that I can't comment on the x86 backend part so that will need to
>> be reviewed by somebody
>> else.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-01-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> PR middle-end/68542
>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>> comparison with boolean result.
>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>> for vector comparion with eq/ne only.
>>> * tree-vect-loop.c (is_valid_sink): New function.
>>> (optimize_mask_stores): Likewise.
>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>> has_mask_store field of vect_info.
>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>> vectorized loops having masked stores.
>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>> correspondent macros.
>>> (optimize_mask_stores): Add prototype.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>>>
>>> 2016-01-18 17:07 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Thanks Richard.
>>>>>
>>>>> I changed the check on type as you proposed.
>>>>>
>>>>> What about the second back-end part of patch (it has been sent 08.12.15).
>>>>
>>>> Can't see it in my inbox - can you reply to the mail with a ping?
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Did you have anu chance to look at updated patch?
>>>>>>
>>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>>> index acbb70b..208a752 100644
>>>>>> --- a/gcc/tree-vrp.c
>>>>>> +++ b/gcc/tree-vrp.c
>>>>>> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
>>>>>> gimple_stmt_iterator si,
>>>>>>                                                 &comp_code, &val))
>>>>>>      return;
>>>>>>
>>>>>> +  /* VRP doesn't track ranges for vector types.  */
>>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>>> +    return;
>>>>>> +
>>>>>>
>>>>>> please instead fix extract_code_and_val_from_cond_with_ops with
>>>>>>
>>>>>> Index: gcc/tree-vrp.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-vrp.c      (revision 232506)
>>>>>> +++ gcc/tree-vrp.c      (working copy)
>>>>>> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>>>>>>    if (invert)
>>>>>>      comp_code = invert_tree_comparison (comp_code, 0);
>>>>>>
>>>>>> -  /* VRP does not handle float types.  */
>>>>>> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
>>>>>> +  /* VRP only handles integral and pointer types.  */
>>>>>> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
>>>>>> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>>>>>>      return false;
>>>>>>
>>>>>>    /* Do not register always-false predicates.
>>>>>>
>>>>>> Ok with that change.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> Here is updated patch for middle-end part of the whole patch which
>>>>>>>> fixes all your remarks I hope.
>>>>>>>>
>>>>>>>> Regression testing and bootstrapping did not show any new failures.
>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> PR middle-end/68542
>>>>>>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>>>>>>> of mixind vector and scalar types.
>>>>>>>> (fold_relational_const): Add handling of vector
>>>>>>>> comparison with boolean result.
>>>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>>>> (verify_gimple_cond): Likewise.
>>>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-28 13:26                           ` Richard Biener
@ 2016-01-28 13:37                             ` Yuri Rumyantsev
  2016-01-28 14:24                               ` Uros Bizjak
  0 siblings, 1 reply; 20+ messages in thread
From: Yuri Rumyantsev @ 2016-01-28 13:37 UTC (permalink / raw)
  To: Richard Biener, Uros Bizjak; +Cc: gcc-patches, Igor Zamyatin, Kirill Yukhin

Thanks Richard.

Uros,

Could you please review the back-end part of this patch?

Thanks.
Yuri.

2016-01-28 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Jan 22, 2016 at 3:29 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I fixed all remarks pointed by you in vectorizer part of patch. Could
>> you take a look on modified patch.
>
> +               if (is_gimple_call (stmt1))
> +                 lhs = gimple_call_lhs (stmt1);
> +               else
> +                 if (gimple_code (stmt1) == GIMPLE_ASSIGN)
> +                   lhs = gimple_assign_lhs (stmt1);
> +                 else
> +                   break;
>
> use
>
>   lhs = gimple_get_lhs (stmt1);
>   if (!lhs)
>     break;
>
> you forget to free bbs, I suggest to do it after seeding the worklist.
>
> Ok with those changes if the backend changes are approved.
>
> Sorry for the delay in reviewing this.
>
> Thanks,
> Richard.
>
>> Uros,
>>
>> Could you please review i386 part of patch related to support of
>> conditional branches with vector comparison.
>>
>> Bootstrap and regression testing did not show any new failures.
>> Is it OK for trunk?
>>
>> Thanks.
>> Yuri.
>>
>> ChangeLog:
>>
>> 2016-01-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> PR middle-end/68542
>> * config/i386/i386.c (ix86_expand_branch): Add support for conditional
>> brnach with vector comparison.
>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>> for vector comparion with eq/ne only.
>> (optimize_mask_stores): New function.
>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>> has_mask_store field of vect_info.
>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>> vectorized loops having masked stores after vec_info destroy.
>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>> correspondent macros.
>> (optimize_mask_stores): Add prototype.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>> * testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>>
>> 2016-01-20 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Jan 18, 2016 at 3:50 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is the second part of patch which really preforms mask stores and
>>>> all statements related to it to new basic block guarded by test on
>>>> zero mask. Hew test is also added.
>>>>
>>>> Is it OK for trunk?
>>>
>>> +  /* Pick up all masked stores in loop if any.  */
>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> +    {
>>> +      stmt = gsi_stmt (gsi);
>>>
>>> you fail to iterate over all BBs of the loop here.  Please follow
>>> other uses in the
>>> vectorizer.
>>>
>>> +  while (!worklist.is_empty ())
>>> +    {
>>> +      gimple *last, *last_store;
>>> +      edge e, efalse;
>>> +      tree mask;
>>> +      basic_block store_bb, join_bb;
>>> +      gimple_stmt_iterator gsi_to;
>>> +      /* tree arg3; */
>>>
>>> remove
>>>
>>> +      tree vdef, new_vdef;
>>> +      gphi *phi;
>>> +      bool first_dump;
>>> +      tree vectype;
>>> +      tree zero;
>>> +
>>> +      last = worklist.pop ();
>>> +      mask = gimple_call_arg (last, 2);
>>> +      /* Create new bb.  */
>>>
>>> bb should be initialized from gimple_bb (last), not loop->header
>>>
>>> +      e = split_block (bb, last);
>>>
>>> +               gsi_from = gsi_for_stmt (stmt1);
>>> +               gsi_to = gsi_start_bb (store_bb);
>>> +               gsi_move_before (&gsi_from, &gsi_to);
>>> +               update_stmt (stmt1);
>>>
>>> I think the update_stmt is redundant and you should be able to
>>> keep two gsis throughout the loop, from and to, no?
>>>
>>> +           /* Put other masked stores with the same mask to STORE_BB.  */
>>> +           if (worklist.is_empty ()
>>> +               || gimple_call_arg (worklist.last (), 2) != mask
>>> +               || !is_valid_sink (worklist.last (), last_store))
>>>
>>> as I understand the code the last check is redundant with the invariant
>>> you track if you verify the stmt you breaked from the inner loop is
>>> actually equal to worklist.last () and you add a flag to track whether
>>> you did visit a load (vuse) in the sinking loop you didn't end up sinking.
>>>
>>> +             /* Issue different messages depending on FIRST_DUMP.  */
>>> +             if (first_dump)
>>> +               {
>>> +                 dump_printf_loc (MSG_NOTE, vect_location,
>>> +                                  "Move MASK_STORE to new bb#%d\n",
>>> +                              store_bb->index);
>>> +                 first_dump = false;
>>> +               }
>>> +             else
>>> +               dump_printf_loc (MSG_NOTE, vect_location,
>>> +                                "Move MASK_STORE to created bb\n");
>>>
>>> just add a separate dump when you create the BB, "Created new bb#%d for ..."
>>> to avoid this.
>>>
>>> Note that I can't comment on the x86 backend part so that will need to
>>> be reviewed by somebody
>>> else.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2016-01-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> PR middle-end/68542
>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>> comparison with boolean result.
>>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
>>>> for vector comparion with eq/ne only.
>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>> (optimize_mask_stores): Likewise.
>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>> has_mask_store field of vect_info.
>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>>>> vectorized loops having masked stores.
>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>> correspondent macros.
>>>> (optimize_mask_stores): Add prototype.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>>>>
>>>> 2016-01-18 17:07 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Thanks Richard.
>>>>>>
>>>>>> I changed the check on type as you proposed.
>>>>>>
>>>>>> What about the second back-end part of patch (it has been sent 08.12.15).
>>>>>
>>>>> Can't see it in my inbox - can you reply to the mail with a ping?
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> 2016-01-18 15:44 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> Did you have anu chance to look at updated patch?
>>>>>>>
>>>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>>>> index acbb70b..208a752 100644
>>>>>>> --- a/gcc/tree-vrp.c
>>>>>>> +++ b/gcc/tree-vrp.c
>>>>>>> @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
>>>>>>> gimple_stmt_iterator si,
>>>>>>>                                                 &comp_code, &val))
>>>>>>>      return;
>>>>>>>
>>>>>>> +  /* VRP doesn't track ranges for vector types.  */
>>>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>>>> +    return;
>>>>>>> +
>>>>>>>
>>>>>>> please instead fix extract_code_and_val_from_cond_with_ops with
>>>>>>>
>>>>>>> Index: gcc/tree-vrp.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-vrp.c      (revision 232506)
>>>>>>> +++ gcc/tree-vrp.c      (working copy)
>>>>>>> @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
>>>>>>>    if (invert)
>>>>>>>      comp_code = invert_tree_comparison (comp_code, 0);
>>>>>>>
>>>>>>> -  /* VRP does not handle float types.  */
>>>>>>> -  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)))
>>>>>>> +  /* VRP only handles integral and pointer types.  */
>>>>>>> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (val))
>>>>>>> +      && ! POINTER_TYPE_P (TREE_TYPE (val)))
>>>>>>>      return false;
>>>>>>>
>>>>>>>    /* Do not register always-false predicates.
>>>>>>>
>>>>>>> Ok with that change.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> Here is updated patch for middle-end part of the whole patch which
>>>>>>>>> fixes all your remarks I hope.
>>>>>>>>>
>>>>>>>>> Regression testing and bootstrapping did not show any new failures.
>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>> 2015-12-18  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> PR middle-end/68542
>>>>>>>>> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
>>>>>>>>> of mixind vector and scalar types.
>>>>>>>>> (fold_relational_const): Add handling of vector
>>>>>>>>> comparison with boolean result.
>>>>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>>>>> (verify_gimple_cond): Likewise.
>>>>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR68542]
  2016-01-28 13:37                             ` Yuri Rumyantsev
@ 2016-01-28 14:24                               ` Uros Bizjak
  0 siblings, 0 replies; 20+ messages in thread
From: Uros Bizjak @ 2016-01-28 14:24 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: Richard Biener, gcc-patches, Igor Zamyatin, Kirill Yukhin

On Thu, Jan 28, 2016 at 2:37 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Thanks Richard.
>
> Uros,
>
> Could you please review back-end part of this patch?

No problem, but please CC me on x86 patches in the future, so I won't forget.

+(define_expand "cbranch<mode>4"
+  [(set (reg:CC FLAGS_REG)
+    (compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+            (match_operand:V48_AVX2 2 "register_operand")))
+

Please swap the predicates: operand 1 should be register_operand and
operand 2 nonimmediate_operand.

+   (set (pc) (if_then_else
+           (match_operator 0 "bt_comparison_operator"
+        [(reg:CC FLAGS_REG) (const_int 0)])
+           (label_ref (match_operand 3))
+           (pc)))]
+  "TARGET_AVX2"

PTEST was introduced with SSE4_1; I see no reason to limit this
transformation to AVX2. However, since you check MODE_VECTOR_INT in
the expander, it looks like the pattern should only handle integer
modes.

+  /* Handle special case - vector comparsion with boolean result, transform
+     it using ptest instruction.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      rtx lhs;

No need for the above variable; we will use tmp here.

+      rtx flag;
+      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;

space here.

+      gcc_assert (code == EQ || code == NE);

+      if (!REG_P (op0))
+    op0 = force_reg (mode, op0);
+      if (!REG_P (op1))
+    op1 = force_reg (mode, op1);

No need for the above fixups; the predicates have already done the job.

+      /* Generate subtraction since we can't check that one operand is
+     zero vector.  */
+      lhs = gen_reg_rtx (mode);

tmp = ...

+      emit_insn (gen_rtx_SET (lhs, gen_rtx_MINUS (mode, op0, op1)));

Since we are only interested in equality, should we rather use XOR,
since it is a commutative operator?

+      lhs = gen_rtx_SUBREG (p_mode, lhs, 0);

tmp = gen_lowpart (p_mode, tmp);

+      tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+             gen_rtx_UNSPEC (CCmode,
+                     gen_rtvec (2, lhs, lhs),
+                     UNSPEC_PTEST));
+      emit_insn (tmp);

There is no need for a temporary here, just use

emit_insn (gen_rtx_SET ( ... ) ...

+      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+      if (code == EQ)
+    tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+                    gen_rtx_LABEL_REF (VOIDmode, label), pc_rtx);
+      else
+    tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+                    pc_rtx, gen_rtx_LABEL_REF (VOIDmode, label));
+      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+      return;

Oh.... please use something involving std::swap here.
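
Putting the comments above together, roughly what I have in mind for the
vector-compare block (untested sketch; it reuses the names from your patch
and keeps your EQ/NE arm selection, so please double-check the details):

      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
      rtx flag = gen_rtx_REG (CCmode, FLAGS_REG);
      rtx tmp;

      gcc_assert (code == EQ || code == NE);

      /* XOR is enough for an equality test and is commutative.  */
      tmp = gen_reg_rtx (mode);
      emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
      tmp = gen_lowpart (p_mode, tmp);

      /* ptest sets ZF when its operand is all zeros.  */
      emit_insn (gen_rtx_SET (flag,
                              gen_rtx_UNSPEC (CCmode,
                                              gen_rtvec (2, tmp, tmp),
                                              UNSPEC_PTEST)));

      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
      rtx target1 = gen_rtx_LABEL_REF (VOIDmode, label);
      rtx target2 = pc_rtx;
      if (code == NE)
        std::swap (target1, target2);
      emit_jump_insn (gen_rtx_SET (pc_rtx,
                                   gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
                                                         target1, target2)));
      return;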

Uros.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-28 14:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-30 13:24 [PATCH PR68542] Yuri Rumyantsev
2015-12-04 12:18 ` Richard Biener
2015-12-04 15:07   ` Yuri Rumyantsev
2015-12-07 10:57     ` Yuri Rumyantsev
2015-12-08 12:34       ` Yuri Rumyantsev
2015-12-10 13:36     ` Richard Biener
2015-12-11 14:03       ` Yuri Rumyantsev
2015-12-16 13:37         ` Richard Biener
2015-12-18 10:20           ` Yuri Rumyantsev
2016-01-11 10:06             ` Yuri Rumyantsev
2016-01-18 12:44               ` Richard Biener
2016-01-18 14:02                 ` Yuri Rumyantsev
2016-01-18 14:07                   ` Richard Biener
2016-01-18 14:50                     ` Yuri Rumyantsev
2016-01-20 12:25                       ` Richard Biener
2016-01-22 14:29                         ` Yuri Rumyantsev
2016-01-22 14:50                           ` H.J. Lu
2016-01-28 13:26                           ` Richard Biener
2016-01-28 13:37                             ` Yuri Rumyantsev
2016-01-28 14:24                               ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).