public inbox for gcc-patches@gcc.gnu.org
* [0/5] Don't defer vector type choice for bools (PR 92596)
@ 2019-11-29 10:11 Richard Sandiford
  2019-11-29 10:13 ` [1/5] Improve tree-vect-patterns.c handling of boolean comparisons Richard Sandiford
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Richard Sandiford @ 2019-11-29 10:11 UTC (permalink / raw)
  To: gcc-patches

When vectorising a comparison between N-bit integers or N-bit floats,
we want the boolean result to use the vector mask type for N-bit elements.
On most targets this is a vector of N-bit integers, but for SVE it's
a vector predicate and on AVX512 it's a scalar integer mask.

On the other hand, when loading or storing an M-byte boolean, we want to
treat it like any other M-byte integer type.

This difference leads to some complicated handling.  E.g. boolean logic
ops fed by two N-bit comparisons should use a vector mask for N-bit
elements.  But boolean logic ops fed by two M-byte data loads should
use normal M-byte integer vectors.  Boolean logic ops fed by an N-bit
comparison and an M-bit comparison need to convert one of the inputs
first (handled via pattern stmts).  Boolean logic ops fed by an N-bit
comparison and a load are not yet supported.  Etc.

Historically we've tried to make this choice on the fly.  This has
two major downsides:

(a) search_type_for_mask has to use a worklist to find the mask type for
    a particular operation.  The results are not cached between calls,
    so this is a potential source of quadratic behavior.

(b) we can only choose the vector type for a boolean result once
    we know the vector types of the inputs.  So both the loop and
    SLP vectorisers make another pass for boolean types.

The second example in PR 92596 is another case in which (b) causes
problems.  I tried various non-invasive ways of working around it,
but although they worked for the testcase and testsuite, it was easy
to see that they were flaky and would probably cause problems later.
In the end I think the best fix is to stop trying to make this decision
on the fly and record it in the stmt_vec_info instead.

Obviously it's not ideal to be doing something like this in stage 3,
but it is a bug fix and I think it will make bool-related problems
easier to handle in future.

Each patch tested individually on aarch64-linux-gnu and the series
as a whole on x86_64-linux-gnu.  OK to install?

Richard

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [1/5] Improve tree-vect-patterns.c handling of boolean comparisons
  2019-11-29 10:11 [0/5] Don't defer vector type choice for bools (PR 92596) Richard Sandiford
@ 2019-11-29 10:13 ` Richard Sandiford
  2019-11-29 10:53   ` Richard Biener
  2019-11-29 10:14 ` [2/5] Make vectorizable_operation punt early on codes it doesn't handle Richard Sandiford
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2019-11-29 10:13 UTC (permalink / raw)
  To: gcc-patches

vect_recog_bool_pattern assumed that a comparison between two booleans
should always become a comparison of vector mask types (implemented as an
XOR_EXPR).  But if the booleans in question are generated as data values
(e.g. because they're loaded directly from memory), we should treat them
like ordinary integers instead, just as we do for boolean logic ops whose
operands are loaded from memory.  vect_get_mask_type_for_stmt already
handled this case:

      /* We may compare boolean value loaded as vector of integers.
	 Fix mask_type in such case.  */
      if (mask_type
	  && !VECTOR_BOOLEAN_TYPE_P (mask_type)
	  && gimple_code (stmt) == GIMPLE_ASSIGN
	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
	mask_type = truth_type_for (mask_type);

and not handling it here complicated later patches.

The initial list of targets for vect_bool_cmp is deliberately conservative.


2019-11-30  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* doc/sourcebuild.texi (vect_bool_cmp): Document.
	* tree-vect-patterns.c (search_type_for_mask_1): If neither
	operand to a boolean comparison is a natural vector mask,
	handle both operands like normal integers instead.

gcc/testsuite/
	* gcc.dg/vect/vect-bool-cmp-2.c: New test.
	* lib/target-supports.exp (check_effective_target_vect_bool_cmp): New
	effective target procedure.

Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2019-11-20 21:11:59.065472803 +0000
+++ gcc/doc/sourcebuild.texi	2019-11-29 09:11:21.365130870 +0000
@@ -1522,6 +1522,10 @@ Target does not support a vector add ins
 @item vect_no_bitwise
 Target does not support vector bitwise instructions.
 
+@item vect_bool_cmp
+Target supports comparison of @code{bool} vectors for at least one
+vector length.
+
 @item vect_char_add
 Target supports addition of @code{char} vectors for at least one
 vector length.
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2019-11-16 10:29:21.207212217 +0000
+++ gcc/tree-vect-patterns.c	2019-11-29 09:11:21.389130702 +0000
@@ -3944,7 +3944,8 @@ search_type_for_mask_1 (tree var, vec_in
 					     vinfo, cache);
 	      if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
 		res = res2;
-	      break;
+	      if (res)
+		break;
 	    }
 
 	  comp_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
Index: gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c	2019-11-29 09:11:21.373130815 +0000
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+void
+f (_Bool *restrict x, _Bool *restrict y)
+{
+  for (int i = 0; i < 128; ++i)
+    x[i] = x[i] == y[i];
+}
+
+/* { dg-final { scan-tree-dump "loop vectorized" "vect" { target vect_bool_cmp } } } */
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2019-11-26 22:11:24.494545152 +0000
+++ gcc/testsuite/lib/target-supports.exp	2019-11-29 09:11:21.373130815 +0000
@@ -5749,6 +5749,16 @@ proc check_effective_target_vect_bswap {
 	     || [istarget amdgcn-*-*] }}]
 }
 
+# Return 1 if the target supports comparison of bool vectors for at
+# least one vector length.
+
+proc check_effective_target_vect_bool_cmp { } {
+    return [check_cached_effective_target_indexed vect_bool_cmp {
+      expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
+	     || [istarget aarch64*-*-*]
+	     || [is-effective-target arm_neon] }}]
+}
+
 # Return 1 if the target supports addition of char vectors for at least
 # one vector length.
 


* [3/5] Make vect_get_mask_type_for_stmt take a group size
  2019-11-29 10:11 [0/5] Don't defer vector type choice for bools (PR 92596) Richard Sandiford
  2019-11-29 10:13 ` [1/5] Improve tree-vect-patterns.c handling of boolean comparisons Richard Sandiford
  2019-11-29 10:14 ` [2/5] Make vectorizable_operation punt early on codes it doesn't handle Richard Sandiford
@ 2019-11-29 10:14 ` Richard Sandiford
  2019-11-29 10:30   ` Richard Biener
  2019-11-29 10:15 ` [4/5] Record the vector mask precision in stmt_vec_info Richard Sandiford
  2019-11-29 10:25 ` [5/5] Don't defer choice of vector type for bools (PR 92596) Richard Sandiford
  4 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2019-11-29 10:14 UTC (permalink / raw)
  To: gcc-patches

This patch makes vect_get_mask_type_for_stmt and
get_mask_type_for_scalar_type take a group size instead of
the SLP node, so that later patches can call them before an
SLP node has been built.


2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (get_mask_type_for_scalar_type): Replace
	the slp_tree parameter with a group size parameter.
	(vect_get_mask_type_for_stmt): Likewise.
	* tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise.
	(vect_get_mask_type_for_stmt): Likewise.
	* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update
	call accordingly.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-16 10:40:08.422638677 +0000
+++ gcc/tree-vectorizer.h	2019-11-29 09:11:27.781086362 +0000
@@ -1640,7 +1640,7 @@ extern tree get_related_vectype_for_scal
 						 poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
-extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
+extern tree get_mask_type_for_scalar_type (vec_info *, tree, unsigned int = 0);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
 extern bool vect_get_loop_mask_type (loop_vec_info);
@@ -1693,7 +1693,7 @@ extern gcall *vect_gen_while (tree, tree
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
 						  tree *, unsigned int = 0);
-extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
+extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-29 09:11:24.553108756 +0000
+++ gcc/tree-vect-stmts.c	2019-11-29 09:11:27.781086362 +0000
@@ -11362,14 +11362,15 @@ get_vectype_for_scalar_type (vec_info *v
 
    Returns the mask type corresponding to a result of comparison
    of vectors of specified SCALAR_TYPE as supported by target.
-   NODE, if nonnull, is the SLP tree node that will use the returned
-   vector type.  */
+   If GROUP_SIZE is nonzero and we're performing BB vectorization,
+   make sure that the number of elements in the vector is no bigger
+   than GROUP_SIZE.  */
 
 tree
 get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
-			       slp_tree node)
+			       unsigned int group_size)
 {
-  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
+  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
 
   if (!vectype)
     return NULL;
@@ -12229,11 +12230,12 @@ vect_get_vector_types_for_stmt (stmt_vec
 
 /* Try to determine the correct vector type for STMT_INFO, which is a
    statement that produces a scalar boolean result.  Return the vector
-   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
-   is the SLP tree node that will use the returned vector type.  */
+   type on success, otherwise return NULL_TREE.  If GROUP_SIZE is nonzero
+   and we're performing BB vectorization, make sure that the number of
+   elements in the vector is no bigger than GROUP_SIZE.  */
 
 opt_tree
-vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
+vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int group_size)
 {
   vec_info *vinfo = stmt_info->vinfo;
   gimple *stmt = stmt_info->stmt;
@@ -12245,7 +12247,8 @@ vect_get_mask_type_for_stmt (stmt_vec_in
       && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
     {
       scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
-      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
+      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type,
+						 group_size);
 
       if (!mask_type)
 	return opt_tree::failure_at (stmt,
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-26 22:04:58.099362339 +0000
+++ gcc/tree-vect-slp.c	2019-11-29 09:11:27.777086392 +0000
@@ -2757,7 +2757,8 @@ vect_slp_analyze_node_operations_1 (vec_
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
     {
-      tree vectype = vect_get_mask_type_for_stmt (stmt_info, node);
+      unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+      tree vectype = vect_get_mask_type_for_stmt (stmt_info, group_size);
       if (!vectype)
 	/* vect_get_mask_type_for_stmt has already explained the
 	   failure.  */


* [2/5] Make vectorizable_operation punt early on codes it doesn't handle
  2019-11-29 10:11 [0/5] Don't defer vector type choice for bools (PR 92596) Richard Sandiford
  2019-11-29 10:13 ` [1/5] Improve tree-vect-patterns.c handling of boolean comparisons Richard Sandiford
@ 2019-11-29 10:14 ` Richard Sandiford
  2019-11-29 10:26   ` Richard Biener
  2019-11-29 10:14 ` [3/5] Make vect_get_mask_type_for_stmt take a group size Richard Sandiford
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2019-11-29 10:14 UTC (permalink / raw)
  To: gcc-patches

vectorizable_operation returned false for codes that are handled by
vectorizable_shift, but only after it had already done a lot of work.
Checking earlier should be more efficient and avoid polluting the logs
with duplicate info.

Also, there was no such early-out for comparisons or COND_EXPRs.
Fixing that avoids a false scan-tree-dump hit with a later patch.


2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_operation): Punt early
	on codes that are handled elsewhere.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-29 08:28:12.015121876 +0000
+++ gcc/tree-vect-stmts.c	2019-11-29 09:11:24.553108756 +0000
@@ -5999,6 +5999,21 @@ vectorizable_operation (stmt_vec_info st
 
   orig_code = code = gimple_assign_rhs_code (stmt);
 
+  /* Shifts are handled in vectorizable_shift.  */
+  if (code == LSHIFT_EXPR
+      || code == RSHIFT_EXPR
+      || code == LROTATE_EXPR
+      || code == RROTATE_EXPR)
+   return false;
+
+  /* Comparisons are handled in vectorizable_comparison.  */
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+    return false;
+
+  /* Conditions are handled in vectorizable_condition.  */
+  if (code == COND_EXPR)
+    return false;
+
   /* For pointer addition and subtraction, we should use the normal
      plus and minus for the vector operation.  */
   if (code == POINTER_PLUS_EXPR)
@@ -6123,11 +6138,6 @@ vectorizable_operation (stmt_vec_info st
 
   gcc_assert (ncopies >= 1);
 
-  /* Shifts are handled in vectorizable_shift ().  */
-  if (code == LSHIFT_EXPR || code == RSHIFT_EXPR || code == LROTATE_EXPR
-      || code == RROTATE_EXPR)
-   return false;
-
   /* Supportable by target?  */
 
   vec_mode = TYPE_MODE (vectype);


* [4/5] Record the vector mask precision in stmt_vec_info
  2019-11-29 10:11 [0/5] Don't defer vector type choice for bools (PR 92596) Richard Sandiford
                   ` (2 preceding siblings ...)
  2019-11-29 10:14 ` [3/5] Make vect_get_mask_type_for_stmt take a group size Richard Sandiford
@ 2019-11-29 10:15 ` Richard Sandiford
  2019-11-29 10:34   ` Richard Biener
  2019-11-29 10:25 ` [5/5] Don't defer choice of vector type for bools (PR 92596) Richard Sandiford
  4 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2019-11-29 10:15 UTC (permalink / raw)
  To: gcc-patches

search_type_for_mask uses a worklist to search a chain of boolean
operations for a natural vector mask type.  This patch instead does
that in vect_determine_stmt_precisions, where we also look for
overpromoted integer operations.  We then only need to compute
the precision once and can cache it in the stmt_vec_info.

The new function vect_determine_mask_precision is supposed
to handle exactly the same cases as search_type_for_mask_1,
and in the same way.  There's a lot we could improve here,
but that's not stage 3 material.

I wondered about sharing mask_precision with other fields like
operation_precision, but in the end that seemed too dangerous.
We have patterns to convert between boolean and non-boolean
operations and it would be very easy to get mixed up about
which case the fields are describing.


2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (stmt_vec_info::mask_precision): New field.
	(vect_use_mask_type_p): New function.
	* tree-vect-patterns.c (vect_init_pattern_stmt): Copy the
	mask precision to the pattern statement.
	(append_pattern_def_seq): Add a scalar_type_for_mask parameter
	and use it to initialize the new stmt's mask precision.
	(search_type_for_mask_1): Delete.
	(search_type_for_mask): Replace with...
	(integer_type_for_mask): ...this new function.  Use the information
	cached in the stmt_vec_info.
	(vect_recog_bool_pattern): Update accordingly.
	(build_mask_conversion): Pass the scalar type associated with the
	mask type to append_pattern_def_seq.
	(vect_recog_mask_conversion_pattern): Likewise.  Call
	integer_type_for_mask instead of search_type_for_mask.
	(vect_convert_mask_for_vectype): Call integer_type_for_mask instead
	of search_type_for_mask.
	(possible_vector_mask_operation_p): New function.
	(vect_determine_mask_precision): Likewise.
	(vect_determine_stmt_precisions): Call it.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-29 09:11:27.781086362 +0000
+++ gcc/tree-vectorizer.h	2019-11-29 09:11:31.277062112 +0000
@@ -1089,6 +1089,23 @@ typedef struct data_reference *dr_p;
   unsigned int operation_precision;
   signop operation_sign;
 
+  /* If the statement produces a boolean result, this value describes
+     how we should choose the associated vector type.  The possible
+     values are:
+
+     - an integer precision N if we should use the vector mask type
+       associated with N-bit integers.  This is only used if all relevant
+       input booleans also want the vector mask type for N-bit integers,
+       or if we can convert them into that form by pattern-matching.
+
+     - ~0U if we considered choosing a vector mask type but decided
+       to treat the boolean as a normal integer type instead.
+
+     - 0 otherwise.  This means either that the operation isn't one that
+       could have a vector mask type (and so should have a normal vector
+       type instead) or that we simply haven't made a choice either way.  */
+  unsigned int mask_precision;
+
   /* True if this is only suitable for SLP vectorization.  */
   bool slp_vect_only_p;
 };
@@ -1245,6 +1262,15 @@ nested_in_vect_loop_p (class loop *loop,
 	  && (loop->inner == (gimple_bb (stmt_info->stmt))->loop_father));
 }
 
+/* Return true if STMT_INFO should produce a vector mask type rather than
+   a normal nonmask type.  */
+
+static inline bool
+vect_use_mask_type_p (stmt_vec_info stmt_info)
+{
+  return stmt_info->mask_precision && stmt_info->mask_precision != ~0U;
+}
+
 /* Return TRUE if a statement represented by STMT_INFO is a part of a
    pattern.  */
 
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2019-11-29 09:11:21.389130702 +0000
+++ gcc/tree-vect-patterns.c	2019-11-29 09:11:31.277062112 +0000
@@ -112,7 +112,12 @@ vect_init_pattern_stmt (gimple *pattern_
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
     = STMT_VINFO_DEF_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
-    STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
+    {
+      gcc_assert (VECTOR_BOOLEAN_TYPE_P (vectype)
+		  == vect_use_mask_type_p (orig_stmt_info));
+      STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
+      pattern_stmt_info->mask_precision = orig_stmt_info->mask_precision;
+    }
   return pattern_stmt_info;
 }
 
@@ -131,17 +136,25 @@ vect_set_pattern_stmt (gimple *pattern_s
 
 /* Add NEW_STMT to STMT_INFO's pattern definition statements.  If VECTYPE
    is nonnull, record that NEW_STMT's vector type is VECTYPE, which might
-   be different from the vector type of the final pattern statement.  */
+   be different from the vector type of the final pattern statement.
+   If VECTYPE is a mask type, SCALAR_TYPE_FOR_MASK is the scalar type
+   from which it was derived.  */
 
 static inline void
 append_pattern_def_seq (stmt_vec_info stmt_info, gimple *new_stmt,
-			tree vectype = NULL_TREE)
+			tree vectype = NULL_TREE,
+			tree scalar_type_for_mask = NULL_TREE)
 {
+  gcc_assert (!scalar_type_for_mask
+	      == (!vectype || !VECTOR_BOOLEAN_TYPE_P (vectype)));
   vec_info *vinfo = stmt_info->vinfo;
   if (vectype)
     {
       stmt_vec_info new_stmt_info = vinfo->add_stmt (new_stmt);
       STMT_VINFO_VECTYPE (new_stmt_info) = vectype;
+      if (scalar_type_for_mask)
+	new_stmt_info->mask_precision
+	  = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (scalar_type_for_mask));
     }
   gimple_seq_add_stmt_without_update (&STMT_VINFO_PATTERN_DEF_SEQ (stmt_info),
 				      new_stmt);
@@ -3886,108 +3899,22 @@ adjust_bool_stmts (hash_set <gimple *> &
   return gimple_assign_lhs (pattern_stmt);
 }
 
-/* Helper for search_type_for_mask.  */
+/* Return the proper type for converting bool VAR into
+   an integer value or NULL_TREE if no such type exists.
+   The type is chosen so that the converted value has the
+   same number of elements as VAR's vector type.  */
 
 static tree
-search_type_for_mask_1 (tree var, vec_info *vinfo,
-			hash_map<gimple *, tree> &cache)
+integer_type_for_mask (tree var, vec_info *vinfo)
 {
-  tree rhs1;
-  enum tree_code rhs_code;
-  tree res = NULL_TREE, res2;
-
   if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (var)))
     return NULL_TREE;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info || !vect_use_mask_type_p (def_stmt_info))
     return NULL_TREE;
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
-  if (!def_stmt)
-    return NULL_TREE;
-
-  tree *c = cache.get (def_stmt);
-  if (c)
-    return *c;
-
-  rhs_code = gimple_assign_rhs_code (def_stmt);
-  rhs1 = gimple_assign_rhs1 (def_stmt);
-
-  switch (rhs_code)
-    {
-    case SSA_NAME:
-    case BIT_NOT_EXPR:
-    CASE_CONVERT:
-      res = search_type_for_mask_1 (rhs1, vinfo, cache);
-      break;
-
-    case BIT_AND_EXPR:
-    case BIT_IOR_EXPR:
-    case BIT_XOR_EXPR:
-      res = search_type_for_mask_1 (rhs1, vinfo, cache);
-      res2 = search_type_for_mask_1 (gimple_assign_rhs2 (def_stmt), vinfo,
-				     cache);
-      if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
-	res = res2;
-      break;
-
-    default:
-      if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
-	{
-	  tree comp_vectype, mask_type;
-
-	  if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
-	    {
-	      res = search_type_for_mask_1 (rhs1, vinfo, cache);
-	      res2 = search_type_for_mask_1 (gimple_assign_rhs2 (def_stmt),
-					     vinfo, cache);
-	      if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
-		res = res2;
-	      if (res)
-		break;
-	    }
-
-	  comp_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
-	  if (comp_vectype == NULL_TREE)
-	    {
-	      res = NULL_TREE;
-	      break;
-	    }
-
-	  mask_type = get_mask_type_for_scalar_type (vinfo, TREE_TYPE (rhs1));
-	  if (!mask_type
-	      || !expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
-	    {
-	      res = NULL_TREE;
-	      break;
-	    }
-
-	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE
-	      || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
-	    {
-	      scalar_mode mode = SCALAR_TYPE_MODE (TREE_TYPE (rhs1));
-	      res = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
-	    }
-	  else
-	    res = TREE_TYPE (rhs1);
-	}
-    }
-
-  cache.put (def_stmt, res);
-  return res;
-}
-
-/* Return the proper type for converting bool VAR into
-   an integer value or NULL_TREE if no such type exists.
-   The type is chosen so that converted value has the
-   same number of elements as VAR's vector type.  */
-
-static tree
-search_type_for_mask (tree var, vec_info *vinfo)
-{
-  hash_map<gimple *, tree> cache;
-  return search_type_for_mask_1 (var, vinfo, cache);
+  return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
 /* Function vect_recog_bool_pattern
@@ -4079,7 +4006,7 @@ vect_recog_bool_pattern (stmt_vec_info s
 	}
       else
 	{
-	  tree type = search_type_for_mask (var, vinfo);
+	  tree type = integer_type_for_mask (var, vinfo);
 	  tree cst0, cst1, tmp;
 
 	  if (!type)
@@ -4164,7 +4091,7 @@ vect_recog_bool_pattern (stmt_vec_info s
 	rhs = adjust_bool_stmts (bool_stmts, TREE_TYPE (vectype), stmt_vinfo);
       else
 	{
-	  tree type = search_type_for_mask (var, vinfo);
+	  tree type = integer_type_for_mask (var, vinfo);
 	  tree cst0, cst1, new_vectype;
 
 	  if (!type)
@@ -4219,7 +4146,7 @@ build_mask_conversion (tree mask, tree v
   masktype = truth_type_for (vectype);
   tmp = vect_recog_temp_ssa_var (TREE_TYPE (masktype), NULL);
   stmt = gimple_build_assign (tmp, CONVERT_EXPR, mask);
-  append_pattern_def_seq (stmt_vinfo, stmt, masktype);
+  append_pattern_def_seq (stmt_vinfo, stmt, masktype, TREE_TYPE (vectype));
 
   return tmp;
 }
@@ -4285,7 +4212,7 @@ vect_recog_mask_conversion_pattern (stmt
 	}
 
       tree mask_arg = gimple_call_arg (last_stmt, mask_argno);
-      tree mask_arg_type = search_type_for_mask (mask_arg, vinfo);
+      tree mask_arg_type = integer_type_for_mask (mask_arg, vinfo);
       if (!mask_arg_type)
 	return NULL;
       vectype2 = get_mask_type_for_scalar_type (vinfo, mask_arg_type);
@@ -4338,7 +4265,7 @@ vect_recog_mask_conversion_pattern (stmt
 
       if (TREE_CODE (rhs1) == SSA_NAME)
 	{
-	  rhs1_type = search_type_for_mask (rhs1, vinfo);
+	  rhs1_type = integer_type_for_mask (rhs1, vinfo);
 	  if (!rhs1_type)
 	    return NULL;
 	}
@@ -4358,7 +4285,7 @@ vect_recog_mask_conversion_pattern (stmt
 
 	     it is better for b1 and b2 to use the mask type associated
 	     with int elements rather bool (byte) elements.  */
-	  rhs1_type = search_type_for_mask (TREE_OPERAND (rhs1, 0), vinfo);
+	  rhs1_type = integer_type_for_mask (TREE_OPERAND (rhs1, 0), vinfo);
 	  if (!rhs1_type)
 	    rhs1_type = TREE_TYPE (TREE_OPERAND (rhs1, 0));
 	}
@@ -4414,7 +4341,8 @@ vect_recog_mask_conversion_pattern (stmt
 	  tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
 	  pattern_stmt = gimple_build_assign (tmp, rhs1);
 	  rhs1 = tmp;
-	  append_pattern_def_seq (stmt_vinfo, pattern_stmt, vectype2);
+	  append_pattern_def_seq (stmt_vinfo, pattern_stmt, vectype2,
+				  rhs1_type);
 	}
 
       if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
@@ -4447,8 +4375,8 @@ vect_recog_mask_conversion_pattern (stmt
 
   rhs2 = gimple_assign_rhs2 (last_stmt);
 
-  rhs1_type = search_type_for_mask (rhs1, vinfo);
-  rhs2_type = search_type_for_mask (rhs2, vinfo);
+  rhs1_type = integer_type_for_mask (rhs1, vinfo);
+  rhs2_type = integer_type_for_mask (rhs2, vinfo);
 
   if (!rhs1_type || !rhs2_type
       || TYPE_PRECISION (rhs1_type) == TYPE_PRECISION (rhs2_type))
@@ -4509,7 +4437,7 @@ vect_get_load_store_mask (stmt_vec_info
 vect_convert_mask_for_vectype (tree mask, tree vectype,
 			       stmt_vec_info stmt_info, vec_info *vinfo)
 {
-  tree mask_type = search_type_for_mask (mask, vinfo);
+  tree mask_type = integer_type_for_mask (mask, vinfo);
   if (mask_type)
     {
       tree mask_vectype = get_mask_type_for_scalar_type (vinfo, mask_type);
@@ -4949,6 +4877,148 @@ vect_determine_precisions_from_users (st
   vect_set_min_input_precision (stmt_info, type, min_input_precision);
 }
 
+/* Return true if the statement described by STMT_INFO sets a boolean
+   SSA_NAME and if we know how to vectorize this kind of statement using
+   vector mask types.  */
+
+static bool
+possible_vector_mask_operation_p (stmt_vec_info stmt_info)
+{
+  tree lhs = gimple_get_lhs (stmt_info->stmt);
+  if (!lhs
+      || TREE_CODE (lhs) != SSA_NAME
+      || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
+    return false;
+
+  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+    {
+      tree_code rhs_code = gimple_assign_rhs_code (assign);
+      switch (rhs_code)
+	{
+	CASE_CONVERT:
+	case SSA_NAME:
+	case BIT_NOT_EXPR:
+	case BIT_IOR_EXPR:
+	case BIT_XOR_EXPR:
+	case BIT_AND_EXPR:
+	  return true;
+
+	default:
+	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
+	}
+    }
+  return false;
+}
+
+/* If STMT_INFO sets a boolean SSA_NAME, see whether we should use
+   a vector mask type instead of a normal vector type.  Record the
+   result in STMT_INFO->mask_precision.  */
+
+static void
+vect_determine_mask_precision (stmt_vec_info stmt_info)
+{
+  vec_info *vinfo = stmt_info->vinfo;
+
+  if (!possible_vector_mask_operation_p (stmt_info)
+      || stmt_info->mask_precision)
+    return;
+
+  auto_vec<stmt_vec_info, 32> worklist;
+  worklist.quick_push (stmt_info);
+  while (!worklist.is_empty ())
+    {
+      stmt_info = worklist.last ();
+      unsigned int orig_length = worklist.length ();
+
+      /* If at least one boolean input uses a vector mask type,
+	 pick the mask type with the narrowest elements.
+
+	 ??? This is the traditional behavior.  It should always produce
+	 the smallest number of operations, but isn't necessarily the
+	 optimal choice.  For example, if we have:
+
+	   a = b & c
+
+	 where:
+
+	 - the user of a wants it to have a mask type for 16-bit elements (M16)
+	 - b also uses M16
+	 - c uses a mask type for 8-bit elements (M8)
+
+	 then picking M8 gives:
+
+	 - 1 M16->M8 pack for b
+	 - 1 M8 AND for a
+	 - 2 M8->M16 unpacks for the user of a
+
+	 whereas picking M16 would have given:
+
+	 - 2 M8->M16 unpacks for c
+	 - 2 M16 ANDs for a
+
+	 The number of operations are equal, but M16 would have given
+	 a shorter dependency chain and allowed more ILP.  */
+      unsigned int precision = ~0U;
+      gassign *assign = as_a <gassign *> (stmt_info->stmt);
+      unsigned int nops = gimple_num_ops (assign);
+      for (unsigned int i = 1; i < nops; ++i)
+	{
+	  tree rhs = gimple_op (assign, i);
+	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
+	    continue;
+
+	  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+	  if (!def_stmt_info)
+	    /* Don't let external or constant operands influence the choice.
+	       We can convert them to whichever vector type we pick.  */
+	    continue;
+
+	  if (def_stmt_info->mask_precision)
+	    {
+	      if (precision > def_stmt_info->mask_precision)
+		precision = def_stmt_info->mask_precision;
+	    }
+	  else if (possible_vector_mask_operation_p (def_stmt_info))
+	    worklist.safe_push (def_stmt_info);
+	}
+
+      /* Defer the choice if we need to visit operands first.  */
+      if (orig_length != worklist.length ())
+	continue;
+
+      /* If the statement compares two values that shouldn't use vector masks,
+	 try comparing the values as normal scalars instead.  */
+      tree_code rhs_code = gimple_assign_rhs_code (assign);
+      if (precision == ~0U
+	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	{
+	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
+	  scalar_mode mode;
+	  tree vectype, mask_type;
+	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
+	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
+	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
+	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
+	    precision = GET_MODE_BITSIZE (mode);
+	}
+
+      if (dump_enabled_p ())
+	{
+	  if (precision == ~0U)
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "using normal nonmask vectors for %G",
+			     stmt_info->stmt);
+	  else
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "using boolean precision %d for %G",
+			     precision, stmt_info->stmt);
+	}
+
+      stmt_info->mask_precision = precision;
+      worklist.pop ();
+    }
+}
+
 /* Handle vect_determine_precisions for STMT_INFO, given that we
    have already done so for the users of its result.  */
 
@@ -4961,6 +5031,7 @@ vect_determine_stmt_precisions (stmt_vec
       vect_determine_precisions_from_range (stmt_info, stmt);
       vect_determine_precisions_from_users (stmt_info, stmt);
     }
+  vect_determine_mask_precision (stmt_info);
 }
 
 /* Walk backwards through the vectorizable region to determine the

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [5/5] Don't defer choice of vector type for bools (PR 92596)
  2019-11-29 10:11 [0/5] Don't defer vector type choice for bools (PR 92596) Richard Sandiford
                   ` (3 preceding siblings ...)
  2019-11-29 10:15 ` [4/5] Record the vector mask precision in stmt_vec_info Richard Sandiford
@ 2019-11-29 10:25 ` Richard Sandiford
  2019-11-29 12:58   ` Richard Biener
  4 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2019-11-29 10:25 UTC (permalink / raw)
  To: gcc-patches

Now that stmt_vec_info records the choice between vector mask
types and normal nonmask types, we can use that information in
vect_get_vector_types_for_stmt instead of deferring the choice
of vector type till later.

vect_get_mask_type_for_stmt used to check whether the boolean inputs
to an operation:
(a) consistently used mask types or consistently used nonmask types; and
(b) agreed on the number of elements.

(b) shouldn't be a problem when (a) is met.  If the operation
consistently uses mask types, tree-vect-patterns.c will have corrected
any mismatches in mask precision.  (This is because we only use mask
types for a small well-known set of operations and tree-vect-patterns.c
knows how to handle any that could have different mask precisions.)
And if the operation consistently uses normal nonmask types, there's
no reason why booleans should need extra vector compatibility checks
compared to ordinary integers.

So the potential difficulties all seem to come from (a).  Now that
we've chosen the result type ahead of time, we also have to consider
whether the outputs and inputs consistently use mask types.
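
For concreteness, here is a minimal C sketch (not from the patch or
testsuite; names are made up) of the two self-consistent cases that (a)
distinguishes:

```c
#include <assert.h>

/* All-mask case: every boolean value is produced by a comparison, so
   the vectoriser can represent all of them with vector mask types
   (e.g. SVE predicates or AVX512 mask registers).  */
static void
all_mask (int *restrict out, const int *restrict a,
	  const int *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (a[i] > 0) & (b[i] > 0);
}

/* All-nonmask case: the booleans are loaded from and stored to memory,
   so they behave like ordinary 1-byte integers and use normal
   (nonmask) integer vectors throughout.  */
static void
all_nonmask (_Bool *restrict out, const _Bool *restrict a,
	     const _Bool *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = a[i] & b[i];
}
```

In both functions the booleans are consistent, so once the result type
is chosen ahead of time there is nothing left for (b) to check.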

Taking each vectorizable_* routine in turn:

- vectorizable_call

    vect_get_vector_types_for_stmt only handled booleans specially
    for gassigns, so vect_get_mask_type_for_stmt never had a chance to
    handle calls.  I'm not sure we support any calls that operate on
    booleans, but as things stand, a boolean result would always have
    a nonmask type.  Presumably any vector argument would also need to
    use nonmask types, unless it corresponds to internal_fn_mask_index
    (which is already a special case).

    For safety, I've added a check for mask/nonmask combinations here
    even though we didn't check this previously.

- vectorizable_simd_clone_call

    Again, vect_get_mask_type_for_stmt never had a chance to handle calls.
    The result of the call will always be a nonmask type and the patch
    for PR 92710 rejects mask arguments.  So all booleans should
    consistently use nonmask types here.

- vectorizable_conversion

    The function already rejects any conversion between booleans in which
    one type isn't a mask type.

- vectorizable_operation

    This function definitely needs a consistency check, e.g. to handle
    & and | in which one operand is loaded from memory and the other is
    a comparison result.  Ideally we'd handle this via pattern stmts
    instead (like we do for the all-mask case), but that's future work.
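
A minimal C sketch of the kind of source that hits this check
(illustrative names, not taken from the patch or testsuite):

```c
#include <assert.h>

/* Mixed case: b[i] is a (nonmask) boolean loaded from memory, while
   (a[i] > 0) produces a (mask) comparison result.  The AND therefore
   combines a nonmask operand with a mask operand, which
   vectorizable_operation now rejects up front rather than leaving it
   for vect_get_mask_type_for_stmt to diagnose.  */
static void
mixed (int *restrict out, const int *restrict a,
       const _Bool *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (a[i] > 0) & b[i];
}
```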

- vectorizable_assignment

    VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
    current code already rejects problematic cases.

- vectorizable_load

    Loads always produce nonmask types and there are no relevant inputs
    to check against.

- vectorizable_store

    vect_check_store_rhs already rejects mask/nonmask combinations
    via useless_type_conversion_p.

- vectorizable_reduction
- vectorizable_lc_phi

    PHIs always have nonmask types.  After the change above, attempts
    to combine the PHI result with a mask type would be rejected by
    vectorizable_operation.  (Again, it would be better to handle
    this using pattern stmts.)

- vectorizable_induction

    We don't generate inductions for booleans.

- vectorizable_shift

    The function already rejects boolean shifts via type_has_mode_precision_p.

- vectorizable_condition

    The function already rejects mismatches via useless_type_conversion_p.

- vectorizable_comparison

    The function already rejects comparisons between mask and nonmask types.
    The result is always a mask type.


2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/92596
	* tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
	operations.
	(vectorizable_operation): Likewise, instead of relying on
	vect_get_mask_type_for_stmt to do this.
	(vect_get_vector_types_for_stmt): Always return a vector type
	immediately, rather than deferring the choice for boolean results.
	Use a vector mask type instead of a normal vector if
	vect_use_mask_type_p.
	(vect_get_mask_type_for_stmt): Delete.
	* tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
	mask_producers argument and special boolean_type_node handling.
	(vect_determine_vf_for_stmt): Remove mask_producers argument and
	update calls to vect_determine_vf_for_stmt_1.  Remove doubled call.
	(vect_determine_vectorization_factor): Update call accordingly.
	* tree-vect-slp.c (vect_build_slp_tree_1): Remove special
	boolean_type_node handling.
	(vect_slp_analyze_node_operations_1): Likewise.

gcc/testsuite/
	PR tree-optimization/92596
	* gcc.dg/vect/bb-slp-pr92596.c: New test.
	* gcc.dg/vect/bb-slp-43.c: Likewise.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-29 09:13:43.000000000 +0000
+++ gcc/tree-vect-stmts.c	2019-11-29 09:13:43.768143063 +0000
@@ -3325,6 +3325,15 @@ vectorizable_call (stmt_vec_info stmt_in
       return false;
     }
 
+  if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
+      != VECTOR_BOOLEAN_TYPE_P (vectype_in))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "mixed mask and nonmask vector types\n");
+      return false;
+    }
+
   /* FORNOW */
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
@@ -6037,7 +6046,8 @@ vectorizable_operation (stmt_vec_info st
 
   /* Most operations cannot handle bit-precision types without extra
      truncations.  */
-  if (!VECTOR_BOOLEAN_TYPE_P (vectype_out)
+  bool mask_op_p = VECTOR_BOOLEAN_TYPE_P (vectype_out);
+  if (!mask_op_p
       && !type_has_mode_precision_p (TREE_TYPE (scalar_dest))
       /* Exception are bitwise binary operations.  */
       && code != BIT_IOR_EXPR
@@ -6099,10 +6109,11 @@ vectorizable_operation (stmt_vec_info st
   if (maybe_ne (nunits_out, nunits_in))
     return false;
 
+  tree vectype2 = NULL_TREE, vectype3 = NULL_TREE;
   if (op_type == binary_op || op_type == ternary_op)
     {
       op1 = gimple_assign_rhs2 (stmt);
-      if (!vect_is_simple_use (op1, vinfo, &dt[1]))
+      if (!vect_is_simple_use (op1, vinfo, &dt[1], &vectype2))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6113,7 +6124,7 @@ vectorizable_operation (stmt_vec_info st
   if (op_type == ternary_op)
     {
       op2 = gimple_assign_rhs3 (stmt);
-      if (!vect_is_simple_use (op2, vinfo, &dt[2]))
+      if (!vect_is_simple_use (op2, vinfo, &dt[2], &vectype3))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6138,6 +6149,21 @@ vectorizable_operation (stmt_vec_info st
 
   gcc_assert (ncopies >= 1);
 
+  /* Reject attempts to combine mask types with nonmask types, e.g. if
+     we have an AND between a (nonmask) boolean loaded from memory and
+     a (mask) boolean result of a comparison.
+
+     TODO: We could easily fix these cases up using pattern statements.  */
+  if (VECTOR_BOOLEAN_TYPE_P (vectype) != mask_op_p
+      || (vectype2 && VECTOR_BOOLEAN_TYPE_P (vectype2) != mask_op_p)
+      || (vectype3 && VECTOR_BOOLEAN_TYPE_P (vectype3) != mask_op_p))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "mixed mask and nonmask vector types\n");
+      return false;
+    }
+
   /* Supportable by target?  */
 
   vec_mode = TYPE_MODE (vectype);
@@ -12065,9 +12091,6 @@ vect_gen_while_not (gimple_seq *seq, tre
 
    - Set *STMT_VECTYPE_OUT to:
      - NULL_TREE if the statement doesn't need to be vectorized;
-     - boolean_type_node if the statement is a boolean operation whose
-       vector type can only be determined once all the other vector types
-       are known; and
      - the equivalent of STMT_VINFO_VECTYPE otherwise.
 
    - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
@@ -12124,11 +12147,22 @@ vect_get_vector_types_for_stmt (stmt_vec
   tree scalar_type = NULL_TREE;
   if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
     {
-      *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
+      vectype = STMT_VINFO_VECTYPE (stmt_info);
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location,
 			 "precomputed vectype: %T\n", vectype);
     }
+  else if (vect_use_mask_type_p (stmt_info))
+    {
+      unsigned int precision = stmt_info->mask_precision;
+      scalar_type = build_nonstandard_integer_type (precision, 1);
+      vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, group_size);
+      if (!vectype)
+	return opt_result::failure_at (stmt, "not vectorized: unsupported"
+				       " data-type %T\n", scalar_type);
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
+    }
   else
     {
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
@@ -12138,28 +12172,6 @@ vect_get_vector_types_for_stmt (stmt_vec
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
-      /* Pure bool ops don't participate in number-of-units computation.
-	 For comparisons use the types being compared.  */
-      if (!STMT_VINFO_DATA_REF (stmt_info)
-	  && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
-	  && is_gimple_assign (stmt)
-	  && gimple_assign_rhs_code (stmt) != COND_EXPR)
-	{
-	  *stmt_vectype_out = boolean_type_node;
-
-	  tree rhs1 = gimple_assign_rhs1 (stmt);
-	  if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
-	      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
-	    scalar_type = TREE_TYPE (rhs1);
-	  else
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, vect_location,
-				 "pure bool operation.\n");
-	      return opt_result::success ();
-	    }
-	}
-
       if (dump_enabled_p ())
 	{
 	  if (group_size)
@@ -12177,18 +12189,15 @@ vect_get_vector_types_for_stmt (stmt_vec
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
-      if (!*stmt_vectype_out)
-	*stmt_vectype_out = vectype;
-
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }
+  *stmt_vectype_out = vectype;
 
   /* Don't try to compute scalar types if the stmt produces a boolean
      vector; use the existing vector type instead.  */
   tree nunits_vectype = vectype;
-  if (!VECTOR_BOOLEAN_TYPE_P (vectype)
-      && *stmt_vectype_out != boolean_type_node)
+  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
     {
       /* The number of units is set according to the smallest scalar
 	 type (or the largest vector size, but we only support one
@@ -12213,9 +12222,8 @@ vect_get_vector_types_for_stmt (stmt_vec
 	}
     }
 
-  gcc_assert (*stmt_vectype_out == boolean_type_node
-	      || multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
-			     TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
+  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+			  TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
 
   if (dump_enabled_p ())
     {
@@ -12227,84 +12235,3 @@ vect_get_vector_types_for_stmt (stmt_vec
   *nunits_vectype_out = nunits_vectype;
   return opt_result::success ();
 }
-
-/* Try to determine the correct vector type for STMT_INFO, which is a
-   statement that produces a scalar boolean result.  Return the vector
-   type on success, otherwise return NULL_TREE.  If GROUP_SIZE is nonzero
-   and we're performing BB vectorization, make sure that the number of
-   elements in the vector is no bigger than GROUP_SIZE.  */
-
-opt_tree
-vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int group_size)
-{
-  vec_info *vinfo = stmt_info->vinfo;
-  gimple *stmt = stmt_info->stmt;
-  tree mask_type = NULL;
-  tree vectype, scalar_type;
-
-  if (is_gimple_assign (stmt)
-      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
-      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
-    {
-      scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
-      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type,
-						 group_size);
-
-      if (!mask_type)
-	return opt_tree::failure_at (stmt,
-				     "not vectorized: unsupported mask\n");
-    }
-  else
-    {
-      tree rhs;
-      ssa_op_iter iter;
-      enum vect_def_type dt;
-
-      FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
-	{
-	  if (!vect_is_simple_use (rhs, stmt_info->vinfo, &dt, &vectype))
-	    return opt_tree::failure_at (stmt,
-					 "not vectorized:can't compute mask"
-					 " type for statement, %G", stmt);
-
-	  /* No vectype probably means external definition.
-	     Allow it in case there is another operand which
-	     allows to determine mask type.  */
-	  if (!vectype)
-	    continue;
-
-	  if (!mask_type)
-	    mask_type = vectype;
-	  else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
-			     TYPE_VECTOR_SUBPARTS (vectype)))
-	    return opt_tree::failure_at (stmt,
-					 "not vectorized: different sized mask"
-					 " types in statement, %T and %T\n",
-					 mask_type, vectype);
-	  else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
-		   != VECTOR_BOOLEAN_TYPE_P (vectype))
-	    return opt_tree::failure_at (stmt,
-					 "not vectorized: mixed mask and "
-					 "nonmask vector types in statement, "
-					 "%T and %T\n",
-					 mask_type, vectype);
-	}
-
-      /* We may compare boolean value loaded as vector of integers.
-	 Fix mask_type in such case.  */
-      if (mask_type
-	  && !VECTOR_BOOLEAN_TYPE_P (mask_type)
-	  && gimple_code (stmt) == GIMPLE_ASSIGN
-	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
-	mask_type = truth_type_for (mask_type);
-    }
-
-  /* No mask_type should mean loop invariant predicate.
-     This is probably a subject for optimization in if-conversion.  */
-  if (!mask_type)
-    return opt_tree::failure_at (stmt,
-				 "not vectorized: can't compute mask type "
-				 "for statement: %G", stmt);
-
-  return opt_tree::success (mask_type);
-}
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-29 09:13:43.000000000 +0000
+++ gcc/tree-vect-loop.c	2019-11-29 09:13:43.764143091 +0000
@@ -163,8 +163,7 @@ static stmt_vec_info vect_is_simple_redu
 static opt_result
 vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info,
 			      bool vectype_maybe_set_p,
-			      poly_uint64 *vf,
-			      vec<stmt_vec_info > *mask_producers)
+			      poly_uint64 *vf)
 {
   gimple *stmt = stmt_info->stmt;
 
@@ -192,8 +191,6 @@ vect_determine_vf_for_stmt_1 (stmt_vec_i
 	gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)
 		     || vectype_maybe_set_p)
 		    && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype);
-      else if (stmt_vectype == boolean_type_node)
-	mask_producers->safe_push (stmt_info);
       else
 	STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
     }
@@ -206,21 +203,17 @@ vect_determine_vf_for_stmt_1 (stmt_vec_i
 
 /* Subroutine of vect_determine_vectorization_factor.  Set the vector
    types of STMT_INFO and all attached pattern statements and update
-   the vectorization factor VF accordingly.  If some of the statements
-   produce a mask result whose vector type can only be calculated later,
-   add them to MASK_PRODUCERS.  Return true on success or false if
-   something prevented vectorization.  */
+   the vectorization factor VF accordingly.  Return true on success
+   or false if something prevented vectorization.  */
 
 static opt_result
-vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf,
-			    vec<stmt_vec_info > *mask_producers)
+vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf)
 {
   vec_info *vinfo = stmt_info->vinfo;
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: %G",
 		     stmt_info->stmt);
-  opt_result res
-    = vect_determine_vf_for_stmt_1 (stmt_info, false, vf, mask_producers);
+  opt_result res = vect_determine_vf_for_stmt_1 (stmt_info, false, vf);
   if (!res)
     return res;
 
@@ -239,10 +232,7 @@ vect_determine_vf_for_stmt (stmt_vec_inf
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "==> examining pattern def stmt: %G",
 			     def_stmt_info->stmt);
-	  if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true,
-					     vf, mask_producers))
-	  res = vect_determine_vf_for_stmt_1 (def_stmt_info, true,
-					      vf, mask_producers);
+	  res = vect_determine_vf_for_stmt_1 (def_stmt_info, true, vf);
 	  if (!res)
 	    return res;
 	}
@@ -251,7 +241,7 @@ vect_determine_vf_for_stmt (stmt_vec_inf
 	dump_printf_loc (MSG_NOTE, vect_location,
 			 "==> examining pattern statement: %G",
 			 stmt_info->stmt);
-      res = vect_determine_vf_for_stmt_1 (stmt_info, true, vf, mask_producers);
+      res = vect_determine_vf_for_stmt_1 (stmt_info, true, vf);
       if (!res)
 	return res;
     }
@@ -296,7 +286,6 @@ vect_determine_vectorization_factor (loo
   tree vectype;
   stmt_vec_info stmt_info;
   unsigned i;
-  auto_vec<stmt_vec_info> mask_producers;
 
   DUMP_VECT_SCOPE ("vect_determine_vectorization_factor");
 
@@ -354,8 +343,7 @@ vect_determine_vectorization_factor (loo
 	{
 	  stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (si));
 	  opt_result res
-	    = vect_determine_vf_for_stmt (stmt_info, &vectorization_factor,
-					  &mask_producers);
+	    = vect_determine_vf_for_stmt (stmt_info, &vectorization_factor);
 	  if (!res)
 	    return res;
         }
@@ -373,16 +361,6 @@ vect_determine_vectorization_factor (loo
     return opt_result::failure_at (vect_location,
 				   "not vectorized: unsupported data-type\n");
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
-
-  for (i = 0; i < mask_producers.length (); i++)
-    {
-      stmt_info = mask_producers[i];
-      opt_tree mask_type = vect_get_mask_type_for_stmt (stmt_info);
-      if (!mask_type)
-	return opt_result::propagate_failure (mask_type);
-      STMT_VINFO_VECTYPE (stmt_info) = mask_type;
-    }
-
   return opt_result::success ();
 }
 
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-29 09:13:43.000000000 +0000
+++ gcc/tree-vect-slp.c	2019-11-29 09:13:43.764143091 +0000
@@ -925,17 +925,6 @@ vect_build_slp_tree_1 (unsigned char *sw
 	      || rhs_code == LROTATE_EXPR
 	      || rhs_code == RROTATE_EXPR)
 	    {
-	      if (vectype == boolean_type_node)
-		{
-		  if (dump_enabled_p ())
-		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				     "Build SLP failed: shift of a"
-				     " boolean.\n");
-		  /* Fatal mismatch.  */
-		  matches[0] = false;
-		  return false;
-		}
-
 	      vec_mode = TYPE_MODE (vectype);
 
 	      /* First see if we have a vector/vector shift.  */
@@ -1157,9 +1146,8 @@ vect_build_slp_tree_1 (unsigned char *sw
   if (alt_stmt_code != ERROR_MARK
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
-      if (vectype == boolean_type_node
-	  || !vect_two_operations_perm_ok_p (stmts, group_size,
-					     vectype, alt_stmt_code))
+      if (!vect_two_operations_perm_ok_p (stmts, group_size,
+					  vectype, alt_stmt_code))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]->stmt) == alt_stmt_code)
@@ -2751,25 +2739,6 @@ vect_slp_analyze_node_operations_1 (vec_
   stmt_vec_info stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
   gcc_assert (STMT_SLP_TYPE (stmt_info) != loop_vect);
 
-  /* For BB vectorization vector types are assigned here.
-     Memory accesses already got their vector type assigned
-     in vect_analyze_data_refs.  */
-  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
-  if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
-    {
-      unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
-      tree vectype = vect_get_mask_type_for_stmt (stmt_info, group_size);
-      if (!vectype)
-	/* vect_get_mask_type_for_stmt has already explained the
-	   failure.  */
-	return false;
-
-      stmt_vec_info sstmt_info;
-      unsigned int i;
-      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, sstmt_info)
-	STMT_VINFO_VECTYPE (sstmt_info) = vectype;
-    }
-
   /* Calculate the number of vector statements to be created for the
      scalar stmts in this node.  For SLP reductions it is equal to the
      number of vector statements in the children (which has already been
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c	2019-11-29 09:13:43.764143091 +0000
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+typedef struct {
+  long n[5];
+} secp256k1_fe;
+
+secp256k1_fe a;
+
+void fn1(int p1) { a.n[0] = a.n[1] = a.n[2] = p1; }
+void fn2() {
+  int b;
+  fn1(!b);
+}
Index: gcc/testsuite/gcc.dg/vect/bb-slp-43.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-43.c	2019-11-29 09:13:43.752143175 +0000
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict x, short *restrict y)
+{
+  x[0] = x[0] == 1 & y[0] == 2;
+  x[1] = x[1] == 1 & y[1] == 2;
+  x[2] = x[2] == 1 & y[2] == 2;
+  x[3] = x[3] == 1 & y[3] == 2;
+  x[4] = x[4] == 1 & y[4] == 2;
+  x[5] = x[5] == 1 & y[5] == 2;
+  x[6] = x[6] == 1 & y[6] == 2;
+  x[7] = x[7] == 1 & y[7] == 2;
+}
+
+/* { dg-final { scan-tree-dump-not "mixed mask and nonmask" "slp2" } } */
+/* { dg-final { scan-tree-dump-not "vector operands from scalars" "slp2" { target { { vect_int && vect_bool_cmp } && { vect_unpack && vect_hw_misalign } } xfail vect_variable_length } } } */


* Re: [2/5] Make vectorizable_operation punt early on codes it doesn't handle
  2019-11-29 10:14 ` [2/5] Make vectorizable_operation punt early on codes it doesn't handle Richard Sandiford
@ 2019-11-29 10:26   ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2019-11-29 10:26 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Nov 29, 2019 at 11:13 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> vectorizable_operation returned false for codes that are handled by
> vectorizable_shift, but only after it had already done a lot of work.
> Checking earlier should be more efficient and avoid polluting the logs
> with duplicate info.
>
> Also, there was no such early-out for comparisons or COND_EXPRs.
> Fixing that avoids a false scan-tree-dump hit with a later patch.

OK.

>
> 2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (vectorizable_operation): Punt early
>         on codes that are handled elsewhere.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-29 08:28:12.015121876 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-29 09:11:24.553108756 +0000
> @@ -5999,6 +5999,21 @@ vectorizable_operation (stmt_vec_info st
>
>    orig_code = code = gimple_assign_rhs_code (stmt);
>
> +  /* Shifts are handled in vectorizable_shift.  */
> +  if (code == LSHIFT_EXPR
> +      || code == RSHIFT_EXPR
> +      || code == LROTATE_EXPR
> +      || code == RROTATE_EXPR)
> +   return false;
> +
> +  /* Comparisons are handled in vectorizable_comparison.  */
> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
> +    return false;
> +
> +  /* Conditions are handled in vectorizable_condition.  */
> +  if (code == COND_EXPR)
> +    return false;
> +
>    /* For pointer addition and subtraction, we should use the normal
>       plus and minus for the vector operation.  */
>    if (code == POINTER_PLUS_EXPR)
> @@ -6123,11 +6138,6 @@ vectorizable_operation (stmt_vec_info st
>
>    gcc_assert (ncopies >= 1);
>
> -  /* Shifts are handled in vectorizable_shift ().  */
> -  if (code == LSHIFT_EXPR || code == RSHIFT_EXPR || code == LROTATE_EXPR
> -      || code == RROTATE_EXPR)
> -   return false;
> -
>    /* Supportable by target?  */
>
>    vec_mode = TYPE_MODE (vectype);


* Re: [3/5] Make vect_get_mask_type_for_stmt take a group size
  2019-11-29 10:14 ` [3/5] Make vect_get_mask_type_for_stmt take a group size Richard Sandiford
@ 2019-11-29 10:30   ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2019-11-29 10:30 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Nov 29, 2019 at 11:14 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch makes vect_get_mask_type_for_stmt and
> get_mask_type_for_scalar_type take a group size instead of
> the SLP node, so that later patches can call it before an
> SLP node has been built.

OK.

>
> 2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (get_mask_type_for_scalar_type): Replace
>         the slp_tree parameter with a group size parameter.
>         (vect_get_mask_type_for_stmt): Likewise.
>         * tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise.
>         (vect_get_mask_type_for_stmt): Likewise.
>         * tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update
>         call accordingly.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-16 10:40:08.422638677 +0000
> +++ gcc/tree-vectorizer.h       2019-11-29 09:11:27.781086362 +0000
> @@ -1640,7 +1640,7 @@ extern tree get_related_vectype_for_scal
>                                                  poly_uint64 = 0);
>  extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
>  extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
> -extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
> +extern tree get_mask_type_for_scalar_type (vec_info *, tree, unsigned int = 0);
>  extern tree get_same_sized_vectype (tree, tree);
>  extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
> @@ -1693,7 +1693,7 @@ extern gcall *vect_gen_while (tree, tree
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
>                                                   tree *, unsigned int = 0);
> -extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
> +extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
>
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-29 09:11:24.553108756 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-29 09:11:27.781086362 +0000
> @@ -11362,14 +11362,15 @@ get_vectype_for_scalar_type (vec_info *v
>
>     Returns the mask type corresponding to a result of comparison
>     of vectors of specified SCALAR_TYPE as supported by target.
> -   NODE, if nonnull, is the SLP tree node that will use the returned
> -   vector type.  */
> +   If GROUP_SIZE is nonzero and we're performing BB vectorization,
> +   make sure that the number of elements in the vector is no bigger
> +   than GROUP_SIZE.  */
>
>  tree
>  get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
> -                              slp_tree node)
> +                              unsigned int group_size)
>  {
> -  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
> +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
>
>    if (!vectype)
>      return NULL;
> @@ -12229,11 +12230,12 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>  /* Try to determine the correct vector type for STMT_INFO, which is a
>     statement that produces a scalar boolean result.  Return the vector
> -   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
> -   is the SLP tree node that will use the returned vector type.  */
> +   type on success, otherwise return NULL_TREE.  If GROUP_SIZE is nonzero
> +   and we're performing BB vectorization, make sure that the number of
> +   elements in the vector is no bigger than GROUP_SIZE.  */
>
>  opt_tree
> -vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int group_size)
>  {
>    vec_info *vinfo = stmt_info->vinfo;
>    gimple *stmt = stmt_info->stmt;
> @@ -12245,7 +12247,8 @@ vect_get_mask_type_for_stmt (stmt_vec_in
>        && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
>      {
>        scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
> +      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type,
> +                                                group_size);
>
>        if (!mask_type)
>         return opt_tree::failure_at (stmt,
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-11-26 22:04:58.099362339 +0000
> +++ gcc/tree-vect-slp.c 2019-11-29 09:11:27.777086392 +0000
> @@ -2757,7 +2757,8 @@ vect_slp_analyze_node_operations_1 (vec_
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>    if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
>      {
> -      tree vectype = vect_get_mask_type_for_stmt (stmt_info, node);
> +      unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
> +      tree vectype = vect_get_mask_type_for_stmt (stmt_info, group_size);
>        if (!vectype)
>         /* vect_get_mask_type_for_stmt has already explained the
>            failure.  */


* Re: [4/5] Record the vector mask precision in stmt_vec_info
  2019-11-29 10:15 ` [4/5] Record the vector mask precision in stmt_vec_info Richard Sandiford
@ 2019-11-29 10:34   ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2019-11-29 10:34 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Nov 29, 2019 at 11:14 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> search_type_for_mask uses a worklist to search a chain of boolean
> operations for a natural vector mask type.  This patch instead does
> that in vect_determine_stmt_precisions, where we also look for
> overpromoted integer operations.  We then only need to compute
> the precision once and can cache it in the stmt_vec_info.
>
> The new function vect_determine_mask_precision is supposed
> to handle exactly the same cases as search_type_for_mask_1,
> and in the same way.  There's a lot we could improve here,
> but that's not stage 3 material.
>
> I wondered about sharing mask_precision with other fields like
> operation_precision, but in the end that seemed too dangerous.
> We have patterns to convert between boolean and non-boolean
> operations and it would be very easy to get mixed up about
> which case the fields are describing.
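[Editorial note: a minimal C sketch, not taken from the patch, of the situation vect_determine_mask_precision resolves — the two boolean inputs to an AND come from comparisons of different element widths, so they naturally want different mask precisions and a single one must be chosen.]

```c
#include <assert.h>

/* Hypothetical loop (not from the patch): p[i] > 0 wants the vector
   mask type for 16-bit elements while q[i] > 0 wants the mask type
   for 8-bit elements, so the AND combining them forces a choice of
   mask precision.  Scalar semantics shown here.  */
static void
f (short *restrict out, const short *restrict p,
   const signed char *restrict q, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (p[i] > 0) & (q[i] > 0) ? 1 : 2;
}
```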

OK.

>
> 2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (stmt_vec_info::mask_precision): New field.
>         (vect_use_mask_type_p): New function.
>         * tree-vect-patterns.c (vect_init_pattern_stmt): Copy the
>         mask precision to the pattern statement.
>         (append_pattern_def_seq): Add a scalar_type_for_mask parameter
>         and use it to initialize the new stmt's mask precision.
>         (search_type_for_mask_1): Delete.
>         (search_type_for_mask): Replace with...
>         (integer_type_for_mask): ...this new function.  Use the information
>         cached in the stmt_vec_info.
>         (vect_recog_bool_pattern): Update accordingly.
>         (build_mask_conversion): Pass the scalar type associated with the
>         mask type to append_pattern_def_seq.
>         (vect_recog_mask_conversion_pattern): Likewise.  Call
>         integer_type_for_mask instead of search_type_for_mask.
>         (vect_convert_mask_for_vectype): Call integer_type_for_mask instead
>         of search_type_for_mask.
>         (possible_vector_mask_operation_p): New function.
>         (vect_determine_mask_precision): Likewise.
>         (vect_determine_stmt_precisions): Call it.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-29 09:11:27.781086362 +0000
> +++ gcc/tree-vectorizer.h       2019-11-29 09:11:31.277062112 +0000
> @@ -1089,6 +1089,23 @@ typedef struct data_reference *dr_p;
>    unsigned int operation_precision;
>    signop operation_sign;
>
> +  /* If the statement produces a boolean result, this value describes
> +     how we should choose the associated vector type.  The possible
> +     values are:
> +
> +     - an integer precision N if we should use the vector mask type
> +       associated with N-bit integers.  This is only used if all relevant
> +       input booleans also want the vector mask type for N-bit integers,
> +       or if we can convert them into that form by pattern-matching.
> +
> +     - ~0U if we considered choosing a vector mask type but decided
> +       to treat the boolean as a normal integer type instead.
> +
> +     - 0 otherwise.  This means either that the operation isn't one that
> +       could have a vector mask type (and so should have a normal vector
> +       type instead) or that we simply haven't made a choice either way.  */
> +  unsigned int mask_precision;
> +
>    /* True if this is only suitable for SLP vectorization.  */
>    bool slp_vect_only_p;
>  };
> @@ -1245,6 +1262,15 @@ nested_in_vect_loop_p (class loop *loop,
>           && (loop->inner == (gimple_bb (stmt_info->stmt))->loop_father));
>  }
>
> +/* Return true if STMT_INFO should produce a vector mask type rather than
> +   a normal nonmask type.  */
> +
> +static inline bool
> +vect_use_mask_type_p (stmt_vec_info stmt_info)
> +{
> +  return stmt_info->mask_precision && stmt_info->mask_precision != ~0U;
> +}
> +
>  /* Return TRUE if a statement represented by STMT_INFO is a part of a
>     pattern.  */
>
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2019-11-29 09:11:21.389130702 +0000
> +++ gcc/tree-vect-patterns.c    2019-11-29 09:11:31.277062112 +0000
> @@ -112,7 +112,12 @@ vect_init_pattern_stmt (gimple *pattern_
>    STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>      = STMT_VINFO_DEF_TYPE (orig_stmt_info);
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> -    STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> +    {
> +      gcc_assert (VECTOR_BOOLEAN_TYPE_P (vectype)
> +                 == vect_use_mask_type_p (orig_stmt_info));
> +      STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> +      pattern_stmt_info->mask_precision = orig_stmt_info->mask_precision;
> +    }
>    return pattern_stmt_info;
>  }
>
> @@ -131,17 +136,25 @@ vect_set_pattern_stmt (gimple *pattern_s
>
>  /* Add NEW_STMT to STMT_INFO's pattern definition statements.  If VECTYPE
>     is nonnull, record that NEW_STMT's vector type is VECTYPE, which might
> -   be different from the vector type of the final pattern statement.  */
> +   be different from the vector type of the final pattern statement.
> +   If VECTYPE is a mask type, SCALAR_TYPE_FOR_MASK is the scalar type
> +   from which it was derived.  */
>
>  static inline void
>  append_pattern_def_seq (stmt_vec_info stmt_info, gimple *new_stmt,
> -                       tree vectype = NULL_TREE)
> +                       tree vectype = NULL_TREE,
> +                       tree scalar_type_for_mask = NULL_TREE)
>  {
> +  gcc_assert (!scalar_type_for_mask
> +             == (!vectype || !VECTOR_BOOLEAN_TYPE_P (vectype)));
>    vec_info *vinfo = stmt_info->vinfo;
>    if (vectype)
>      {
>        stmt_vec_info new_stmt_info = vinfo->add_stmt (new_stmt);
>        STMT_VINFO_VECTYPE (new_stmt_info) = vectype;
> +      if (scalar_type_for_mask)
> +       new_stmt_info->mask_precision
> +         = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (scalar_type_for_mask));
>      }
>    gimple_seq_add_stmt_without_update (&STMT_VINFO_PATTERN_DEF_SEQ (stmt_info),
>                                       new_stmt);
> @@ -3886,108 +3899,22 @@ adjust_bool_stmts (hash_set <gimple *> &
>    return gimple_assign_lhs (pattern_stmt);
>  }
>
> -/* Helper for search_type_for_mask.  */
> +/* Return the proper type for converting bool VAR into
> +   an integer value or NULL_TREE if no such type exists.
> +   The type is chosen so that the converted value has the
> +   same number of elements as VAR's vector type.  */
>
>  static tree
> -search_type_for_mask_1 (tree var, vec_info *vinfo,
> -                       hash_map<gimple *, tree> &cache)
> +integer_type_for_mask (tree var, vec_info *vinfo)
>  {
> -  tree rhs1;
> -  enum tree_code rhs_code;
> -  tree res = NULL_TREE, res2;
> -
>    if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (var)))
>      return NULL_TREE;
>
>    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info || !vect_use_mask_type_p (def_stmt_info))
>      return NULL_TREE;
>
> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> -  if (!def_stmt)
> -    return NULL_TREE;
> -
> -  tree *c = cache.get (def_stmt);
> -  if (c)
> -    return *c;
> -
> -  rhs_code = gimple_assign_rhs_code (def_stmt);
> -  rhs1 = gimple_assign_rhs1 (def_stmt);
> -
> -  switch (rhs_code)
> -    {
> -    case SSA_NAME:
> -    case BIT_NOT_EXPR:
> -    CASE_CONVERT:
> -      res = search_type_for_mask_1 (rhs1, vinfo, cache);
> -      break;
> -
> -    case BIT_AND_EXPR:
> -    case BIT_IOR_EXPR:
> -    case BIT_XOR_EXPR:
> -      res = search_type_for_mask_1 (rhs1, vinfo, cache);
> -      res2 = search_type_for_mask_1 (gimple_assign_rhs2 (def_stmt), vinfo,
> -                                    cache);
> -      if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
> -       res = res2;
> -      break;
> -
> -    default:
> -      if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
> -       {
> -         tree comp_vectype, mask_type;
> -
> -         if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> -           {
> -             res = search_type_for_mask_1 (rhs1, vinfo, cache);
> -             res2 = search_type_for_mask_1 (gimple_assign_rhs2 (def_stmt),
> -                                            vinfo, cache);
> -             if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
> -               res = res2;
> -             if (res)
> -               break;
> -           }
> -
> -         comp_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
> -         if (comp_vectype == NULL_TREE)
> -           {
> -             res = NULL_TREE;
> -             break;
> -           }
> -
> -         mask_type = get_mask_type_for_scalar_type (vinfo, TREE_TYPE (rhs1));
> -         if (!mask_type
> -             || !expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> -           {
> -             res = NULL_TREE;
> -             break;
> -           }
> -
> -         if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE
> -             || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
> -           {
> -             scalar_mode mode = SCALAR_TYPE_MODE (TREE_TYPE (rhs1));
> -             res = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
> -           }
> -         else
> -           res = TREE_TYPE (rhs1);
> -       }
> -    }
> -
> -  cache.put (def_stmt, res);
> -  return res;
> -}
> -
> -/* Return the proper type for converting bool VAR into
> -   an integer value or NULL_TREE if no such type exists.
> -   The type is chosen so that converted value has the
> -   same number of elements as VAR's vector type.  */
> -
> -static tree
> -search_type_for_mask (tree var, vec_info *vinfo)
> -{
> -  hash_map<gimple *, tree> cache;
> -  return search_type_for_mask_1 (var, vinfo, cache);
> +  return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>
>  /* Function vect_recog_bool_pattern
> @@ -4079,7 +4006,7 @@ vect_recog_bool_pattern (stmt_vec_info s
>         }
>        else
>         {
> -         tree type = search_type_for_mask (var, vinfo);
> +         tree type = integer_type_for_mask (var, vinfo);
>           tree cst0, cst1, tmp;
>
>           if (!type)
> @@ -4164,7 +4091,7 @@ vect_recog_bool_pattern (stmt_vec_info s
>         rhs = adjust_bool_stmts (bool_stmts, TREE_TYPE (vectype), stmt_vinfo);
>        else
>         {
> -         tree type = search_type_for_mask (var, vinfo);
> +         tree type = integer_type_for_mask (var, vinfo);
>           tree cst0, cst1, new_vectype;
>
>           if (!type)
> @@ -4219,7 +4146,7 @@ build_mask_conversion (tree mask, tree v
>    masktype = truth_type_for (vectype);
>    tmp = vect_recog_temp_ssa_var (TREE_TYPE (masktype), NULL);
>    stmt = gimple_build_assign (tmp, CONVERT_EXPR, mask);
> -  append_pattern_def_seq (stmt_vinfo, stmt, masktype);
> +  append_pattern_def_seq (stmt_vinfo, stmt, masktype, TREE_TYPE (vectype));
>
>    return tmp;
>  }
> @@ -4285,7 +4212,7 @@ vect_recog_mask_conversion_pattern (stmt
>         }
>
>        tree mask_arg = gimple_call_arg (last_stmt, mask_argno);
> -      tree mask_arg_type = search_type_for_mask (mask_arg, vinfo);
> +      tree mask_arg_type = integer_type_for_mask (mask_arg, vinfo);
>        if (!mask_arg_type)
>         return NULL;
>        vectype2 = get_mask_type_for_scalar_type (vinfo, mask_arg_type);
> @@ -4338,7 +4265,7 @@ vect_recog_mask_conversion_pattern (stmt
>
>        if (TREE_CODE (rhs1) == SSA_NAME)
>         {
> -         rhs1_type = search_type_for_mask (rhs1, vinfo);
> +         rhs1_type = integer_type_for_mask (rhs1, vinfo);
>           if (!rhs1_type)
>             return NULL;
>         }
> @@ -4358,7 +4285,7 @@ vect_recog_mask_conversion_pattern (stmt
>
>              it is better for b1 and b2 to use the mask type associated
>              with int elements rather bool (byte) elements.  */
> -         rhs1_type = search_type_for_mask (TREE_OPERAND (rhs1, 0), vinfo);
> +         rhs1_type = integer_type_for_mask (TREE_OPERAND (rhs1, 0), vinfo);
>           if (!rhs1_type)
>             rhs1_type = TREE_TYPE (TREE_OPERAND (rhs1, 0));
>         }
> @@ -4414,7 +4341,8 @@ vect_recog_mask_conversion_pattern (stmt
>           tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
>           pattern_stmt = gimple_build_assign (tmp, rhs1);
>           rhs1 = tmp;
> -         append_pattern_def_seq (stmt_vinfo, pattern_stmt, vectype2);
> +         append_pattern_def_seq (stmt_vinfo, pattern_stmt, vectype2,
> +                                 rhs1_type);
>         }
>
>        if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> @@ -4447,8 +4375,8 @@ vect_recog_mask_conversion_pattern (stmt
>
>    rhs2 = gimple_assign_rhs2 (last_stmt);
>
> -  rhs1_type = search_type_for_mask (rhs1, vinfo);
> -  rhs2_type = search_type_for_mask (rhs2, vinfo);
> +  rhs1_type = integer_type_for_mask (rhs1, vinfo);
> +  rhs2_type = integer_type_for_mask (rhs2, vinfo);
>
>    if (!rhs1_type || !rhs2_type
>        || TYPE_PRECISION (rhs1_type) == TYPE_PRECISION (rhs2_type))
> @@ -4509,7 +4437,7 @@ vect_get_load_store_mask (stmt_vec_info
>  vect_convert_mask_for_vectype (tree mask, tree vectype,
>                                stmt_vec_info stmt_info, vec_info *vinfo)
>  {
> -  tree mask_type = search_type_for_mask (mask, vinfo);
> +  tree mask_type = integer_type_for_mask (mask, vinfo);
>    if (mask_type)
>      {
>        tree mask_vectype = get_mask_type_for_scalar_type (vinfo, mask_type);
> @@ -4949,6 +4877,148 @@ vect_determine_precisions_from_users (st
>    vect_set_min_input_precision (stmt_info, type, min_input_precision);
>  }
>
> +/* Return true if the statement described by STMT_INFO sets a boolean
> +   SSA_NAME and if we know how to vectorize this kind of statement using
> +   vector mask types.  */
> +
> +static bool
> +possible_vector_mask_operation_p (stmt_vec_info stmt_info)
> +{
> +  tree lhs = gimple_get_lhs (stmt_info->stmt);
> +  if (!lhs
> +      || TREE_CODE (lhs) != SSA_NAME
> +      || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
> +    return false;
> +
> +  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
> +    {
> +      tree_code rhs_code = gimple_assign_rhs_code (assign);
> +      switch (rhs_code)
> +       {
> +       CASE_CONVERT:
> +       case SSA_NAME:
> +       case BIT_NOT_EXPR:
> +       case BIT_IOR_EXPR:
> +       case BIT_XOR_EXPR:
> +       case BIT_AND_EXPR:
> +         return true;
> +
> +       default:
> +         return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
> +       }
> +    }
> +  return false;
> +}
> +
> +/* If STMT_INFO sets a boolean SSA_NAME, see whether we should use
> +   a vector mask type instead of a normal vector type.  Record the
> +   result in STMT_INFO->mask_precision.  */
> +
> +static void
> +vect_determine_mask_precision (stmt_vec_info stmt_info)
> +{
> +  vec_info *vinfo = stmt_info->vinfo;
> +
> +  if (!possible_vector_mask_operation_p (stmt_info)
> +      || stmt_info->mask_precision)
> +    return;
> +
> +  auto_vec<stmt_vec_info, 32> worklist;
> +  worklist.quick_push (stmt_info);
> +  while (!worklist.is_empty ())
> +    {
> +      stmt_info = worklist.last ();
> +      unsigned int orig_length = worklist.length ();
> +
> +      /* If at least one boolean input uses a vector mask type,
> +        pick the mask type with the narrowest elements.
> +
> +        ??? This is the traditional behavior.  It should always produce
> +        the smallest number of operations, but isn't necessarily the
> +        optimal choice.  For example, if we have:
> +
> +          a = b & c
> +
> +        where:
> +
> +        - the user of a wants it to have a mask type for 16-bit elements (M16)
> +        - b also uses M16
> +        - c uses a mask type for 8-bit elements (M8)
> +
> +        then picking M8 gives:
> +
> +        - 1 M16->M8 pack for b
> +        - 1 M8 AND for a
> +        - 2 M8->M16 unpacks for the user of a
> +
> +        whereas picking M16 would have given:
> +
> +        - 2 M8->M16 unpacks for c
> +        - 2 M16 ANDs for a
> +
> +        The number of operations is the same, but M16 would have given
> +        a shorter dependency chain and allowed more ILP.  */
> +      unsigned int precision = ~0U;
> +      gassign *assign = as_a <gassign *> (stmt_info->stmt);
> +      unsigned int nops = gimple_num_ops (assign);
> +      for (unsigned int i = 1; i < nops; ++i)
> +       {
> +         tree rhs = gimple_op (assign, i);
> +         if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
> +           continue;
> +
> +         stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
> +         if (!def_stmt_info)
> +           /* Don't let external or constant operands influence the choice.
> +              We can convert them to whichever vector type we pick.  */
> +           continue;
> +
> +         if (def_stmt_info->mask_precision)
> +           {
> +             if (precision > def_stmt_info->mask_precision)
> +               precision = def_stmt_info->mask_precision;
> +           }
> +         else if (possible_vector_mask_operation_p (def_stmt_info))
> +           worklist.safe_push (def_stmt_info);
> +       }
> +
> +      /* Defer the choice if we need to visit operands first.  */
> +      if (orig_length != worklist.length ())
> +       continue;
> +
> +      /* If the statement compares two values that shouldn't use vector masks,
> +        try comparing the values as normal scalars instead.  */
> +      tree_code rhs_code = gimple_assign_rhs_code (assign);
> +      if (precision == ~0U
> +         && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
> +       {
> +         tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
> +         scalar_mode mode;
> +         tree vectype, mask_type;
> +         if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
> +             && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
> +             && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
> +             && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
> +           precision = GET_MODE_BITSIZE (mode);
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         if (precision == ~0U)
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "using normal nonmask vectors for %G",
> +                            stmt_info->stmt);
> +         else
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "using boolean precision %d for %G",
> +                            precision, stmt_info->stmt);
> +       }
> +
> +      stmt_info->mask_precision = precision;
> +      worklist.pop ();
> +    }
> +}
> +
>  /* Handle vect_determine_precisions for STMT_INFO, given that we
>     have already done so for the users of its result.  */
>
> @@ -4961,6 +5031,7 @@ vect_determine_stmt_precisions (stmt_vec
>        vect_determine_precisions_from_range (stmt_info, stmt);
>        vect_determine_precisions_from_users (stmt_info, stmt);
>      }
> +  vect_determine_mask_precision (stmt_info);
>  }
>
>  /* Walk backwards through the vectorizable region to determine the


* Re: [1/5] Improve tree-vect-patterns.c handling of boolean comparisons
  2019-11-29 10:13 ` [1/5] Improve tree-vect-patterns.c handling of boolean comparisons Richard Sandiford
@ 2019-11-29 10:53   ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2019-11-29 10:53 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Nov 29, 2019 at 11:12 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> vect_recog_bool_pattern assumed that a comparison between two booleans
> should always become a comparison of vector mask types (implemented as an
> XOR_EXPR).  But if the booleans in question are generated as data values
> (e.g. because they're loaded directly from memory), we should treat them
> like ordinary integers instead, just as we do for boolean logic ops whose
> operands are loaded from memory.  vect_get_mask_type_for_stmt already
> handled this case:
>
>       /* We may compare boolean value loaded as vector of integers.
>          Fix mask_type in such case.  */
>       if (mask_type
>           && !VECTOR_BOOLEAN_TYPE_P (mask_type)
>           && gimple_code (stmt) == GIMPLE_ASSIGN
>           && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
>         mask_type = truth_type_for (mask_type);
>
> and not handling it here complicated later patches.
>
> The initial list of targets for vect_bool_cmp is deliberately conservative.

OK.

Richard.

>
> 2019-11-30  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * doc/sourcebuild.texi (vect_bool_cmp): Document.
>         * tree-vect-patterns.c (search_type_for_mask_1): If neither
>         operand to a boolean comparison is a natural vector mask,
>         handle both operands like normal integers instead.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-bool-cmp-2.c: New test.
>         * lib/target-supports.exp (check_effective_target_vect_bool_cmp): New
>         effective target procedure.
>
> Index: gcc/doc/sourcebuild.texi
> ===================================================================
> --- gcc/doc/sourcebuild.texi    2019-11-20 21:11:59.065472803 +0000
> +++ gcc/doc/sourcebuild.texi    2019-11-29 09:11:21.365130870 +0000
> @@ -1522,6 +1522,10 @@ Target does not support a vector add ins
>  @item vect_no_bitwise
>  Target does not support vector bitwise instructions.
>
> +@item vect_bool_cmp
> +Target supports comparison of @code{bool} vectors for at least one
> +vector length.
> +
>  @item vect_char_add
>  Target supports addition of @code{char} vectors for at least one
>  vector length.
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2019-11-16 10:29:21.207212217 +0000
> +++ gcc/tree-vect-patterns.c    2019-11-29 09:11:21.389130702 +0000
> @@ -3944,7 +3944,8 @@ search_type_for_mask_1 (tree var, vec_in
>                                              vinfo, cache);
>               if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
>                 res = res2;
> -             break;
> +             if (res)
> +               break;
>             }
>
>           comp_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
> Index: gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c 2019-11-29 09:11:21.373130815 +0000
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +
> +void
> +f (_Bool *restrict x, _Bool *restrict y)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    x[i] = x[i] == y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" { target vect_bool_cmp } } } */
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2019-11-26 22:11:24.494545152 +0000
> +++ gcc/testsuite/lib/target-supports.exp       2019-11-29 09:11:21.373130815 +0000
> @@ -5749,6 +5749,16 @@ proc check_effective_target_vect_bswap {
>              || [istarget amdgcn-*-*] }}]
>  }
>
> +# Return 1 if the target supports comparison of bool vectors for at
> +# least one vector length.
> +
> +proc check_effective_target_vect_bool_cmp { } {
> +    return [check_cached_effective_target_indexed vect_bool_cmp {
> +      expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
> +            || [istarget aarch64*-*-*]
> +            || [is-effective-target arm_neon] }}]
> +}
> +
>  # Return 1 if the target supports addition of char vectors for at least
>  # one vector length.
>


* Re: [5/5] Don't defer choice of vector type for bools (PR 92596)
  2019-11-29 10:25 ` [5/5] Don't defer choice of vector type for bools (PR 92596) Richard Sandiford
@ 2019-11-29 12:58   ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2019-11-29 12:58 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Nov 29, 2019 at 11:16 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Now that stmt_vec_info records the choice between vector mask
> types and normal nonmask types, we can use that information in
> vect_get_vector_types_for_stmt instead of deferring the choice
> of vector type till later.
>
> vect_get_mask_type_for_stmt used to check whether the boolean inputs
> to an operation:
> (a) consistently used mask types or consistently used nonmask types; and
> (b) agreed on the number of elements.
>
> (b) shouldn't be a problem when (a) is met.  If the operation
> consistently uses mask types, tree-vect-patterns.c will have corrected
> any mismatches in mask precision.  (This is because we only use mask
> types for a small well-known set of operations and tree-vect-patterns.c
> knows how to handle any that could have different mask precisions.)
> And if the operation consistently uses normal nonmask types, there's
> no reason why booleans should need extra vector compatibility checks
> compared to ordinary integers.
>
> So the potential difficulties all seem to come from (a).  Now that
> we've chosen the result type ahead of time, we also have to consider
> whether the outputs and inputs consistently use mask types.
>
> Taking each vectorizable_* routine in turn:
>
> - vectorizable_call
>
>     vect_get_vector_types_for_stmt only handled booleans specially
> for gassigns, so vect_get_mask_type_for_stmt never had a chance to
>     handle calls.  I'm not sure we support any calls that operate on
>     booleans, but as things stand, a boolean result would always have
>     a nonmask type.  Presumably any vector argument would also need to
>     use nonmask types, unless it corresponds to internal_fn_mask_index
>     (which is already a special case).
>
>     For safety, I've added a check for mask/nonmask combinations here
>     even though we didn't check this previously.
>
> - vectorizable_simd_clone_call
>
>     Again, vect_get_mask_type_for_stmt never had a chance to handle calls.
>     The result of the call will always be a nonmask type and the patch
>     for PR 92710 rejects mask arguments.  So all booleans should
>     consistently use nonmask types here.
>
> - vectorizable_conversion
>
>     The function already rejects any conversion between booleans in which
>     one type isn't a mask type.
>
> - vectorizable_operation
>
>     This function definitely needs a consistency check, e.g. to handle
>     & and | in which one operand is loaded from memory and the other is
>     a comparison result.  Ideally we'd handle this via pattern stmts
>     instead (like we do for the all-mask case), but that's future work.
>
> - vectorizable_assignment
>
>     VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
>     current code already rejects problematic cases.
>
> - vectorizable_load
>
>     Loads always produce nonmask types and there are no relevant inputs
>     to check against.
>
> - vectorizable_store
>
>     vect_check_store_rhs already rejects mask/nonmask combinations
>     via useless_type_conversion_p.
>
> - vectorizable_reduction
> - vectorizable_lc_phi
>
>     PHIs always have nonmask types.  After the change above, attempts
>     to combine the PHI result with a mask type would be rejected by
>     vectorizable_operation.  (Again, it would be better to handle
>     this using pattern stmts.)
>
> - vectorizable_induction
>
>     We don't generate inductions for booleans.
>
> - vectorizable_shift
>
>     The function already rejects boolean shifts via type_has_mode_precision_p.
>
> - vectorizable_condition
>
>     The function already rejects mismatches via useless_type_conversion_p.
>
> - vectorizable_comparison
>
>     The function already rejects comparisons between mask and nonmask types.
>     The result is always a mask type.
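[Editorial note: for the vectorizable_operation case above, a minimal C sketch, not taken from the patch, of a hybrid operation: one AND operand is a boolean loaded from memory (normal nonmask vector type) and the other is a comparison result (vector mask type).]

```c
#include <assert.h>

/* Hypothetical kernel mixing mask and nonmask booleans: a[i] is a
   boolean loaded from memory, x[i] > 0 is a comparison result.  The
   new consistency check in vectorizable_operation punts on such
   combinations instead of choosing a vector type for the AND on the
   fly.  Scalar semantics shown here.  */
static void
f (_Bool *restrict out, const _Bool *restrict a,
   const int *restrict x, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = a[i] & (x[i] > 0);
}
```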

OK.

Thanks for cleaning this up!
Richard.

>
> 2019-11-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         PR tree-optimization/92596
>         * tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
>         operations.
>         (vectorizable_operation): Likewise, instead of relying on
>         vect_get_mask_type_for_stmt to do this.
>         (vect_get_vector_types_for_stmt): Always return a vector type
>         immediately, rather than deferring the choice for boolean results.
>         Use a vector mask type instead of a normal vector if
>         vect_use_mask_type_p.
>         (vect_get_mask_type_for_stmt): Delete.
>         * tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
>         mask_producers argument and special boolean_type_node handling.
>         (vect_determine_vf_for_stmt): Remove mask_producers argument and
>         update calls to vect_determine_vf_for_stmt_1.  Remove doubled call.
>         (vect_determine_vectorization_factor): Update call accordingly.
>         * tree-vect-slp.c (vect_build_slp_tree_1): Remove special
>         boolean_type_node handling.
>         (vect_slp_analyze_node_operations_1): Likewise.
>
> gcc/testsuite/
>         PR tree-optimization/92596
>         * gcc.dg/vect/bb-slp-pr92596.c: New test.
>         * gcc.dg/vect/bb-slp-43.c: Likewise.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-29 09:13:43.000000000 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-29 09:13:43.768143063 +0000
> @@ -3325,6 +3325,15 @@ vectorizable_call (stmt_vec_info stmt_in
>        return false;
>      }
>
> +  if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
> +      != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "mixed mask and nonmask vector types\n");
> +      return false;
> +    }
> +
>    /* FORNOW */
>    nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
>    nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
> @@ -6037,7 +6046,8 @@ vectorizable_operation (stmt_vec_info st
>
>    /* Most operations cannot handle bit-precision types without extra
>       truncations.  */
> -  if (!VECTOR_BOOLEAN_TYPE_P (vectype_out)
> +  bool mask_op_p = VECTOR_BOOLEAN_TYPE_P (vectype_out);
> +  if (!mask_op_p
>        && !type_has_mode_precision_p (TREE_TYPE (scalar_dest))
>        /* Exception are bitwise binary operations.  */
>        && code != BIT_IOR_EXPR
> @@ -6099,10 +6109,11 @@ vectorizable_operation (stmt_vec_info st
>    if (maybe_ne (nunits_out, nunits_in))
>      return false;
>
> +  tree vectype2 = NULL_TREE, vectype3 = NULL_TREE;
>    if (op_type == binary_op || op_type == ternary_op)
>      {
>        op1 = gimple_assign_rhs2 (stmt);
> -      if (!vect_is_simple_use (op1, vinfo, &dt[1]))
> +      if (!vect_is_simple_use (op1, vinfo, &dt[1], &vectype2))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -6113,7 +6124,7 @@ vectorizable_operation (stmt_vec_info st
>    if (op_type == ternary_op)
>      {
>        op2 = gimple_assign_rhs3 (stmt);
> -      if (!vect_is_simple_use (op2, vinfo, &dt[2]))
> +      if (!vect_is_simple_use (op2, vinfo, &dt[2], &vectype3))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -6138,6 +6149,21 @@ vectorizable_operation (stmt_vec_info st
>
>    gcc_assert (ncopies >= 1);
>
> +  /* Reject attempts to combine mask types with nonmask types, e.g. if
> +     we have an AND between a (nonmask) boolean loaded from memory and
> +     a (mask) boolean result of a comparison.
> +
> +     TODO: We could easily fix these cases up using pattern statements.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (vectype) != mask_op_p
> +      || (vectype2 && VECTOR_BOOLEAN_TYPE_P (vectype2) != mask_op_p)
> +      || (vectype3 && VECTOR_BOOLEAN_TYPE_P (vectype3) != mask_op_p))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "mixed mask and nonmask vector types\n");
> +      return false;
> +    }
> +
>    /* Supportable by target?  */
>
>    vec_mode = TYPE_MODE (vectype);
> @@ -12065,9 +12091,6 @@ vect_gen_while_not (gimple_seq *seq, tre
>
>     - Set *STMT_VECTYPE_OUT to:
>       - NULL_TREE if the statement doesn't need to be vectorized;
> -     - boolean_type_node if the statement is a boolean operation whose
> -       vector type can only be determined once all the other vector types
> -       are known; and
>       - the equivalent of STMT_VINFO_VECTYPE otherwise.
>
>     - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
> @@ -12124,11 +12147,22 @@ vect_get_vector_types_for_stmt (stmt_vec
>    tree scalar_type = NULL_TREE;
>    if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
>      {
> -      *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
> +      vectype = STMT_VINFO_VECTYPE (stmt_info);
>        if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location,
>                          "precomputed vectype: %T\n", vectype);
>      }
> +  else if (vect_use_mask_type_p (stmt_info))
> +    {
> +      unsigned int precision = stmt_info->mask_precision;
> +      scalar_type = build_nonstandard_integer_type (precision, 1);
> +      vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, group_size);
> +      if (!vectype)
> +       return opt_result::failure_at (stmt, "not vectorized: unsupported"
> +                                      " data-type %T\n", scalar_type);
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> +    }
>    else
>      {
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> @@ -12138,28 +12172,6 @@ vect_get_vector_types_for_stmt (stmt_vec
>        else
>         scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>
> -      /* Pure bool ops don't participate in number-of-units computation.
> -        For comparisons use the types being compared.  */
> -      if (!STMT_VINFO_DATA_REF (stmt_info)
> -         && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> -         && is_gimple_assign (stmt)
> -         && gimple_assign_rhs_code (stmt) != COND_EXPR)
> -       {
> -         *stmt_vectype_out = boolean_type_node;
> -
> -         tree rhs1 = gimple_assign_rhs1 (stmt);
> -         if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> -             && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> -           scalar_type = TREE_TYPE (rhs1);
> -         else
> -           {
> -             if (dump_enabled_p ())
> -               dump_printf_loc (MSG_NOTE, vect_location,
> -                                "pure bool operation.\n");
> -             return opt_result::success ();
> -           }
> -       }
> -
>        if (dump_enabled_p ())
>         {
>           if (group_size)
> @@ -12177,18 +12189,15 @@ vect_get_vector_types_for_stmt (stmt_vec
>                                        " unsupported data-type %T\n",
>                                        scalar_type);
>
> -      if (!*stmt_vectype_out)
> -       *stmt_vectype_out = vectype;
> -
>        if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> +  *stmt_vectype_out = vectype;
>
>    /* Don't try to compute scalar types if the stmt produces a boolean
>       vector; use the existing vector type instead.  */
>    tree nunits_vectype = vectype;
> -  if (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -      && *stmt_vectype_out != boolean_type_node)
> +  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
>      {
>        /* The number of units is set according to the smallest scalar
>          type (or the largest vector size, but we only support one
> @@ -12213,9 +12222,8 @@ vect_get_vector_types_for_stmt (stmt_vec
>         }
>      }
>
> -  gcc_assert (*stmt_vectype_out == boolean_type_node
> -             || multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
> -                            TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
> +  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
> +                         TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
>
>    if (dump_enabled_p ())
>      {
> @@ -12227,84 +12235,3 @@ vect_get_vector_types_for_stmt (stmt_vec
>    *nunits_vectype_out = nunits_vectype;
>    return opt_result::success ();
>  }
> -
> -/* Try to determine the correct vector type for STMT_INFO, which is a
> -   statement that produces a scalar boolean result.  Return the vector
> -   type on success, otherwise return NULL_TREE.  If GROUP_SIZE is nonzero
> -   and we're performing BB vectorization, make sure that the number of
> -   elements in the vector is no bigger than GROUP_SIZE.  */
> -
> -opt_tree
> -vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int group_size)
> -{
> -  vec_info *vinfo = stmt_info->vinfo;
> -  gimple *stmt = stmt_info->stmt;
> -  tree mask_type = NULL;
> -  tree vectype, scalar_type;
> -
> -  if (is_gimple_assign (stmt)
> -      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> -      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> -    {
> -      scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type,
> -                                                group_size);
> -
> -      if (!mask_type)
> -       return opt_tree::failure_at (stmt,
> -                                    "not vectorized: unsupported mask\n");
> -    }
> -  else
> -    {
> -      tree rhs;
> -      ssa_op_iter iter;
> -      enum vect_def_type dt;
> -
> -      FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
> -       {
> -         if (!vect_is_simple_use (rhs, stmt_info->vinfo, &dt, &vectype))
> -           return opt_tree::failure_at (stmt,
> -                                        "not vectorized:can't compute mask"
> -                                        " type for statement, %G", stmt);
> -
> -         /* No vectype probably means external definition.
> -            Allow it in case there is another operand which
> -            allows to determine mask type.  */
> -         if (!vectype)
> -           continue;
> -
> -         if (!mask_type)
> -           mask_type = vectype;
> -         else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
> -                            TYPE_VECTOR_SUBPARTS (vectype)))
> -           return opt_tree::failure_at (stmt,
> -                                        "not vectorized: different sized mask"
> -                                        " types in statement, %T and %T\n",
> -                                        mask_type, vectype);
> -         else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> -                  != VECTOR_BOOLEAN_TYPE_P (vectype))
> -           return opt_tree::failure_at (stmt,
> -                                        "not vectorized: mixed mask and "
> -                                        "nonmask vector types in statement, "
> -                                        "%T and %T\n",
> -                                        mask_type, vectype);
> -       }
> -
> -      /* We may compare boolean value loaded as vector of integers.
> -        Fix mask_type in such case.  */
> -      if (mask_type
> -         && !VECTOR_BOOLEAN_TYPE_P (mask_type)
> -         && gimple_code (stmt) == GIMPLE_ASSIGN
> -         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
> -       mask_type = truth_type_for (mask_type);
> -    }
> -
> -  /* No mask_type should mean loop invariant predicate.
> -     This is probably a subject for optimization in if-conversion.  */
> -  if (!mask_type)
> -    return opt_tree::failure_at (stmt,
> -                                "not vectorized: can't compute mask type "
> -                                "for statement: %G", stmt);
> -
> -  return opt_tree::success (mask_type);
> -}
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-11-29 09:13:43.000000000 +0000
> +++ gcc/tree-vect-loop.c        2019-11-29 09:13:43.764143091 +0000
> @@ -163,8 +163,7 @@ static stmt_vec_info vect_is_simple_redu
>  static opt_result
>  vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info,
>                               bool vectype_maybe_set_p,
> -                             poly_uint64 *vf,
> -                             vec<stmt_vec_info > *mask_producers)
> +                             poly_uint64 *vf)
>  {
>    gimple *stmt = stmt_info->stmt;
>
> @@ -192,8 +191,6 @@ vect_determine_vf_for_stmt_1 (stmt_vec_i
>         gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)
>                      || vectype_maybe_set_p)
>                     && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype);
> -      else if (stmt_vectype == boolean_type_node)
> -       mask_producers->safe_push (stmt_info);
>        else
>         STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
>      }
> @@ -206,21 +203,17 @@ vect_determine_vf_for_stmt_1 (stmt_vec_i
>
>  /* Subroutine of vect_determine_vectorization_factor.  Set the vector
>     types of STMT_INFO and all attached pattern statements and update
> -   the vectorization factor VF accordingly.  If some of the statements
> -   produce a mask result whose vector type can only be calculated later,
> -   add them to MASK_PRODUCERS.  Return true on success or false if
> -   something prevented vectorization.  */
> +   the vectorization factor VF accordingly.  Return true on success
> +   or false if something prevented vectorization.  */
>
>  static opt_result
> -vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf,
> -                           vec<stmt_vec_info > *mask_producers)
> +vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf)
>  {
>    vec_info *vinfo = stmt_info->vinfo;
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: %G",
>                      stmt_info->stmt);
> -  opt_result res
> -    = vect_determine_vf_for_stmt_1 (stmt_info, false, vf, mask_producers);
> +  opt_result res = vect_determine_vf_for_stmt_1 (stmt_info, false, vf);
>    if (!res)
>      return res;
>
> @@ -239,10 +232,7 @@ vect_determine_vf_for_stmt (stmt_vec_inf
>             dump_printf_loc (MSG_NOTE, vect_location,
>                              "==> examining pattern def stmt: %G",
>                              def_stmt_info->stmt);
> -         if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true,
> -                                            vf, mask_producers))
> -         res = vect_determine_vf_for_stmt_1 (def_stmt_info, true,
> -                                             vf, mask_producers);
> +         res = vect_determine_vf_for_stmt_1 (def_stmt_info, true, vf);
>           if (!res)
>             return res;
>         }
> @@ -251,7 +241,7 @@ vect_determine_vf_for_stmt (stmt_vec_inf
>         dump_printf_loc (MSG_NOTE, vect_location,
>                          "==> examining pattern statement: %G",
>                          stmt_info->stmt);
> -      res = vect_determine_vf_for_stmt_1 (stmt_info, true, vf, mask_producers);
> +      res = vect_determine_vf_for_stmt_1 (stmt_info, true, vf);
>        if (!res)
>         return res;
>      }
> @@ -296,7 +286,6 @@ vect_determine_vectorization_factor (loo
>    tree vectype;
>    stmt_vec_info stmt_info;
>    unsigned i;
> -  auto_vec<stmt_vec_info> mask_producers;
>
>    DUMP_VECT_SCOPE ("vect_determine_vectorization_factor");
>
> @@ -354,8 +343,7 @@ vect_determine_vectorization_factor (loo
>         {
>           stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (si));
>           opt_result res
> -           = vect_determine_vf_for_stmt (stmt_info, &vectorization_factor,
> -                                         &mask_producers);
> +           = vect_determine_vf_for_stmt (stmt_info, &vectorization_factor);
>           if (!res)
>             return res;
>          }
> @@ -373,16 +361,6 @@ vect_determine_vectorization_factor (loo
>      return opt_result::failure_at (vect_location,
>                                    "not vectorized: unsupported data-type\n");
>    LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
> -
> -  for (i = 0; i < mask_producers.length (); i++)
> -    {
> -      stmt_info = mask_producers[i];
> -      opt_tree mask_type = vect_get_mask_type_for_stmt (stmt_info);
> -      if (!mask_type)
> -       return opt_result::propagate_failure (mask_type);
> -      STMT_VINFO_VECTYPE (stmt_info) = mask_type;
> -    }
> -
>    return opt_result::success ();
>  }
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-11-29 09:13:43.000000000 +0000
> +++ gcc/tree-vect-slp.c 2019-11-29 09:13:43.764143091 +0000
> @@ -925,17 +925,6 @@ vect_build_slp_tree_1 (unsigned char *sw
>               || rhs_code == LROTATE_EXPR
>               || rhs_code == RROTATE_EXPR)
>             {
> -             if (vectype == boolean_type_node)
> -               {
> -                 if (dump_enabled_p ())
> -                   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                    "Build SLP failed: shift of a"
> -                                    " boolean.\n");
> -                 /* Fatal mismatch.  */
> -                 matches[0] = false;
> -                 return false;
> -               }
> -
>               vec_mode = TYPE_MODE (vectype);
>
>               /* First see if we have a vector/vector shift.  */
> @@ -1157,9 +1146,8 @@ vect_build_slp_tree_1 (unsigned char *sw
>    if (alt_stmt_code != ERROR_MARK
>        && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
>      {
> -      if (vectype == boolean_type_node
> -         || !vect_two_operations_perm_ok_p (stmts, group_size,
> -                                            vectype, alt_stmt_code))
> +      if (!vect_two_operations_perm_ok_p (stmts, group_size,
> +                                         vectype, alt_stmt_code))
>         {
>           for (i = 0; i < group_size; ++i)
>             if (gimple_assign_rhs_code (stmts[i]->stmt) == alt_stmt_code)
> @@ -2751,25 +2739,6 @@ vect_slp_analyze_node_operations_1 (vec_
>    stmt_vec_info stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
>    gcc_assert (STMT_SLP_TYPE (stmt_info) != loop_vect);
>
> -  /* For BB vectorization vector types are assigned here.
> -     Memory accesses already got their vector type assigned
> -     in vect_analyze_data_refs.  */
> -  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> -  if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
> -    {
> -      unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
> -      tree vectype = vect_get_mask_type_for_stmt (stmt_info, group_size);
> -      if (!vectype)
> -       /* vect_get_mask_type_for_stmt has already explained the
> -          failure.  */
> -       return false;
> -
> -      stmt_vec_info sstmt_info;
> -      unsigned int i;
> -      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, sstmt_info)
> -       STMT_VINFO_VECTYPE (sstmt_info) = vectype;
> -    }
> -
>    /* Calculate the number of vector statements to be created for the
>       scalar stmts in this node.  For SLP reductions it is equal to the
>       number of vector statements in the children (which has already been
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c  2019-11-29 09:13:43.764143091 +0000
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +typedef struct {
> +  long n[5];
> +} secp256k1_fe;
> +
> +secp256k1_fe a;
> +
> +void fn1(int p1) { a.n[0] = a.n[1] = a.n[2] = p1; }
> +void fn2() {
> +  int b;
> +  fn1(!b);
> +}
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-43.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-43.c       2019-11-29 09:13:43.752143175 +0000
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +
> +void
> +f (int *restrict x, short *restrict y)
> +{
> +  x[0] = x[0] == 1 & y[0] == 2;
> +  x[1] = x[1] == 1 & y[1] == 2;
> +  x[2] = x[2] == 1 & y[2] == 2;
> +  x[3] = x[3] == 1 & y[3] == 2;
> +  x[4] = x[4] == 1 & y[4] == 2;
> +  x[5] = x[5] == 1 & y[5] == 2;
> +  x[6] = x[6] == 1 & y[6] == 2;
> +  x[7] = x[7] == 1 & y[7] == 2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "mixed mask and nonmask" "slp2" } } */
> +/* { dg-final { scan-tree-dump-not "vector operands from scalars" "slp2" { target { { vect_int && vect_bool_cmp } && { vect_unpack && vect_hw_misalign } } xfail vect_variable_length } } } */
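
For readers skimming the archive: the new early-exit in vectorizable_operation rejects statements whose operands mix mask and nonmask booleans. A rough sketch of the kind of source that triggers it (hypothetical, not taken from the patch; names are made up):

```c
#include <stdbool.h>

/* Booleans loaded from memory are vectorized as ordinary integer data
   (nonmask), while comparison results use the target's mask type.  */
bool loaded[8];

void f (int *restrict x, const int *restrict y)
{
  for (int i = 0; i < 8; ++i)
    /* (y[i] == 1) produces a mask; loaded[i] is nonmask data, so the
       AND mixes the two kinds of boolean and is currently rejected
       (the TODO in the patch notes it could be fixed up with pattern
       statements).  */
    x[i] = (y[i] == 1) & loaded[i];
}
```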

Thread overview: 11+ messages
2019-11-29 10:11 [0/5] Don't defer vector type choice for bools (PR 92596) Richard Sandiford
2019-11-29 10:13 ` [1/5] Improve tree-vect-patterns.c handling of boolean comparisons Richard Sandiford
2019-11-29 10:53   ` Richard Biener
2019-11-29 10:14 ` [2/5] Make vectorizable_operation punt early on codes it doesn't handle Richard Sandiford
2019-11-29 10:26   ` Richard Biener
2019-11-29 10:14 ` [3/5] Make vect_get_mask_type_for_stmt take a group size Richard Sandiford
2019-11-29 10:30   ` Richard Biener
2019-11-29 10:15 ` [4/5] Record the vector mask precision in stmt_vec_info Richard Sandiford
2019-11-29 10:34   ` Richard Biener
2019-11-29 10:25 ` [5/5] Don't defer choice of vector type for bools (PR 92596) Richard Sandiford
2019-11-29 12:58   ` Richard Biener
