public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
@ 2015-08-17 16:25 Ilya Enkovich
  2015-08-18 10:47 ` Richard Biener
  2015-08-20 18:46 ` Jeff Law
  0 siblings, 2 replies; 8+ messages in thread
From: Ilya Enkovich @ 2015-08-17 16:25 UTC (permalink / raw)
  To: gcc-patches

Hi,

This patch starts a series introducing scalar masks support in the vectorizer.  It was discussed on the recent Cauldron and changes overiew is available here: https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.  Here is shortly a list of changes introduced by this series:

 - Add new tree expr to produce scalar masks in a vectorized code
 - Fix-up if-conversion to use bool predicates instead of integer masks
 - Disable some bool patterns to avoid bool to int conversion where masks can be used
 - Support bool operands in vectorization factor computation
 - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs
 - Support vectorization for statements which are now not transformed by bool patterns
 - Add target support (hooks, optabs, expands)

This patch introduces GEN_MASK_EXPR code.  Intitially I wanted to use a comparison as an operand for it directly mapping it into AVX-512 comparison instruction.  But a feedback was to simplify new code's semantics and use it for converting vectors into scalar masks.  Therefore if we want to compare two vectors into a scalar masks we use two statements:

  vect.18_87 = vect__5.13_81 > vect__6.16_86;
  mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
 
Trying it in practice I found it producing worse code. The problem is that on target first comparison is expanded into two instructions: cmp with mask result + masked move to get a vector. GEN_MASK is then expanded into another comparison with zero vector.  Thus I get two comparisons + move instead of a single comparison and have to optimize this out on a target side (current optimizers can't handle it).  That's actually what I wanted to avoid.  For now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't change expand pattern which has four opernads: two vectors to compare + cmp operator + result.  On expand I try to detect GEN_MASK uses a result of comparison and thus avoid double comparison generation.

Patch series is not actually fully finished yet.  I still have several type conversion tests not being vectorized and it wasn't widely tested.  That's what I'm working on now.

Will be glad to any comments.

Thanks,
Ilya
--
2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
	* gimple-pretty-print.c (dump_unary_rhs): Likewise.
	* gimple.c (get_gimple_rhs_num_ops): Likewise.
	* optabs.c: Include gimple.h.
	(vector_compare_rtx): Add OPNO arg.
	(get_gen_mask_icode): New.
	(expand_gen_mask_expr_p): New.
	(expand_gen_mask_expr): New.
	(expand_vec_cond_expr): Adjust vector_compare_rtx call.
	* optabs.def (gen_mask_optab): New.
	(gen_masku_optab): New.
	* optabs.h (expand_gen_mask_expr_p): New.
	(expand_gen_mask_expr): New.
	* tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* tree-ssa-operands.c (get_expr_operands): Likewise.
	* tree.def (GEN_MASK_EXPR): New.


diff --git a/gcc/expr.c b/gcc/expr.c
index 31b4573..8af5926 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9180,6 +9180,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
+    case GEN_MASK_EXPR:
+      target = expand_gen_mask_expr (type, treeop0, target);
+      return target;
+
     case VEC_COND_EXPR:
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 53900dd..ac25b79 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -300,6 +300,12 @@ dump_unary_rhs (pretty_printer *buffer, gassign *gs, int spc, int flags)
       pp_greater (buffer);
       break;
 
+    case GEN_MASK_EXPR:
+      pp_string (buffer, "GEN_MASK <");
+      dump_generic_node (buffer, rhs, spc, flags, false);
+      pp_greater (buffer);
+      break;
+
     default:
       if (TREE_CODE_CLASS (rhs_code) == tcc_declaration
 	  || TREE_CODE_CLASS (rhs_code) == tcc_constant
diff --git a/gcc/gimple.c b/gcc/gimple.c
index cca328a..93caf01 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2005,7 +2005,8 @@ get_gimple_rhs_num_ops (enum tree_code code)
    : ((SYM) == TRUTH_AND_EXPR						    \
       || (SYM) == TRUTH_OR_EXPR						    \
       || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
-   : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
+   : ((SYM) == TRUTH_NOT_EXPR						    \
+      || (SYM) == GEN_MASK_EXPR) ? GIMPLE_UNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == WIDEN_MULT_PLUS_EXPR					    \
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
diff --git a/gcc/optabs.c b/gcc/optabs.c
index a6ca706..bf466ca 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "recog.h"
 #include "reload.h"
 #include "target.h"
+#include "gimple.h"
 
 struct target_optabs default_target_optabs;
 struct target_libfuncs default_target_libfuncs;
@@ -6488,11 +6489,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators. Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds an index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.  */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6518,7 +6521,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6789,6 +6792,77 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
+/* Return insn code for a comparison operator with MODE,
+   unsigned if UNS is true.  */
+
+static inline enum insn_code
+get_gen_mask_icode (machine_mode mode, bool uns)
+{
+  optab tab = uns ? gen_masku_optab : gen_mask_optab;
+  return optab_handler (tab, mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE.  */
+
+bool
+expand_gen_mask_expr_p (tree value_type)
+{
+  enum insn_code icode = get_gen_mask_icode (TYPE_MODE (value_type),
+					     TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
+
+rtx
+expand_gen_mask_expr (tree type, tree op0, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mode = TYPE_MODE (type);
+  machine_mode cmp_op_mode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+  gimple def_stmt;
+
+  /* Avoid double comparison.  */
+  if (TREE_CODE (op0) == SSA_NAME
+      && (def_stmt = SSA_NAME_DEF_STMT (op0))
+      && is_gimple_assign (def_stmt)
+      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)
+    {
+      op0a = gimple_assign_rhs1 (def_stmt);
+      op0b = gimple_assign_rhs2 (def_stmt);
+      tcode = gimple_assign_rhs_code (def_stmt);
+    }
+  else
+    {
+      op0a = op0;
+      op0b = build_zero_cst (TREE_TYPE (op0));
+      tcode = NE_EXPR;
+    }
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+
+  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
+
+  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return insn code for a conditional operator with a comparison in
    mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
 
@@ -6861,7 +6935,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..dc664d1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -266,6 +266,8 @@ OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
 OPTAB_D (maskload_optab, "maskload$a")
 OPTAB_D (maskstore_optab, "maskstore$a")
+OPTAB_D (gen_mask_optab, "gen_mask$a")
+OPTAB_D (gen_masku_optab, "gen_masku$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..d7b7fb0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -495,6 +495,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
 /* Generate code for VEC_PERM_EXPR.  */
 extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
 
+/* Return tree if target supports vector operations for comparison.  */
+extern bool expand_gen_mask_expr_p (tree);
+
+/* Generate code for GEN_MASK_EXPR.  */
+extern rtx expand_gen_mask_expr (tree, tree, rtx);
+
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, tree);
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 588ab69..5bae411 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3646,6 +3646,16 @@ verify_gimple_assign_unary (gassign *stmt)
     case CONJ_EXPR:
       break;
 
+    case GEN_MASK_EXPR:
+	if (!INTEGRAL_TYPE_P (lhs_type) || !VECTOR_TYPE_P (rhs1_type))
+	  {
+	    error ("invalid types in gen_mask");
+	    debug_generic_expr (lhs_type);
+	    debug_generic_expr (rhs1_type);
+	    return true;
+	  }
+	return false;
+
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index e1ceea4..052c055 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case COMPLEX_EXPR:
     case PAREN_EXPR:
     case VIEW_CONVERT_EXPR:
+    case GEN_MASK_EXPR:
       return 0;
 
     /* Assign cost of 1 to usual operations.
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 7cd1fe7..5b6cdf0 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2485,6 +2485,12 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
       pp_greater (pp);
       break;
 
+    case GEN_MASK_EXPR:
+      pp_string (pp, " GEN_MASK_EXPR < ");
+      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (pp, " > ");
+      break;
+
     case VEC_COND_EXPR:
       pp_string (pp, " VEC_COND_EXPR < ");
       dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
index b1e3f99..7151ad1 100644
--- a/gcc/tree-ssa-operands.c
+++ b/gcc/tree-ssa-operands.c
@@ -837,6 +837,7 @@ get_expr_operands (struct function *fn, gimple stmt, tree *expr_p, int flags)
 	gimple_set_has_volatile_ops (stmt, true);
       /* FALLTHRU */
 
+    case GEN_MASK_EXPR:
     case VIEW_CONVERT_EXPR:
     do_unary:
       get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
diff --git a/gcc/tree.def b/gcc/tree.def
index 56580af..42984ca 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -536,6 +536,10 @@ DEFTREECODE (TARGET_EXPR, "target_expr", tcc_expression, 4)
    1 and 2 are NULL.  The operands are then taken from the cfg edges. */
 DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
 
+/* Generate a scalar bitmask for a vector value.  Mask gets bits set
+   for nonzero vector elements.  */
+DEFTREECODE (GEN_MASK_EXPR, "gen_mask_expr", tcc_expression, 1)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-17 16:25 [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR Ilya Enkovich
@ 2015-08-18 10:47 ` Richard Biener
  2015-08-20 18:46 ` Jeff Law
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-08-18 10:47 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: GCC Patches

On Mon, Aug 17, 2015 at 6:22 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi,
>
> This patch starts a series introducing scalar masks support in the vectorizer.  It was discussed on the recent Cauldron and changes overiew is available here: https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.  Here is shortly a list of changes introduced by this series:
>
>  - Add new tree expr to produce scalar masks in a vectorized code
>  - Fix-up if-conversion to use bool predicates instead of integer masks
>  - Disable some bool patterns to avoid bool to int conversion where masks can be used
>  - Support bool operands in vectorization factor computation
>  - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs
>  - Support vectorization for statements which are now not transformed by bool patterns
>  - Add target support (hooks, optabs, expands)
>
> This patch introduces GEN_MASK_EXPR code.  Intitially I wanted to use a comparison as an operand for it directly mapping it into AVX-512 comparison instruction.  But a feedback was to simplify new code's semantics and use it for converting vectors into scalar masks.  Therefore if we want to compare two vectors into a scalar masks we use two statements:
>
>   vect.18_87 = vect__5.13_81 > vect__6.16_86;
>   mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
>
> Trying it in practice I found it producing worse code. The problem is that on target first comparison is expanded into two instructions: cmp with mask result + masked move to get a vector. GEN_MASK is then expanded into another comparison with zero vector.  Thus I get two comparisons + move instead of a single comparison and have to optimize this out on a target side (current optimizers can't handle it).  That's actually what I wanted to avoid.  For now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't change expand pattern which has four opernads: two vectors to compare + cmp operator + result.  On expand I try to detect GEN_MASK uses a result of comparison and thus avoid double comparison generation.
>
> Patch series is not actually fully finished yet.  I still have several type conversion tests not being vectorized and it wasn't widely tested.  That's what I'm working on now.
>
> Will be glad to any comments.

You need to expand the documentation of GEN_MASK_EXPR to say something
about which bits correspond
to which vector elements.  You also should add constant-folding of
GEN_MASK_EXPR to const_unop in fold-const.c
(which would also document the above).

I agree that you need to combine the two GIMPLE stmts at expand time.
How do intrinsics expose
those masks btw?

One other idea that crossed my mind is that we might want to support
vector<bool> and map that to
an integer mode thus effectively allowing vect1 > vect2 generating a
vector<bool> directly.  But I'm not
sure what kind of issues you run into with that approach.

> Thanks,
> Ilya
> --
> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
>         * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>         * gimple.c (get_gimple_rhs_num_ops): Likewise.
>         * optabs.c: Include gimple.h.
>         (vector_compare_rtx): Add OPNO arg.
>         (get_gen_mask_icode): New.
>         (expand_gen_mask_expr_p): New.
>         (expand_gen_mask_expr): New.
>         (expand_vec_cond_expr): Adjust vector_compare_rtx call.
>         * optabs.def (gen_mask_optab): New.
>         (gen_masku_optab): New.
>         * optabs.h (expand_gen_mask_expr_p): New.
>         (expand_gen_mask_expr): New.
>         * tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * tree-ssa-operands.c (get_expr_operands): Likewise.
>         * tree.def (GEN_MASK_EXPR): New.
>
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 31b4573..8af5926 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -9180,6 +9180,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>         return temp;
>        }
>
> +    case GEN_MASK_EXPR:
> +      target = expand_gen_mask_expr (type, treeop0, target);
> +      return target;
> +
>      case VEC_COND_EXPR:
>        target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
>        return target;
> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
> index 53900dd..ac25b79 100644
> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -300,6 +300,12 @@ dump_unary_rhs (pretty_printer *buffer, gassign *gs, int spc, int flags)
>        pp_greater (buffer);
>        break;
>
> +    case GEN_MASK_EXPR:
> +      pp_string (buffer, "GEN_MASK <");
> +      dump_generic_node (buffer, rhs, spc, flags, false);
> +      pp_greater (buffer);
> +      break;
> +
>      default:
>        if (TREE_CODE_CLASS (rhs_code) == tcc_declaration
>           || TREE_CODE_CLASS (rhs_code) == tcc_constant
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index cca328a..93caf01 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2005,7 +2005,8 @@ get_gimple_rhs_num_ops (enum tree_code code)
>     : ((SYM) == TRUTH_AND_EXPR                                              \
>        || (SYM) == TRUTH_OR_EXPR                                                    \
>        || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS                      \
> -   : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS                                    \
> +   : ((SYM) == TRUTH_NOT_EXPR                                              \
> +      || (SYM) == GEN_MASK_EXPR) ? GIMPLE_UNARY_RHS                        \
>     : ((SYM) == COND_EXPR                                                   \
>        || (SYM) == WIDEN_MULT_PLUS_EXPR                                     \
>        || (SYM) == WIDEN_MULT_MINUS_EXPR                                            \
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index a6ca706..bf466ca 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "recog.h"
>  #include "reload.h"
>  #include "target.h"
> +#include "gimple.h"
>
>  struct target_optabs default_target_optabs;
>  struct target_libfuncs default_target_libfuncs;
> @@ -6488,11 +6489,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
>  }
>
>  /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
> -   unsigned operators. Do not generate compare instruction.  */
> +   unsigned operators.  OPNO holds an index of the first comparison
> +   operand in insn with code ICODE.  Do not generate compare instruction.  */
>
>  static rtx
>  vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
> -                   bool unsignedp, enum insn_code icode)
> +                   bool unsignedp, enum insn_code icode,
> +                   unsigned int opno)
>  {
>    struct expand_operand ops[2];
>    rtx rtx_op0, rtx_op1;
> @@ -6518,7 +6521,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
>
>    create_input_operand (&ops[0], rtx_op0, m0);
>    create_input_operand (&ops[1], rtx_op1, m1);
> -  if (!maybe_legitimize_operands (icode, 4, 2, ops))
> +  if (!maybe_legitimize_operands (icode, opno, 2, ops))
>      gcc_unreachable ();
>    return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
>  }
> @@ -6789,6 +6792,77 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>    return tmp;
>  }
>
> +/* Return insn code for a comparison operator with MODE,
> +   unsigned if UNS is true.  */
> +
> +static inline enum insn_code
> +get_gen_mask_icode (machine_mode mode, bool uns)
> +{
> +  optab tab = uns ? gen_masku_optab : gen_mask_optab;

Aww, ok - now you need two optabs of course.  I suppose combine
wasn't able to combine the two compares?

> +  return optab_handler (tab, mode);
> +}
> +
> +/* Return TRUE if appropriate vector insn is available
> +   for vector comparison expr with vector type VALUE_TYPE.  */
> +
> +bool
> +expand_gen_mask_expr_p (tree value_type)
> +{
> +  enum insn_code icode = get_gen_mask_icode (TYPE_MODE (value_type),
> +                                            TYPE_UNSIGNED (value_type));
> +  return (icode != CODE_FOR_nothing);
> +}
> +
> +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
> +
> +rtx
> +expand_gen_mask_expr (tree type, tree op0, rtx target)
> +{
> +  struct expand_operand ops[4];
> +  enum insn_code icode;
> +  rtx comparison;
> +  machine_mode mode = TYPE_MODE (type);
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  tree op0a, op0b;
> +  enum tree_code tcode;
> +  gimple def_stmt;
> +
> +  /* Avoid double comparison.  */
> +  if (TREE_CODE (op0) == SSA_NAME
> +      && (def_stmt = SSA_NAME_DEF_STMT (op0))
> +      && is_gimple_assign (def_stmt)
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)

You need to use get_gimple_for_ssa_name (or related functions) here, you can't
simply re-use the def stmts operands because of eventual coalescing
that happened
making their ops no longer valid.

> +    {
> +      op0a = gimple_assign_rhs1 (def_stmt);
> +      op0b = gimple_assign_rhs2 (def_stmt);
> +      tcode = gimple_assign_rhs_code (def_stmt);
> +    }
> +  else
> +    {
> +      op0a = op0;
> +      op0b = build_zero_cst (TREE_TYPE (op0));
> +      tcode = NE_EXPR;
> +    }
> +
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
> +  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +
> +  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
> +
> +  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
> +  if (icode == CODE_FOR_nothing)
> +    return 0;
> +
> +  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
> +  create_output_operand (&ops[0], target, mode);
> +  create_fixed_operand (&ops[1], comparison);
> +  create_fixed_operand (&ops[2], XEXP (comparison, 0));
> +  create_fixed_operand (&ops[3], XEXP (comparison, 1));
> +  expand_insn (icode, 4, ops);
> +  return ops[0].value;
> +}
> +
>  /* Return insn code for a conditional operator with a comparison in
>     mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
>
> @@ -6861,7 +6935,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>    if (icode == CODE_FOR_nothing)
>      return 0;
>
> -  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
> +  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
>    rtx_op1 = expand_normal (op1);
>    rtx_op2 = expand_normal (op2);
>
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 888b21c..dc664d1 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -266,6 +266,8 @@ OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
>  OPTAB_D (maskload_optab, "maskload$a")
>  OPTAB_D (maskstore_optab, "maskstore$a")
> +OPTAB_D (gen_mask_optab, "gen_mask$a")
> +OPTAB_D (gen_masku_optab, "gen_masku$a")
>  OPTAB_D (vec_extract_optab, "vec_extract$a")
>  OPTAB_D (vec_init_optab, "vec_init$a")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 95f5cbc..d7b7fb0 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -495,6 +495,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
>  /* Generate code for VEC_PERM_EXPR.  */
>  extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
>
> +/* Return tree if target supports vector operations for comparison.  */
> +extern bool expand_gen_mask_expr_p (tree);
> +
> +/* Generate code for GEN_MASK_EXPR.  */
> +extern rtx expand_gen_mask_expr (tree, tree, rtx);
> +
>  /* Return tree if target supports vector operations for COND_EXPR.  */
>  bool expand_vec_cond_expr_p (tree, tree);
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 588ab69..5bae411 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3646,6 +3646,16 @@ verify_gimple_assign_unary (gassign *stmt)
>      case CONJ_EXPR:
>        break;
>
> +    case GEN_MASK_EXPR:
> +       if (!INTEGRAL_TYPE_P (lhs_type) || !VECTOR_TYPE_P (rhs1_type))
> +         {
> +           error ("invalid types in gen_mask");

Does the LHS need to have at least as many bits as the RHS has elements?
Also what about non-integral vector types on the RHS?  I suppose only
integral ones are valid?

> +           debug_generic_expr (lhs_type);
> +           debug_generic_expr (rhs1_type);
> +           return true;
> +         }
> +       return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index e1ceea4..052c055 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>      case COMPLEX_EXPR:
>      case PAREN_EXPR:
>      case VIEW_CONVERT_EXPR:
> +    case GEN_MASK_EXPR:

eh...  is that important here?  After all expansion might fail to combine with
a previous compare.  I'd rather use the default.

>        return 0;
>
>      /* Assign cost of 1 to usual operations.
> diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
> index 7cd1fe7..5b6cdf0 100644
> --- a/gcc/tree-pretty-print.c
> +++ b/gcc/tree-pretty-print.c
> @@ -2485,6 +2485,12 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
>        pp_greater (pp);
>        break;
>
> +    case GEN_MASK_EXPR:
> +      pp_string (pp, " GEN_MASK_EXPR < ");
> +      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> +      pp_string (pp, " > ");
> +      break;
> +
>      case VEC_COND_EXPR:
>        pp_string (pp, " VEC_COND_EXPR < ");
>        dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
> index b1e3f99..7151ad1 100644
> --- a/gcc/tree-ssa-operands.c
> +++ b/gcc/tree-ssa-operands.c
> @@ -837,6 +837,7 @@ get_expr_operands (struct function *fn, gimple stmt, tree *expr_p, int flags)
>         gimple_set_has_volatile_ops (stmt, true);
>        /* FALLTHRU */
>
> +    case GEN_MASK_EXPR:
>      case VIEW_CONVERT_EXPR:
>      do_unary:
>        get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
> diff --git a/gcc/tree.def b/gcc/tree.def
> index 56580af..42984ca 100644
> --- a/gcc/tree.def
> +++ b/gcc/tree.def
> @@ -536,6 +536,10 @@ DEFTREECODE (TARGET_EXPR, "target_expr", tcc_expression, 4)
>     1 and 2 are NULL.  The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Generate a scalar bitmask for a vector value.  Mask gets bits set
> +   for nonzero vector elements.  */
> +DEFTREECODE (GEN_MASK_EXPR, "gen_mask_expr", tcc_expression, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-17 16:25 [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR Ilya Enkovich
  2015-08-18 10:47 ` Richard Biener
@ 2015-08-20 18:46 ` Jeff Law
  2015-08-21 12:49   ` Ilya Enkovich
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Law @ 2015-08-20 18:46 UTC (permalink / raw)
  To: Ilya Enkovich, gcc-patches

On 08/17/2015 10:22 AM, Ilya Enkovich wrote:
> Hi,
>
> This patch starts a series introducing scalar masks support in the vectorizer.  It was discussed on the recent Cauldron and changes overiew is available here: https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.  Here is shortly a list of changes introduced by this series:
>
>   - Add new tree expr to produce scalar masks in a vectorized code
>   - Fix-up if-conversion to use bool predicates instead of integer masks
>   - Disable some bool patterns to avoid bool to int conversion where masks can be used
>   - Support bool operands in vectorization factor computation
>   - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs
>   - Support vectorization for statements which are now not transformed by bool patterns
>   - Add target support (hooks, optabs, expands)
>
> This patch introduces GEN_MASK_EXPR code.  Intitially I wanted to use a comparison as an operand for it directly mapping it into AVX-512 comparison instruction.  But a feedback was to simplify new code's semantics and use it for converting vectors into scalar masks.  Therefore if we want to compare two vectors into a scalar masks we use two statements:
>
>    vect.18_87 = vect__5.13_81 > vect__6.16_86;
>    mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
>
> Trying it in practice I found it producing worse code. The problem is that on target first comparison is expanded into two instructions: cmp with mask result + masked move to get a vector. GEN_MASK is then expanded into another comparison with zero vector.  Thus I get two comparisons + move instead of a single comparison and have to optimize this out on a target side (current optimizers can't handle it).  That's actually what I wanted to avoid.  For now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't change expand pattern which has four opernads: two vectors to compare + cmp operator + result.  On expand I try to detect GEN_MASK uses a result of comparison and thus avoid double comparison generation.
>
> Patch series is not actually fully finished yet.  I still have several type conversion tests not being vectorized and it wasn't widely tested.  That's what I'm working on now.
>
> Will be glad to any comments.
>
> Thanks,
> Ilya
> --
> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
> 	* expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
> 	* gimple-pretty-print.c (dump_unary_rhs): Likewise.
> 	* gimple.c (get_gimple_rhs_num_ops): Likewise.
> 	* optabs.c: Include gimple.h.
> 	(vector_compare_rtx): Add OPNO arg.
> 	(get_gen_mask_icode): New.
> 	(expand_gen_mask_expr_p): New.
> 	(expand_gen_mask_expr): New.
> 	(expand_vec_cond_expr): Adjust vector_compare_rtx call.
> 	* optabs.def (gen_mask_optab): New.
> 	(gen_masku_optab): New.
> 	* optabs.h (expand_gen_mask_expr_p): New.
> 	(expand_gen_mask_expr): New.
> 	* tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
> 	* tree-inline.c (estimate_operator_cost): Likewise.
> 	* tree-pretty-print.c (dump_generic_node): Likewise.
> 	* tree-ssa-operands.c (get_expr_operands): Likewise.
> 	* tree.def (GEN_MASK_EXPR): New.
A general question, would any of this likely help Yuri's work to 
optimize MASK_STORES?

>
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index a6ca706..bf466ca 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "recog.h"
>   #include "reload.h"
>   #include "target.h"
> +#include "gimple.h"
Hmm, part of me doesn't want to see optabs.c depending on gimple.h.

How painful would it be to have this stuff live in expr.c?

> +
> +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
> +
> +rtx
> +expand_gen_mask_expr (tree type, tree op0, rtx target)
> +{
> +  struct expand_operand ops[4];
> +  enum insn_code icode;
> +  rtx comparison;
> +  machine_mode mode = TYPE_MODE (type);
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  tree op0a, op0b;
> +  enum tree_code tcode;
> +  gimple def_stmt;
> +
> +  /* Avoid double comparison.  */
> +  if (TREE_CODE (op0) == SSA_NAME
> +      && (def_stmt = SSA_NAME_DEF_STMT (op0))
> +      && is_gimple_assign (def_stmt)
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)
> +    {
> +      op0a = gimple_assign_rhs1 (def_stmt);
> +      op0b = gimple_assign_rhs2 (def_stmt);
> +      tcode = gimple_assign_rhs_code (def_stmt);
> +    }
> +  else
> +    {
> +      op0a = op0;
> +      op0b = build_zero_cst (TREE_TYPE (op0));
> +      tcode = NE_EXPR;
> +    }
> +
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
> +  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +
> +  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
> +
> +  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
> +  if (icode == CODE_FOR_nothing)
> +    return 0;
So if the target doesn't have suitable insns, what happens?  I suspect 
the answer is nothing useful.  In which case the question becomes what 
prevents the optimizers from generating a GEN_MASK_EXPR?  Maybe that'll 
become clear as I read the rest of the patches.



>
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index e1ceea4..052c055 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>       case COMPLEX_EXPR:
>       case PAREN_EXPR:
>       case VIEW_CONVERT_EXPR:
> +    case GEN_MASK_EXPR:
>         return 0;
That seems wrong to me :-)

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-20 18:46 ` Jeff Law
@ 2015-08-21 12:49   ` Ilya Enkovich
  2015-08-21 16:03     ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Enkovich @ 2015-08-21 12:49 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

2015-08-20 21:41 GMT+03:00 Jeff Law <law@redhat.com>:
> On 08/17/2015 10:22 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch starts a series introducing scalar masks support in the
>> vectorizer.  It was discussed on the recent Cauldron and changes overiew is
>> available here:
>> https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.
>> Here is shortly a list of changes introduced by this series:
>>
>>   - Add new tree expr to produce scalar masks in a vectorized code
>>   - Fix-up if-conversion to use bool predicates instead of integer masks
>>   - Disable some bool patterns to avoid bool to int conversion where masks
>> can be used
>>   - Support bool operands in vectorization factor computation
>>   - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by
>> adding new optabs
>>   - Support vectorization for statements which are now not transformed by
>> bool patterns
>>   - Add target support (hooks, optabs, expands)
>>
>> This patch introduces GEN_MASK_EXPR code.  Intitially I wanted to use a
>> comparison as an operand for it directly mapping it into AVX-512 comparison
>> instruction.  But a feedback was to simplify new code's semantics and use it
>> for converting vectors into scalar masks.  Therefore if we want to compare
>> two vectors into a scalar masks we use two statements:
>>
>>    vect.18_87 = vect__5.13_81 > vect__6.16_86;
>>    mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
>>
>> Trying it in practice I found it producing worse code. The problem is that
>> on target first comparison is expanded into two instructions: cmp with mask
>> result + masked move to get a vector. GEN_MASK is then expanded into another
>> comparison with zero vector.  Thus I get two comparisons + move instead of a
>> single comparison and have to optimize this out on a target side (current
>> optimizers can't handle it).  That's actually what I wanted to avoid.  For
>> now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't
>> change expand pattern which has four opernads: two vectors to compare + cmp
>> operator + result.  On expand I try to detect GEN_MASK uses a result of
>> comparison and thus avoid double comparison generation.
>>
>> Patch series is not actually fully finished yet.  I still have several
>> type conversion tests not being vectorized and it wasn't widely tested.
>> That's what I'm working on now.
>>
>> Will be glad to any comments.
>>
>> Thanks,
>> Ilya
>> --
>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>
>>         * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
>>         * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>>         * gimple.c (get_gimple_rhs_num_ops): Likewise.
>>         * optabs.c: Include gimple.h.
>>         (vector_compare_rtx): Add OPNO arg.
>>         (get_gen_mask_icode): New.
>>         (expand_gen_mask_expr_p): New.
>>         (expand_gen_mask_expr): New.
>>         (expand_vec_cond_expr): Adjust vector_compare_rtx call.
>>         * optabs.def (gen_mask_optab): New.
>>         (gen_masku_optab): New.
>>         * optabs.h (expand_gen_mask_expr_p): New.
>>         (expand_gen_mask_expr): New.
>>         * tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
>>         * tree-inline.c (estimate_operator_cost): Likewise.
>>         * tree-pretty-print.c (dump_generic_node): Likewise.
>>         * tree-ssa-operands.c (get_expr_operands): Likewise.
>>         * tree.def (GEN_MASK_EXPR): New.
>
> A general question, would any of this likely help Yuri's work to optimize
> MASK_STORES?

If I remember correctly his optimization is for Haswell which doesn't
have scalar masks. But doing it with scalar masks would be simpler
because zero mask check becomes trivial.

>
>>
>>
>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>> index a6ca706..bf466ca 100644
>> --- a/gcc/optabs.c
>> +++ b/gcc/optabs.c
>> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "recog.h"
>>   #include "reload.h"
>>   #include "target.h"
>> +#include "gimple.h"
>
> Hmm, part of me doesn't want to see optabs.c depending on gimple.h.
>
> How painful would it be to have this stuff live in expr.c?
>

Should be possible to refactor it somehow.

>
>> +
>> +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
>> +
>> +rtx
>> +expand_gen_mask_expr (tree type, tree op0, rtx target)
>> +{
>> +  struct expand_operand ops[4];
>> +  enum insn_code icode;
>> +  rtx comparison;
>> +  machine_mode mode = TYPE_MODE (type);
>> +  machine_mode cmp_op_mode;
>> +  bool unsignedp;
>> +  tree op0a, op0b;
>> +  enum tree_code tcode;
>> +  gimple def_stmt;
>> +
>> +  /* Avoid double comparison.  */
>> +  if (TREE_CODE (op0) == SSA_NAME
>> +      && (def_stmt = SSA_NAME_DEF_STMT (op0))
>> +      && is_gimple_assign (def_stmt)
>> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) ==
>> tcc_comparison)
>> +    {
>> +      op0a = gimple_assign_rhs1 (def_stmt);
>> +      op0b = gimple_assign_rhs2 (def_stmt);
>> +      tcode = gimple_assign_rhs_code (def_stmt);
>> +    }
>> +  else
>> +    {
>> +      op0a = op0;
>> +      op0b = build_zero_cst (TREE_TYPE (op0));
>> +      tcode = NE_EXPR;
>> +    }
>> +
>> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>> +  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
>> +
>> +  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
>> +
>> +  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
>> +  if (icode == CODE_FOR_nothing)
>> +    return 0;
>
> So if the target doesn't have suitable insns, what happens?  I suspect the
> answer is nothing useful.  In which case the question becomes what prevents
> the optimizers from generating a GEN_MASK_EXPR?  Maybe that'll become clear
> as I read the rest of the patches.

I don't expect other optimizer than vectorizer to generate it. And as
usually optab is checked before generating it.

>
>
>
>>
>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>> index e1ceea4..052c055 100644
>> --- a/gcc/tree-inline.c
>> +++ b/gcc/tree-inline.c
>> @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code,
>> eni_weights *weights,
>>       case COMPLEX_EXPR:
>>       case PAREN_EXPR:
>>       case VIEW_CONVERT_EXPR:
>> +    case GEN_MASK_EXPR:
>>         return 0;
>
> That seems wrong to me :-)

Got it :)

Thanks,
Ilya

>
> Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-21 12:49   ` Ilya Enkovich
@ 2015-08-21 16:03     ` Jeff Law
  2015-08-21 16:38       ` Ilya Enkovich
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2015-08-21 16:03 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches

On 08/21/2015 06:30 AM, Ilya Enkovich wrote:
> 2015-08-20 21:41 GMT+03:00 Jeff Law <law@redhat.com>:
>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>
>>>          * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
>>>          * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>>>          * gimple.c (get_gimple_rhs_num_ops): Likewise.
>>>          * optabs.c: Include gimple.h.
>>>          (vector_compare_rtx): Add OPNO arg.
>>>          (get_gen_mask_icode): New.
>>>          (expand_gen_mask_expr_p): New.
>>>          (expand_gen_mask_expr): New.
>>>          (expand_vec_cond_expr): Adjust vector_compare_rtx call.
>>>          * optabs.def (gen_mask_optab): New.
>>>          (gen_masku_optab): New.
>>>          * optabs.h (expand_gen_mask_expr_p): New.
>>>          (expand_gen_mask_expr): New.
>>>          * tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
>>>          * tree-inline.c (estimate_operator_cost): Likewise.
>>>          * tree-pretty-print.c (dump_generic_node): Likewise.
>>>          * tree-ssa-operands.c (get_expr_operands): Likewise.
>>>          * tree.def (GEN_MASK_EXPR): New.
>>
>> A general question, would any of this likely help Yuri's work to optimize
>> MASK_STORES?
>
> If I remember correctly his optimization is for Haswell which doesn't
> have scalar masks. But doing it with scalar masks would be simpler
> because zero mask check becomes trivial.
I was thinking mostly about the mask check.



>> So if the target doesn't have suitable insns, what happens?  I suspect the
>> answer is nothing useful.  In which case the question becomes what prevents
>> the optimizers from generating a GEN_MASK_EXPR?  Maybe that'll become clear
>> as I read the rest of the patches.
>
> I don't expect other optimizer than vectorizer to generate it. And as
> usually optab is checked before generating it.
If we're checking an optab to drive an optimization, then we're probably 
on the wrong track.   I think this ties into the overall discussion 
about whether or not to represent these masks in gimple or try to handle 
them later during gimple->rtl expansion.

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-21 16:03     ` Jeff Law
@ 2015-08-21 16:38       ` Ilya Enkovich
  2015-08-25 21:18         ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Enkovich @ 2015-08-21 16:38 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

2015-08-21 18:57 GMT+03:00 Jeff Law <law@redhat.com>:
> On 08/21/2015 06:30 AM, Ilya Enkovich wrote:
>>
>> 2015-08-20 21:41 GMT+03:00 Jeff Law <law@redhat.com>:
>>>>>
>>>>> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>>>>
>>>>
>>>>          * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
>>>>          * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>>>>          * gimple.c (get_gimple_rhs_num_ops): Likewise.
>>>>          * optabs.c: Include gimple.h.
>>>>          (vector_compare_rtx): Add OPNO arg.
>>>>          (get_gen_mask_icode): New.
>>>>          (expand_gen_mask_expr_p): New.
>>>>          (expand_gen_mask_expr): New.
>>>>          (expand_vec_cond_expr): Adjust vector_compare_rtx call.
>>>>          * optabs.def (gen_mask_optab): New.
>>>>          (gen_masku_optab): New.
>>>>          * optabs.h (expand_gen_mask_expr_p): New.
>>>>          (expand_gen_mask_expr): New.
>>>>          * tree-cfg.c (verify_gimple_assign_unary): Support
>>>> GEN_MASK_EXPR.
>>>>          * tree-inline.c (estimate_operator_cost): Likewise.
>>>>          * tree-pretty-print.c (dump_generic_node): Likewise.
>>>>          * tree-ssa-operands.c (get_expr_operands): Likewise.
>>>>          * tree.def (GEN_MASK_EXPR): New.
>>>
>>>
>>> A general question, would any of this likely help Yuri's work to optimize
>>> MASK_STORES?
>>
>>
>> If I remember correctly his optimization is for Haswell which doesn't
>> have scalar masks. But doing it with scalar masks would be simpler
>> because zero mask check becomes trivial.
>
> I was thinking mostly about the mask check.
>
>
>
>>> So if the target doesn't have suitable insns, what happens?  I suspect
>>> the
>>> answer is nothing useful.  In which case the question becomes what
>>> prevents
>>> the optimizers from generating a GEN_MASK_EXPR?  Maybe that'll become
>>> clear
>>> as I read the rest of the patches.
>>
>>
>> I don't expect other optimizer than vectorizer to generate it. And as
>> usually optab is checked before generating it.
>
> If we're checking an optab to drive an optimization, then we're probably on
> the wrong track.

That's totally similar to VEC_COND_EXPR which we generate comparison into.

>I think this ties into the overall discussion about
> whether or not to represent these masks in gimple or try to handle them
> later during gimple->rtl expansion.

Currently we don't have any abstraction for masks, it is supported
using vector of integers. When I expand it I have no idea whether it
is just a vector of integers to be stored or a mask to be used for
MASK_LOAD. Actually it may be used in both cases at the same time.

Handling it in RTL means we have to undo bool->int transformation made
in GIMPLE. For trivial cases it may be easy but in generic it can be
challenging.  I want to avoid it from the beginning.

Thanks,
Ilya

>
> Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-21 16:38       ` Ilya Enkovich
@ 2015-08-25 21:18         ` Jeff Law
  2015-09-18 12:17           ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2015-08-25 21:18 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches

On 08/21/2015 10:30 AM, Ilya Enkovich wrote:
>> If we're checking an optab to drive an optimization, then we're probably on
>> the wrong track.
>
> That's totally similar to VEC_COND_EXPR which we generate comparison into.
It is.  The vectorizer is riddled with this stuff.  Sigh.  So I won't 
consider this a negative for the scalar  mask support.

>
>> I think this ties into the overall discussion about
>> whether or not to represent these masks in gimple or try to handle them
>> later during gimple->rtl expansion.
>
> Currently we don't have any abstraction for masks, it is supported
> using vector of integers. When I expand it I have no idea whether it
> is just a vector of integers to be stored or a mask to be used for
> MASK_LOAD. Actually it may be used in both cases at the same time.
>
> Handling it in RTL means we have to undo bool->int transformation made
> in GIMPLE. For trivial cases it may be easy but in generic it can be
> challenging.  I want to avoid it from the beginning.
I wasn't suggesting handling them in RTL, but at the border between 
gimple and RTL.

But if we can't reliably determine how a particular mask is going to be 
used at that point, then doing things at the gimple/RTL border may be a 
spectacularly bad idea ;-)

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.
  2015-08-25 21:18         ` Jeff Law
@ 2015-09-18 12:17           ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-09-18 12:17 UTC (permalink / raw)
  To: Jeff Law; +Cc: Ilya Enkovich, gcc-patches

On Tue, Aug 25, 2015 at 11:02 PM, Jeff Law <law@redhat.com> wrote:
> On 08/21/2015 10:30 AM, Ilya Enkovich wrote:
>>>
>>> If we're checking an optab to drive an optimization, then we're probably
>>> on
>>> the wrong track.
>>
>>
>> That's totally similar to VEC_COND_EXPR which we generate comparison into.
>
> It is.  The vectorizer is riddled with this stuff.  Sigh.  So I won't
> consider this a negative for the scalar  mask support.

Hey, just because it's already bad don't make it worse!

>>
>>> I think this ties into the overall discussion about
>>> whether or not to represent these masks in gimple or try to handle them
>>> later during gimple->rtl expansion.
>>
>>
>> Currently we don't have any abstraction for masks, it is supported
>> using vector of integers. When I expand it I have no idea whether it
>> is just a vector of integers to be stored or a mask to be used for
>> MASK_LOAD. Actually it may be used in both cases at the same time.
>>
>> Handling it in RTL means we have to undo bool->int transformation made
>> in GIMPLE. For trivial cases it may be easy but in generic it can be
>> challenging.  I want to avoid it from the beginning.
>
> I wasn't suggesting handling them in RTL, but at the border between gimple
> and RTL.
>
> But if we can't reliably determine how a particular mask is going to be used
> at that point, then doing things at the gimple/RTL border may be a
> spectacularly bad idea ;-)

Yeah, I also see Ilyas point here.  Basically the new integer masks are
generated by if-conversion only.  What I was suggesting is to differentiate
both cases by making vector comparisons always result in vector<bool>.
Thus if you have

  if (a < b || d > e)
   x = c;

then vectorized if-converted code would generate

  vector<bool> cnd = a < b;
  vector<bool> cnd2 = d > e;
  vector<bool> cnd3 = cnd | cnd2;
  x = cnd3 ? c : x;

when the target supports vector<bool> and otherwise

  vector<int> cnd = a < b ? -1 : 0;
  vector<int> cnd2 = d > e ? -1 : 0;
  vector<int> cnd3 = cnd | cnd2;
  x = cnd3 ? c : x;

which means only VEC_COND expr would support both kind of masks
(well, and the IFNs for masked loads/stores).  Basically the vectorizer
needs to handle vectorizing conditions like

   cnd = a < b;

dependent on target support for masks (that is vectorizing of 'bool'
operations).

Reworking the existing vectorization of bool operations to not depend
on pattern detection would be a first step (and already soving a few
PRs on the way).

Richard.

>
> jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-09-18 12:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-17 16:25 [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR Ilya Enkovich
2015-08-18 10:47 ` Richard Biener
2015-08-20 18:46 ` Jeff Law
2015-08-21 12:49   ` Ilya Enkovich
2015-08-21 16:03     ` Jeff Law
2015-08-21 16:38       ` Ilya Enkovich
2015-08-25 21:18         ` Jeff Law
2015-09-18 12:17           ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).