Hi Richard, I send you updated version of patch which contains fixes you mentioned and additional early exit in register_edge_assert_for() for gcond with vector comparison - it tries to produce assert for if (vect != {0,0,0,0}) but can't create such constant. This is not essential since this is applied to very specialized context. My answers are below. 2015-11-12 16:58 GMT+03:00 Richard Biener : > On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev wrote: >> Richard, >> >> What we should do to cope with this problem (structure size increasing)? >> Should we return to vector comparison version? > > Ok, given this constraint I think the cleanest approach is to allow > integer(!) vector equality(!) compares with scalar result. This should then > expand via cmp_optab and not via vec_cmp_optab. In fact it is expanded through cbranch_optab since the only use of such comparison is for masked store motion > > On gimple you can then have > > if (mask_vec_1 != {0, 0, .... }) > ... > > Note that a fallback expansion (for optabs.c to try) would be > the suggested view-conversion (aka, subreg) variant using > a same-sized integer mode. > > Target maintainers can then choose what is a better fit for > their target (and instruction set as register set constraints may apply). > > The patch you posted seems to do this but not restrict the compares > to integer ones (please do that). > > if (TREE_CODE (op0_type) == VECTOR_TYPE > || TREE_CODE (op1_type) == VECTOR_TYPE) > { > - error ("vector comparison returning a boolean"); > - debug_generic_expr (op0_type); > - debug_generic_expr (op1_type); > - return true; > + /* Allow vector comparison returning boolean if operand types > + are equal and CODE is EQ/NE. */ > + if ((code != EQ_EXPR && code != NE_EXPR) > + || TREE_CODE (op0_type) != TREE_CODE (op1_type) > + || TYPE_VECTOR_SUBPARTS (op0_type) > + != TYPE_VECTOR_SUBPARTS (op1_type) > + || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type))) > + != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type)))) > > These are all checked with the useless_type_conversion_p checks done earlier. > > As said I'd like to see a > > || ! VECTOR_INTEGER_TYPE_P (op0_type) I added check on VECTOR_BOOLEAN_TYPE_P (op0_type) instead since type of mask was changed. > > check added so we and targets do not need to worry about using EQ/NE vs. CMP > and worry about signed zeros and friends. > > + { > + error ("type mismatch for vector comparison returning a boolean"); > + debug_generic_expr (op0_type); > + debug_generic_expr (op1_type); > + return true; > + } > > > > --- a/gcc/tree-ssa-forwprop.c > +++ b/gcc/tree-ssa-forwprop.c > @@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt, > enum tree_code def_code = gimple_assign_rhs_code (def_stmt); > bool invariant_only_p = !single_use0_p; > > + /* Can't combine vector comparison with scalar boolean type of > + the result and VEC_COND_EXPR having vector type of comparison. */ > + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE > + && INTEGRAL_TYPE_P (type) > + && (TREE_CODE (type) == BOOLEAN_TYPE > + || TYPE_PRECISION (type) == 1) > + && def_code == VEC_COND_EXPR) > + return NULL_TREE; > > this hints at larger fallout you paper over here. So this effectively > means we're trying combining (vec1 != vec2) != 0 for example > and that fails miserably? If so then the solution is to fix whatever > does not expect this (valid) GENERIC tree. I changed it to the following check in combine_cond_expr_cond: /* Do not perform combining it types are not compatible. */ if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0)))) return NULL_TREE; > > + if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0) > + return; > > not sure if I like a param more than a target hook ... :/ I introduced the param instead of a target hook to make this transformation user visible, i.e. to switch it on/off for different targets. > > + /* Create vector comparison with boolean result. */ > + vectype = TREE_TYPE (mask); > + zero = build_zero_cst (TREE_TYPE (vectype)); > + zero = build_vector_from_val (vectype, zero); > > build_zero_cst (vectype); Done. > > + stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE); > > you can omit the NULL_TREE operands. I did not find such definition for it. > > + gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME); > > please omit the assert. Done. > > + gimple_set_vdef (last, new_vdef); > > do this before you create the PHI. > Done. > + /* Put definition statement of stored value in STORE_BB > + if possible. */ > + arg3 = gimple_call_arg (last, 3); > + if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3)) > + { > ... > > is this really necessary? It looks incomplete to me anyway. I'd rather have > a late sink pass if this shows necessary. Btw,.. I tried to avoid creation of multiple ne basic blocks for the same mask and also I don't want to put all semi-hammock guarded by this mask to separate block to keep it small enough since x86 chips prefer short branches. Note also that icc does almost the same. > > + it is legal. */ > + if (gimple_bb (def_stmt) == bb > + && is_valid_sink (def_stmt, last_store)) > > with the implementation of is_valid_sink this is effectively > > && (!gimple_vuse (def_stmt) > || gimple_vuse (def_stmt) == gimple_vdef (last_store)) I did inlining of correspondent part of "is_valif_sink" to this place. > > I still think this "pass" is quite a hack, esp. as it appears as generic > code in a GIMPLE pass. And esp. as this hack seems to be needed > for Haswell only, not Boradwell or Skylake. This is not truth since for all them this transformation is performed for skylake and broadwell since both them belong to HASWELL family. > > Thanks, > Richard. > ChangeLog: 2015-11-19 Yuri Rumyantsev * config/i386/i386.c: Add conditional initialization of PARAM_ZERO_TEST_FOR_MASK_STORE. (ix86_expand_branch): Implement vector comparison with boolean result. * config/i386/i386.h: New macros TARGET_OPTIMIZE_MASK_STORE. * config/i386/sse.md (define_expand "cbranch4): Add define-expand for vector comparion with eq/ne only. * config/i386/x86-tune.def: New macros X86_TUNE_OPTIMIZE_MASK_STORE. * fold-const.c (fold_relational_const): Add handling of vector comparison with boolean result. * params.def (PARAM_ZERO_TEST_FOR_MASK_STORE): New DEFPARAM. * params.h (ENABLE_ZERO_TEST_FOR_MASK_STORE): New macros. * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow comparison of vector operands with boolean result for EQ/NE only. (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. (verify_gimple_cond): Likewise. * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform combining for non-compatible vector types. * tree-vect-loop.c (is_valid_sink): New function. (optimize_mask_stores): Likewise. * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize has_mask_store field of vect_info. * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for vectorized loops having masked stores. * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and correspondent macros. (optimize_mask_stores): Add prototype. * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector type. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. * gcc.target/i386/avx2-vect-mask-store-move2.c: Likewise.