Hi Richard, Here is updated patch for middle-end part of the whole patch which fixes all your remarks I hope. Regression testing and bootstrapping did not show any new failures. Is it OK for trunk? Yuri. ChangeLog: 2015-12-18 Yuri Rumyantsev PR middle-end/68542 * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case of mixind vector and scalar types. (fold_relational_const): Add handling of vector comparison with boolean result. * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow comparison of vector operands with boolean result for EQ/NE only. (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. (verify_gimple_cond): Likewise. * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform combining for non-compatible vector types. * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for vector types. 2015-12-16 16:37 GMT+03:00 Richard Biener : > On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev wrote: >> Richard. >> Thanks for your review. >> I re-designed fix for assert by adding additional checks for vector >> comparison with boolean result to fold_binary_op_with_conditional_arg >> and remove early exit to combine_cond_expr_cond. >> Unfortunately, I am not able to provide you with test-case since it is >> in my second patch related to back-end patch which I sent earlier >> (12-08). >> >> Bootstrapping and regression testing did not show any new failures. >> Is it OK for trunk? > > + else if (TREE_CODE (type) == VECTOR_TYPE) > { > tree testtype = TREE_TYPE (cond); > test = cond; > true_value = constant_boolean_node (true, testtype); > false_value = constant_boolean_node (false, testtype); > } > + else > + { > + test = cond; > + cond_type = type; > + true_value = boolean_true_node; > + false_value = boolean_false_node; > + } > > So this is, say, vec1 != vec2 with scalar vs. vector result. If we have > scalar result and thus, say, scalar + vec1 != vec2. I believe rather > than doing the above (not seeing how this not would generate wrong > code eventually) we should simply detect the case of mixing vector > and scalar types and bail out. At least without some comments > your patch makes the function even more difficult to understand than > it is already. > > @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1) > if (TREE_CODE (op0_type) == VECTOR_TYPE > || TREE_CODE (op1_type) == VECTOR_TYPE) > { > - error ("vector comparison returning a boolean"); > - debug_generic_expr (op0_type); > - debug_generic_expr (op1_type); > - return true; > + /* Allow vector comparison returning boolean if operand types > + are boolean or integral and CODE is EQ/NE. */ > + if (code != EQ_EXPR && code != NE_EXPR > + && !VECTOR_BOOLEAN_TYPE_P (op0_type) > + && !VECTOR_INTEGER_TYPE_P (op0_type)) > + { > + error ("type mismatch for vector comparison returning a boolean"); > + debug_generic_expr (op0_type); > + debug_generic_expr (op1_type); > + return true; > + } > } > } > /* Or a boolean vector type with the same element count > > as said before please merge the cascaded if()s. Better wording for > the error is "unsupported operation or type for vector comparison > returning a boolean" > > Otherwise the patch looks sensible to me though it shows that overloading of > EQ/NE_EXPR for scalar result and vector operands might have some more unexpected > fallout (which is why I originally prefered the view-convert to large > integer type variant). > > Thanks, > Richard. > > >> ChangeLog: >> 2015-12-11 Yuri Rumyantsev >> >> PR middle-end/68542 >> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks oh >> vector comparison with boolean result to avoid ICE. >> (fold_relational_const): Add handling of vector >> comparison with boolean result. >> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow >> comparison of vector operands with boolean result for EQ/NE only. >> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. >> (verify_gimple_cond): Likewise. >> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform >> combining for non-compatible vector types. >> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for >> vector types. >> >> 2015-12-10 16:36 GMT+03:00 Richard Biener : >>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev wrote: >>>> Hi Richard. >>>> >>>> Thanks a lot for your review. >>>> Below are my answers. >>>> >>>> You asked why I inserted additional check to >>>> ++ b/gcc/tree-ssa-forwprop.c >>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum >>>> tree_code code, tree type, >>>> >>>> gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison); >>>> >>>> + /* Do not perform combining it types are not compatible. */ >>>> + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE >>>> + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0)))) >>>> + return NULL_TREE; >>>> + >>>> >>>> again, how does this happen? >>>> >>>> This is because without it I've got assert in fold_convert_loc >>>> gcc_assert (TREE_CODE (orig) == VECTOR_TYPE >>>> && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig))); >>>> >>>> since it tries to convert vector of bool to scalar bool. >>>> Here is essential part of call-stack: >>>> >>>> #0 internal_error (gmsgid=0x1e48397 "in %s, at %s:%d") >>>> at ../../gcc/diagnostic.c:1259 >>>> #1 0x0000000001743ada in fancy_abort ( >>>> file=0x1847fc3 "../../gcc/fold-const.c", line=2217, >>>> function=0x184b9d0 >>> tree_node*)::__FUNCTION__> "fold_convert_loc") at >>>> ../../gcc/diagnostic.c:1332 >>>> #2 0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20, >>>> arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216 >>>> #3 0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR, >>>> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000, >>>> op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453 >>>> #4 0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR, >>>> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000, >>>> op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394 >>>> #5 0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0, >>>> code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460, >>>> op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780, >>>> cond_first_p=1) at ../../gcc/fold-const.c:6465 >>>> #6 0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR, >>>> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780) >>>> at ../../gcc/fold-const.c:9211 >>>> #7 0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0, >>>> code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460, >>>> op1=0x7ffff1a48780, invariant_only=true) >>>> at ../../gcc/tree-ssa-forwprop.c:382 >>> >>> Ok, but that only shows that >>> >>> /* Convert A ? 1 : 0 to simply A. */ >>> if ((code == VEC_COND_EXPR ? integer_all_onesp (op1) >>> : (integer_onep (op1) >>> && !VECTOR_TYPE_P (type))) >>> && integer_zerop (op2) >>> /* If we try to convert OP0 to our type, the >>> call to fold will try to move the conversion inside >>> a COND, which will recurse. In that case, the COND_EXPR >>> is probably the best choice, so leave it alone. */ >>> && type == TREE_TYPE (arg0)) >>> return pedantic_non_lvalue_loc (loc, arg0); >>> >>> /* Convert A ? 0 : 1 to !A. This prefers the use of NOT_EXPR >>> over COND_EXPR in cases such as floating point comparisons. */ >>> if (integer_zerop (op1) >>> && (code == VEC_COND_EXPR ? integer_all_onesp (op2) >>> : (integer_onep (op2) >>> && !VECTOR_TYPE_P (type))) >>> && truth_value_p (TREE_CODE (arg0))) >>> return pedantic_non_lvalue_loc (loc, >>> fold_convert_loc (loc, type, >>> invert_truthvalue_loc (loc, >>> arg0))); >>> >>> are wrong? I can't say for sure without a testcase. >>> >>> That said, papering over this in tree-ssa-forwprop.c is not the >>> correct thing to do. >>> >>>> Secondly, I did not catch your idea to implement GCC Vector Extension >>>> for vector comparison with bool result since >>>> such extension completely depends on comparison context, e.g. for your >>>> example, result type of comparison depends on using - for >>>> if-comparison it is scalar, but for c = (a==b) - result type is >>>> vector. I don't think that this is reasonable for current release. >>> >>> The idea was to be able to write testcases exercising different EQ/NE vector >>> compares. But yes, if that's non-trivial the it's not appropriate for stage3. >>> >>> Can you add a testcase for the forwprop issue and try to fix the offending >>> bogus folders instead? >>> >>> Thanks, >>> Richard. >>> >>>> And finally about AMD performance. I checked that this transformation >>>> works for "-march=bdver4" option and regression for 481.wrf must >>>> disappear too. >>>> >>>> Thanks. >>>> Yuri. >>>> >>>> 2015-12-04 15:18 GMT+03:00 Richard Biener : >>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev wrote: >>>>>> Hi All, >>>>>> >>>>>> Here is a patch for 481.wrf preformance regression for avx2 which is >>>>>> sligthly modified mask store optimization. This transformation allows >>>>>> perform unpredication for semi-hammock containing masked stores, other >>>>>> words if we have a loop like >>>>>> for (i=0; i>>>>> if (c[i]) { >>>>>> p1[i] += 1; >>>>>> p2[i] = p3[i] +2; >>>>>> } >>>>>> >>>>>> then it will be transformed to >>>>>> if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) { >>>>>> vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165); >>>>>> vect__12.22_172 = vect__11.19_170 + vect_cst__171; >>>>>> MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172); >>>>>> vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165); >>>>>> vect__19.28_184 = vect__18.25_182 + vect_cst__183; >>>>>> MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184); >>>>>> } >>>>>> i.e. it will put all computations related to masked stores to semi-hammock. >>>>>> >>>>>> Bootstrapping and regression testing did not show any new failures. >>>>> >>>>> Can you please split out the middle-end support for vector equality compares? >>>>> >>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1) >>>>> if (TREE_CODE (op0_type) == VECTOR_TYPE >>>>> || TREE_CODE (op1_type) == VECTOR_TYPE) >>>>> { >>>>> - error ("vector comparison returning a boolean"); >>>>> - debug_generic_expr (op0_type); >>>>> - debug_generic_expr (op1_type); >>>>> - return true; >>>>> + /* Allow vector comparison returning boolean if operand types >>>>> + are equal and CODE is EQ/NE. */ >>>>> + if ((code != EQ_EXPR && code != NE_EXPR) >>>>> + || !(VECTOR_BOOLEAN_TYPE_P (op0_type) >>>>> + || VECTOR_INTEGER_TYPE_P (op0_type))) >>>>> + { >>>>> + error ("type mismatch for vector comparison returning a boolean"); >>>>> + debug_generic_expr (op0_type); >>>>> + debug_generic_expr (op1_type); >>>>> + return true; >>>>> + } >>>>> } >>>>> } >>>>> >>>>> please merge the conditions with a && >>>>> >>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, >>>>> tree type, tree op0, tree op1) >>>>> >>>>> if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) >>>>> { >>>>> + if (INTEGRAL_TYPE_P (type) >>>>> + && (TREE_CODE (type) == BOOLEAN_TYPE >>>>> + || TYPE_PRECISION (type) == 1)) >>>>> + { >>>>> + /* Have vector comparison with scalar boolean result. */ >>>>> + bool result = true; >>>>> + gcc_assert (code == EQ_EXPR || code == NE_EXPR); >>>>> + gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1)); >>>>> + for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++) >>>>> + { >>>>> + tree elem0 = VECTOR_CST_ELT (op0, i); >>>>> + tree elem1 = VECTOR_CST_ELT (op1, i); >>>>> + tree tmp = fold_relational_const (code, type, elem0, elem1); >>>>> + result &= integer_onep (tmp); >>>>> + if (code == NE_EXPR) >>>>> + result = !result; >>>>> + return constant_boolean_node (result, type); >>>>> >>>>> ... just assumes it is either EQ_EXPR or NE_EXPR. I believe you want >>>>> to change the >>>>> guarding condition to just >>>>> >>>>> if (! VECTOR_TYPE_P (type)) >>>>> >>>>> and assert the boolean/precision. Please also merge the asserts into >>>>> one with && >>>>> >>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c >>>>> index b82ae3c..73ee3be 100644 >>>>> --- a/gcc/tree-ssa-forwprop.c >>>>> +++ b/gcc/tree-ssa-forwprop.c >>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum >>>>> tree_code code, tree type, >>>>> >>>>> gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison); >>>>> >>>>> + /* Do not perform combining it types are not compatible. */ >>>>> + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE >>>>> + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0)))) >>>>> + return NULL_TREE; >>>>> + >>>>> >>>>> again, how does this happen? >>>>> >>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c >>>>> index e67048e..1605520c 100644 >>>>> --- a/gcc/tree-vrp.c >>>>> +++ b/gcc/tree-vrp.c >>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, >>>>> gimple_stmt_iterator si, >>>>> &comp_code, &val)) >>>>> return; >>>>> >>>>> + /* Use of vector comparison in gcond is very restricted and used to check >>>>> + that the mask in masked store is zero, so assert for such comparison >>>>> + is not implemented yet. */ >>>>> + if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE) >>>>> + return; >>>>> + >>>>> >>>>> VECTOR_TYPE_P >>>>> >>>>> I believe the comment should simply say that VRP doesn't track ranges for >>>>> vector types. >>>>> >>>>> In the previous review I suggested you should make sure that RTL expansion >>>>> ends up using a well-defined optab for these compares. To make sure >>>>> this happens across targets I suggest you make these comparisons available >>>>> via the GCC vector extension. Thus allow >>>>> >>>>> typedef int v4si __attribute__((vector_size(16))); >>>>> >>>>> int foo (v4si a, v4si b) >>>>> { >>>>> if (a == b) >>>>> return 4; >>>>> } >>>>> >>>>> and != and also using floating point vectors. >>>>> >>>>> Otherwise it's hard to see the impact of this change. Obvious choices >>>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer >>>>> compares. >>>>> >>>>> A half-way implementation like your VRP comment suggests (only >>>>> ==/!= zero against integer vectors is implemented?!) this doesn't sound >>>>> good without also limiting the feature this way in the verifier. >>>>> >>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only >>>>> has AVX, not AVX2). >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> ChangeLog: >>>>>> 2015-11-30 Yuri Rumyantsev >>>>>> >>>>>> PR middle-end/68542 >>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector >>>>>> comparison with boolean result. >>>>>> * config/i386/sse.md (define_expand "cbranch4): Add define-expand >>>>>> for vector comparion with eq/ne only. >>>>>> * fold-const.c (fold_relational_const): Add handling of vector >>>>>> comparison with boolean result. >>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow >>>>>> comparison of vector operands with boolean result for EQ/NE only. >>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. >>>>>> (verify_gimple_cond): Likewise. >>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform >>>>>> combining for non-compatible vector types. >>>>>> * tree-vect-loop.c (is_valid_sink): New function. >>>>>> (optimize_mask_stores): Likewise. >>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>>>>> has_mask_store field of vect_info. >>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for >>>>>> vectorized loops having masked stores. >>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>>>>> correspondent macros. >>>>>> (optimize_mask_stores): Add prototype. >>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector >>>>>> type. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.