> > +      vectype = truth_type_for (comp_type);
>
> so this leaves the producer of the mask in the GIMPLE_COND and we
> vectorize the GIMPLE_COND as
>
>   mask_1 = ...;
>   if (mask_1 != {-1,-1...})
>     ..
>
> ?  In principle only the mask producer needs a vector type and that
> adjusted by bool handling, the branch itself doesn't need any
> STMT_VINFO_VECTYPE.
>
> As said I believe if you recognize a GIMPLE_COND pattern for conds
> that aren't bool != 0 producing the mask stmt this should be picked
> up by bool handling correctly already.
>
> Also as said piggy-backing on the COND_EXPR handling in this function
> which has the condition split out into a separate stmt(!) might not
> completely handle things correctly and you are likely missing
> the tcc_comparison handling of the embedded compare.
>

Ok, I've stopped piggy-backing on the COND_EXPR handling and created
vect_recog_gcond_pattern.  As you said in the previous email, I've also
stopped setting the vectype for the gcond and instead use the type of the
operand.

Note that because the pattern doesn't apply if the gcond is already an
NE_EXPR against zero, I do need the extra truth_type_for for that case.
Because in the case of e.g.

  a = b > 4;
  if (a != 0)

the producer of the mask is already outside of the cond but will not trigger
Boolean recognition.  That means that while the integral type is correct, it
won't be a Boolean one, and vectorizable_comparison expects a Boolean vector.
Alternatively, we can remove that assert?  But that seems worse.

Additionally, in the previous email you mention "adjusted Boolean statement".
I'm guessing you were referring to generating a COND_EXPR from the gcond, so
that vect_recog_bool_pattern detects it?  The problem with that is that it
gets folded to x & 1 and doesn't trigger.  It also then blocks vectorization.
So instead I've not forced it.

> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +         possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  */
> > +      if (masked_loop_p)
> > +        for (auto stmt : stmts)
> > +          workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > +                                                mask, stmt, &cond_gsi));
> > +      else
> > +        workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +        {
> > +          new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +          tree arg0 = workset.pop ();
> > +          tree arg1 = workset.pop ();
> > +          new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +          vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +                                       &cond_gsi);
> > +          workset.quick_insert (0, new_temp);
> > +        }
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +                             &cond_gsi);
>
> You didn't fix any of the code above it seems, it's still wrong.
>

Apologies, I hadn't realized that the last argument to vect_get_loop_mask
was the index.  Should be fixed now.  Is this closer to what you wanted?
The individual ops are now masked with separate masks.  (See testcase when
N=865.)

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
        (vect_recog_gcond_pattern): New.
        (vect_vect_recog_func_ptrs): Use it.
        * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts
        without lhs.
        (vectorizable_early_exit): New.
        (vect_analyze_stmt, vect_transform_stmt): Use it.
        (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/vect-early-break_88.c: New test.

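For reference, here is a scalar sketch of the two source shapes discussed above.  It is only an illustration, not part of the patch: the function names first_exit_embedded and first_exit_split are made up for this example.  The first loop has the compare embedded in the branch, which is the shape vect_recog_gcond_pattern is meant to rewrite; the second already has the compare split out and tests the result against zero, which is the shape the pattern leaves alone.

```cpp
#include <cassert>
#include <cstddef>

// Shape A: "if (a op b)" with the comparison embedded in the branch.
// Hypothetical helper; returns the index of the first exiting element, or N.
std::size_t
first_exit_embedded (const float *a, std::size_t n, double x)
{
  assert (a != nullptr || n == 0);
  for (std::size_t i = 0; i < n; i++)
    if (a[i] > x)
      return i;
  return n;
}

// Shape B: "mask = a op b; if (mask != 0)".  The mask producer already
// lives outside the branch, so there is nothing left to split out.
std::size_t
first_exit_split (const float *a, std::size_t n, double x)
{
  assert (a != nullptr || n == 0);
  for (std::size_t i = 0; i < n; i++)
    {
      bool mask = a[i] > x;
      if (mask != false)
        return i;
    }
  return n;
}
```

Both loops compute the same exit index; the difference is only where the mask producer sits relative to the branch.
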
--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+                  || is_a <gcond *> (pattern_stmt)
                   || (VECTOR_BOOLEAN_TYPE_P (vectype)
                       == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }

+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+                 search begins, i.e. cast of a bool to
+                 an integer type.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+                          stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR && zerop (rhs))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  /* Build a scalar type for the boolean result that when vectorized matches the
+     vector type of the result in size and number of elements.  */
+  unsigned prec
+    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
+                           TYPE_VECTOR_SUBPARTS (vecitype));
+
+  scalar_type
+    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
+
+  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+                         build_int_cst (TREE_TYPE (new_lhs), 0),
+                         NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern

    Try to find pattern like following:
@@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;

   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */

   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);

   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
                      rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];

-      new_temp = make_ssa_name (mask);
+      if (lhs)
+        new_temp = make_ssa_name (mask);
+      else
+        new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
         {
           new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }

+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+                         gimple_stmt_iterator *gsi, gimple **vec_stmt,
+                         slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+                           &vectype_op0))
+    {
+      if (dump_enabled_p ())
+          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                           "use not simple.\n");
+        return false;
+    }
+
+  stmt_vec_info op0_info = vinfo->lookup_def (op0);
+  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
+  gcc_assert (vectype);
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+        {
+          if (dump_enabled_p ())
+              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                               "can't vectorize early exit because the "
+                               "target doesn't support flag setting vector "
+                               "comparisons.\n");
+          return false;
+        }
+
+      if (ncopies > 1
+          && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+        {
+          if (dump_enabled_p ())
+              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                               "can't vectorize early exit because the "
+                               "target does not support boolean vector OR for "
+                               "type %T.\n", vectype);
+          return false;
+        }
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+                                      vec_stmt, slp_node, cost_vec))
+        return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+        {
+          if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+                                              OPTIMIZE_FOR_SPEED))
+            return false;
+          else
+            vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+        }
+
+
+      return true;
+    }
+
+  /* Tranform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+                                  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+        stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+         possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+         vec_num,  but since we inspect the exact results of vectorization
+         we don't need to and instead can just use the stmts themselves.  */
+      if (masked_loop_p)
+        for (unsigned i = 0; i < stmts.length (); i++)
+          {
+            tree stmt_mask
+              = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+                                    i);
+            stmt_mask
+              = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+                                  stmts[i], &cond_gsi);
+            workset.quick_push (stmt_mask);
+          }
+      else
+        workset.splice (stmts);
+
+      while (workset.length () > 1)
+        {
+          new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+          tree arg0 = workset.pop ();
+          tree arg1 = workset.pop ();
+          new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+          vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+                                       &cond_gsi);
+          workset.quick_insert (0, new_temp);
+        }
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+                             &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar,  but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+                             BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+   else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
           || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
                                   stmt_info, NULL, node)
           || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-                                   stmt_info, NULL, node, cost_vec));
+                                   stmt_info, NULL, node, cost_vec)
+          || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+                                      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
                                          NULL, NULL, node, cost_vec)
               || vectorizable_comparison (vinfo, stmt_info,
                                           NULL, NULL, node, cost_vec)
-              || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+              || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+              || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+                                          cost_vec));
+
     }

   if (node)
@@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;

+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+                                      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
         {
@@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
         scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
         scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+        {
+          /* We can't convert the scalar type to boolean yet, since booleans have a
+             single bit precision and we need the vector boolean to be a
+             representation of the integer mask.  So set the correct integer type and
+             convert to boolean vector once we have a vectype.  */
+          scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+        }
       else
         scalar_type = TREE_TYPE (gimple_get_lhs (stmt));

@@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
                          "get vectype for scalar type: %T\n", scalar_type);
         }
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
         return opt_result::failure_at (stmt,
                                        "not vectorized:"
                                        " unsupported data-type %T\n",
                                        scalar_type);

+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+         that we have the correct integer mask type.  */
+      if (cond)
+        vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }
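
To make the masking and reduction order in vectorizable_early_exit easier to follow, here is a small standalone model.  It is a sketch under simplifying assumptions, not GCC code: each vector mask is modelled as a 64-bit word with one bit per lane, and lane_mask, reduce_exit_masks and any_lane_exits are names made up for this example.  Each compare result is ANDed with its own loop mask (the per-index masks mentioned in the reply), and the results are then ORed together in the same pop/pop/insert-at-front order the workset loop uses.  The final re-masking of the reduced value and the EQ/-1 edge flip are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a vector boolean: one bit per lane.
using lane_mask = std::uint64_t;

// AND each compare result with the loop mask of the same index, then
// OR-reduce.  Two operands are popped from the back and the partial result
// is inserted at the front, mirroring workset.pop () and
// workset.quick_insert (0, ...) in the patch.
lane_mask
reduce_exit_masks (const std::vector<lane_mask> &cmp,
                   const std::vector<lane_mask> &loop_mask)
{
  assert (cmp.size () == loop_mask.size ());
  if (cmp.empty ())
    return 0;

  std::deque<lane_mask> workset;
  for (std::size_t i = 0; i < cmp.size (); i++)
    workset.push_back (cmp[i] & loop_mask[i]);

  while (workset.size () > 1)
    {
      lane_mask arg0 = workset.back ();
      workset.pop_back ();
      lane_mask arg1 = workset.back ();
      workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}

// Models the default "cond != 0" exit test: leave the loop if any active
// lane satisfied the exit condition.
bool
any_lane_exits (const std::vector<lane_mask> &cmp,
                const std::vector<lane_mask> &loop_mask)
{
  return reduce_exit_masks (cmp, loop_mask) != 0;
}
```

Inserting each partial result at the front means the next iteration pairs two other pending operands rather than chaining onto the value just produced, which is how I read the "maintain as much parallelism as possible" comment.  In the model, a lane that compares true but is inactive in its own loop mask does not trigger the exit.
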