Date: Wed, 6 Dec 2023 10:33:06 +0100 (CET)
From: Richard Biener
To: Tamar Christina
cc: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
Message-ID: <337r0po6-79qo-6r13-7o32-1n9796430872@fhfr.qr>
References: <3p13osn9-n4qp-1s6r-545q-r1or36n8s23q@fhfr.qr> <024530p2-q575-onnr-5696-sq90p6520o29@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > is the exit edge you are looking for without iterating over all loop exits.
> > > >
> > > > > +          gimple *tmp_vec_stmt = vec_stmt;
> > > > > +          tree tmp_vec_lhs = vec_lhs;
> > > > > +          tree tmp_bitstart = bitstart;
> > > > > +          /* For early exit where the exit is not in the BB that leads
> > > > > +             to the latch then we're restarting the iteration in the
> > > > > +             scalar loop.  So get the first live value.  */
> > > > > +          restart_loop = restart_loop || exit_e != main_e;
> > > > > +          if (restart_loop)
> > > > > +            {
> > > > > +              tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > > +              tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > > +              tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > > >
> > > > Hmm, that gets you the value after the first iteration, not the one before which
> > > > would be the last value of the preceding vector iteration?
> > > > (but we don't keep those, we'd need a PHI)
> > >
> > > I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
> > > that lhs should be pointing to a PHI.
> > >
> > > When I inspect the statement I see
> > >
> > > i_14 = PHI
> > >
> > > so i_14 is the value at the start of the current iteration.  If we're coming from the
> > > header 0, otherwise i_11 which is the value of the previous iteration?
> > >
> > > The peeling code explicitly leaves i_14 in the merge block and not i_11 for this
> > > exact reason.
> > > So I'm confused, my understanding is that we're already *at* the right PHI.
> > >
> > > Is it perhaps that you thought we put i_11 here for the early exits?  In which case
> > > yes, I'd agree that that would be wrong, and there we would have had to look at
> > > the defs, but i_11 is the def.
> > >
> > > I already kept this in mind and leveraged peeling to make this part easier.
> > > i_11 is used in the main exit and i_14 in the early one.
> >
> > I think the important detail is that this code is only executed for
> > vect_induction_defs which are indeed PHIs and so we're sure the
> > value live is before any modification so fine to feed as initial
> > value for the PHI in the epilog.
> >
> > Maybe we can assert the def type here?
>
> We can't assert because until cfg cleanup the dead value is still seen and still
> vectorized.  That said I've added a guard here.  We vectorize the non-induction
> value as normal now and if it's ever used it'll fail.
>
> >
> > > >
> > > > Why again do we need (non-induction) live values from the vector loop to the
> > > > epilogue loop again?
> > >
> > > They can appear as the result value of the main exit.
> > >
> > > e.g. in testcase (vect-early-break_17.c)
> > >
> > > #define N 1024
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    vect_b[i] = x + i;
> > >    if (vect_a[i] > x)
> > >      return vect_a[i];
> > >    vect_a[i] = x;
> > >    ret = vect_a[i] + vect_b[i];
> > >  }
> > >  return ret;
> > > }
> > >
> > > The only situation they can appear in as an early-break is when
> > > we have a case where main exit != latch connected exit.
> > >
> > > However in these cases they are unused, and only there because
> > > normally you would have exited (i.e. there was a return) but the
> > > vector loop needs to start over so we ignore it.
> > >
> > > These happen in testcase vect-early-break_74.c and
> > > vect-early-break_78.c
> >
> > Hmm, so in that case their value is incorrect (but doesn't matter,
> > we ignore it)?
> >
>
> Correct, they're placed there due to exit redirection, but in these inverted
> testcases where we've peeled the vector iteration you can't ever skip the
> epilogue.  So they are guaranteed not to be used.
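
The restart behaviour discussed above can be illustrated with a small block-wise model in plain C. This is only a sketch under stated assumptions, not GCC code: `VF`, `first_gt_scalar` and `first_gt_blocked` are hypothetical names, and the inner lane loop merely stands in for a vector compare. The point it shows is that on an early break the induction variable handed to the scalar epilogue is the value from the start of the vector iteration (lane 0), so the scalar loop redoes that iteration and finds the exact exit.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor, not taken from the patch */

/* Scalar reference: index of the first element greater than X, or N.  */
static size_t
first_gt_scalar (const unsigned *a, size_t n, unsigned x)
{
  assert (a != NULL);
  for (size_t i = 0; i < n; i++)
    if (a[i] > x)
      return i;
  return n;
}

/* Block-wise model of an early-break vector loop plus scalar epilogue.  */
static size_t
first_gt_blocked (const unsigned *a, size_t n, unsigned x)
{
  assert (a != NULL);
  size_t i = 0;

  /* "Vector" loop: test VF lanes per iteration.  */
  for (; i + VF <= n; i += VF)
    {
      int any = 0;
      for (size_t lane = 0; lane < VF; lane++)
        any |= a[i + lane] > x;
      /* Early break: I still holds the value from the start of this
         vector iteration, which is what the epilogue must restart from.  */
      if (any)
        break;
    }

  /* Scalar epilogue: redo the interrupted iteration (or the tail).  */
  for (; i < n; i++)
    if (a[i] > x)
      return i;
  return n;
}
```

Both functions return the same index for any input; feeding the epilogue the value from the end of the vector iteration instead would skip the lanes that triggered the break.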
>
> > > > > +          gimple_stmt_iterator exit_gsi;
> > > > > +          tree new_tree
> > > > > +            = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > > > +                                             exit_e, vectype, ncopies,
> > > > > +                                             slp_node, bitsize,
> > > > > +                                             tmp_bitstart, tmp_vec_lhs,
> > > > > +                                             lhs_type, restart_loop,
> > > > > +                                             &exit_gsi);
> > > > > +
> > > > > +          /* Use the empty block on the exit to materialize the new stmts
> > > > > +             so we can use update the PHI here.  */
> > > > > +          if (gimple_phi_num_args (use_stmt) == 1)
> > > > > +            {
> > > > > +              auto gsi = gsi_for_stmt (use_stmt);
> > > > > +              remove_phi_node (&gsi, false);
> > > > > +              tree lhs_phi = gimple_phi_result (use_stmt);
> > > > > +              gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > > > +              gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > > > +            }
> > > > > +          else
> > > > > +            SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> > > >
> > > > if the else case works, why not use it always?
> > >
> > > Because it doesn't work for the main exit.  The early exits have an intermediate block
> > > that is used to generate the statements on, so for them we are fine updating the
> > > use in place.
> > >
> > > The main exits don't, and so the existing trick the vectorizer uses is to materialize
> > > the statements in the same block and then dissolve the phi node.  However you
> > > can't do that for the early exit because the phi node isn't singular.
> >
> > But if the PHI has a single arg you can replace that?  By making a
> > copy stmt from it don't you break LC SSA?
> >
>
> Yeah, what the existing code is sneakily doing is this:
>
> It has to vectorize
>
> x = PHI <y>
>
> y gets vectorized as z but
>
> x = PHI <z>
> z = ...
>
> would be invalid, so what it does, since it doesn't have a predecessor node to place stuff in,
> it'll do
>
> z = ...
> x = z
>
> and remove the PHI.  The PHI was only placed there for vectorization so it's not needed
> after this point.
> It's also for this reason why the code passes around a gimple_seq since
> it needs to make sure it gets the order right when inserting statements.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

OK.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * tree-vect-loop.cc (vectorizable_live_operation,
>       vectorizable_live_operation_1): Support early exits.
>       (can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
>       inductions or reductions.
>       (find_connected_edge, vect_get_vect_def): New.
>       (vect_create_epilog_for_reduction): Support reductions in early break.
>       * tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
>       (vect_stmt_relevant_p): Mark all inductions when early break as being
>       live.
>       * tree-vectorizer.h (perm_mask_for_reverse): Expose.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>           bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
>           bb_before_epilog = loop_preheader_edge (epilog)->src;
>         }
> +
>        /* If loop is peeled for non-zero constant times, now niters refers to
>          orig_niters - prolog_peeling, it won't overflow even the orig_niters
>          overflows.  */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..2f922b42f6d567dfd5da9b276b1c9d37bc681876 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>    return new_temp;
>  }
>
> +/* Retrieves the definining statement to be used for a reduction.
> +   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
> +   the reduction definitions.  */
> +
> +tree
> +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> +                   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
> +                   vec <gimple *> &vec_stmts)
> +{
> +  tree def;
> +
> +  if (slp_node)
> +    {
> +      if (!main_exit_p)
> +        slp_node = slp_node_instance->reduc_phis;
> +      def = vect_get_slp_vect_def (slp_node, i);
> +    }
> +  else
> +    {
> +      if (!main_exit_p)
> +        reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> +      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> +      def = gimple_get_lhs (vec_stmts[0]);
> +    }
> +
> +  return def;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
>     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
>       (counting from 0)
> +   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
> +     exit this edge is always the main loop exit.
>
>     This function:
>     1. Completes the reduction def-use cycles.
> @@ -5882,7 +5912,8 @@ static void
>  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>                                    stmt_vec_info stmt_info,
>                                    slp_tree slp_node,
> -                                  slp_instance slp_node_instance)
> +                                  slp_instance slp_node_instance,
> +                                  edge loop_exit)
>  {
>    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
>    gcc_assert (reduc_info->is_reduc_info);
> @@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>       loop-closed PHI of the inner loop which we remember as
>       def for the reduction PHI generation.  */
>    bool double_reduc = false;
> +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
>    stmt_vec_info rdef_info = stmt_info;
>    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
>      {
> @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        /* Create an induction variable.  */
>        gimple_stmt_iterator incr_gsi;
>        bool insert_after;
> -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
>        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
>                   insert_after, &indx_before_incr, &indx_after_incr);
>
> @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>       Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* We need to reduce values in all exits.  */
> +  exit_bb = loop_exit->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
> +  vec <gimple *> vec_stmts;
>    for (unsigned i = 0; i < vec_num; i++)
>      {
>        gimple_seq stmts = NULL;
> -      if (slp_node)
> -        def = vect_get_slp_vect_def (slp_node, i);
> -      else
> -        def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> +      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> +                               main_exit_p, i, vec_stmts);
>        for (j = 0; j < ncopies; j++)
>          {
>            tree new_def = copy_ssa_name (def);
>            phi = create_phi_node (new_def, exit_bb);
>            if (j)
> -            def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -          SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> +            def = gimple_get_lhs (vec_stmts[j]);
> +          SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
>            new_def = gimple_convert (&stmts, vectype, new_def);
>            reduc_inputs.quick_push (new_def);
>          }
> @@ -10481,17 +10513,18 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    return true;
>  }
>
> -
>  /* Function vectorizable_live_operation_1.
> +
>     helper function for vectorizable_live_operation.  */
> +
>  tree
>  vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> -                               stmt_vec_info stmt_info, edge exit_e,
> +                               stmt_vec_info stmt_info, basic_block exit_bb,
>                                 tree vectype, int ncopies, slp_tree slp_node,
>                                 tree bitsize, tree bitstart, tree vec_lhs,
> -                               tree lhs_type, gimple_stmt_iterator *exit_gsi)
> +                               tree lhs_type, bool restart_loop,
> +                               gimple_stmt_iterator *exit_gsi)
>  {
> -  basic_block exit_bb = exit_e->dest;
>    gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>
>    tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> @@ -10504,7 +10537,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>    if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>      {
>        /* Emit:
> +
>           SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
>           where VEC_LHS is the vectorized live-out result and MASK is
>           the loop mask for the final iteration.  */
>        gcc_assert (ncopies == 1 && !slp_node);
> @@ -10513,15 +10548,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree len = vect_get_loop_len (loop_vinfo, &gsi,
>                                      &LOOP_VINFO_LENS (loop_vinfo),
>                                      1, vectype, 0, 0);
> +
>        /* BIAS - 1.  */
>        signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>        tree bias_minus_one
>          = int_const_binop (MINUS_EXPR,
>                             build_int_cst (TREE_TYPE (len), biasval),
>                             build_one_cst (TREE_TYPE (len)));
> +
>        /* LAST_INDEX = LEN + (BIAS - 1).  */
>        tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
>                                       len, bias_minus_one);
> +
>        /* This needs to implement extraction of the first index, but not sure
>           how the LEN stuff works.  At the moment we shouldn't get here since
>           there's no LEN support for early breaks.  But guard this so there's
> @@ -10532,13 +10570,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree scalar_res
>          = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
>                          vec_lhs_phi, last_index);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
>    else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>      {
>        /* Emit:
> +
>           SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +
>           where VEC_LHS is the vectorized live-out result and MASK is
>           the loop mask for the final iteration.  */
>        gcc_assert (!slp_node);
> @@ -10548,10 +10589,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
>                                        &LOOP_VINFO_MASKS (loop_vinfo),
>                                        1, vectype, 0);
> +      tree scalar_res;
> +
> +      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> +         instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +        {
> +          /* First create the permuted mask.  */
> +          tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> +          tree perm_dest = copy_ssa_name (mask);
> +          gimple *perm_stmt
> +                = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> +                                       mask, perm_mask);
> +          vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +                                       &gsi);
> +          mask = perm_dest;
> +
> +          /* Then permute the vector contents.  */
> +          tree perm_elem = perm_mask_for_reverse (vectype);
> +          perm_dest = copy_ssa_name (vec_lhs_phi);
> +          perm_stmt
> +                = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> +                                       vec_lhs_phi, perm_elem);
> +          vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +                                       &gsi);
> +          vec_lhs_phi = perm_dest;
> +        }
>
>        gimple_seq_add_seq (&stmts, tem);
> -      tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -                                      mask, vec_lhs_phi);
> +
> +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +                                 mask, vec_lhs_phi);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
> @@ -10564,12 +10633,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
>                                         &stmts, true, NULL_TREE);
>      }
> +
>    *exit_gsi = gsi_after_labels (exit_bb);
>    if (stmts)
>      gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +
>    return new_tree;
>  }
>
> +/* Find the edge that's the final one in the path from SRC to DEST and
> +   return it.  This edge must exist in at most one forwarder edge between.  */
> +
> +static edge
> +find_connected_edge (edge src, basic_block dest)
> +{
> +   if (src->dest == dest)
> +     return src;
> +
> +  return find_edge (src->dest, dest);
> +}
> +
>  /* Function vectorizable_live_operation.
>
>     STMT_INFO computes a value that is used outside the loop.  Check if
> @@ -10590,11 +10673,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>    int ncopies;
>    gimple *use_stmt;
> +  use_operand_p use_p;
>    auto_vec<tree> vec_oprnds;
>    int vec_entry = 0;
>    poly_uint64 vec_index = 0;
>
> -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> +              || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>
>    /* If a stmt of a reduction is live, vectorize it via
>       vect_create_epilog_for_reduction.  vectorizable_reduction assessed
> @@ -10619,8 +10704,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
>            || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
>          return true;
> +
>        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> -                                        slp_node_instance);
> +                                        slp_node_instance,
> +                                        LOOP_VINFO_IV_EXIT (loop_vinfo));
> +
> +      /* If early break we only have to materialize the reduction on the merge
> +         block, but we have to find an alternate exit first.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +        {
> +          for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> +            if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> +              {
> +                vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> +                                                  slp_node, slp_node_instance,
> +                                                  exit);
> +                break;
> +              }
> +        }
> +
>        return true;
>      }
>
> @@ -10772,37 +10874,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>             lhs' = new_tree;  */
>
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -      gcc_assert (single_pred_p (exit_bb));
> -
> -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> -
> -      gimple_stmt_iterator exit_gsi;
> -      tree new_tree
> -        = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> -                                         LOOP_VINFO_IV_EXIT (loop_vinfo),
> -                                         vectype, ncopies, slp_node, bitsize,
> -                                         bitstart, vec_lhs, lhs_type,
> -                                         &exit_gsi);
> -
> -      /* Remove existing phis that copy from lhs and create copies
> -         from new_tree.  */
> -      gimple_stmt_iterator gsi;
> -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> -        {
> -          gimple *phi = gsi_stmt (gsi);
> -          if ((gimple_phi_arg_def (phi, 0) == lhs))
> +      /* Check if we have a loop where the chosen exit is not the main exit,
> +         in these cases for an early break we restart the iteration the vector code
> +         did.  For the live values we want the value at the start of the iteration
> +         rather than at the end.  */
> +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> +        if (!is_gimple_debug (use_stmt)
> +            && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +          FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>              {
> -              remove_phi_node (&gsi, false);
> -              tree lhs_phi = gimple_phi_result (phi);
> -              gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> -              gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> -            }
> -          else
> -            gsi_next (&gsi);
> -        }
> +              edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
> +                                           phi_arg_index_from_use (use_p));
> +              bool main_exit_edge = e == main_e
> +                                    || find_connected_edge (main_e, e->src);
> +
> +              /* Early exits have an merge block, we want the merge block itself
> +                 so use ->src.  For main exit the merge block is the
> +                 destination.  */
> +              basic_block dest = main_exit_edge ? main_e->dest : e->src;
> +              gimple *tmp_vec_stmt = vec_stmt;
> +              tree tmp_vec_lhs = vec_lhs;
> +              tree tmp_bitstart = bitstart;
> +
> +              /* For early exit where the exit is not in the BB that leads
> +                 to the latch then we're restarting the iteration in the
> +                 scalar loop.  So get the first live value.  */
> +              restart_loop = restart_loop || !main_exit_edge;
> +              if (restart_loop
> +                  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +                {
> +                  tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +                  tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> +                  tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> +                }
> +
> +              gimple_stmt_iterator exit_gsi;
> +              tree new_tree
> +                = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> +                                                 dest, vectype, ncopies,
> +                                                 slp_node, bitsize,
> +                                                 tmp_bitstart, tmp_vec_lhs,
> +                                                 lhs_type, restart_loop,
> +                                                 &exit_gsi);
> +
> +              if (gimple_phi_num_args (use_stmt) == 1)
> +                {
> +                  auto gsi = gsi_for_stmt (use_stmt);
> +                  remove_phi_node (&gsi, false);
> +                  tree lhs_phi = gimple_phi_result (use_stmt);
> +                  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> +                  gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> +                }
> +              else
> +                SET_PHI_ARG_DEF (use_stmt, e->dest_idx, new_tree);
> +          }
>
>        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
>        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a09c0a804a38e17ef32b6ce13b98b077459fc7..582c5e678fad802d6e76300fe3c939b9f2978f17 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
>     - it has uses outside the loop.
>     - it has vdefs (it alters memory).
>     - control stmts in the loop (except for the exit condition).
> +   - it is an induction and we have multiple exits.
>
>     CHECKME: what other side effects would the vectorizer allow?  */
>
> @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>          }
>      }
>
> +  /* Check if it's an induction and multiple exits.  In this case there will be
> +     a usage later on after peeling which is needed for the alternate exit.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +    {
> +      if (dump_enabled_p ())
> +          dump_printf_loc (MSG_NOTE, vect_location,
> +                           "vec_stmt_relevant_p: induction forced for "
> +                           "early break.\n");
> +      *live_p = true;
> +
> +    }
> +
>    if (*live_p && *relevant == vect_unused_in_scope
>        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
>      {
> @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
>  /* If the target supports a permute mask that reverses the elements in
>     a vector of type VECTYPE, return that mask, otherwise return null.  */
>
> -static tree
> +tree
>  perm_mask_for_reverse (tree vectype)
>  {
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> @@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
>                            bool vec_stmt_p,
>                            stmt_vector_for_cost *cost_vec)
>  {
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
>    if (slp_node)
>      {
>        stmt_vec_info slp_stmt_info;
>        unsigned int i;
>        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
>          {
> -          if (STMT_VINFO_LIVE_P (slp_stmt_info)
> +          if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> +               || (loop_vinfo
> +                   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +                   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> +                        == vect_induction_def))
>                && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
>                                                 slp_node_instance, i,
>                                                 vec_stmt_p, cost_vec))
>              return false;
>          }
>      }
> -  else if (STMT_VINFO_LIVE_P (stmt_info)
> +  else if ((STMT_VINFO_LIVE_P (stmt_info)
> +            || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +                && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
>             && !vectorizable_live_operation (vinfo, stmt_info,
>                                              slp_node, slp_node_instance, -1,
>                                              vec_stmt_p, cost_vec))
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 15c7f75b1f3c61ab469f1b1970dae9c6ac1a9f55..974f617d54a14c903894dd20d60098ca259c96f2 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
>                                  enum vect_def_type *,
>                                  tree *, stmt_vec_info * = NULL);
>  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
>  extern bool supportable_widening_operation (vec_info*, code_helper,
>                                              stmt_vec_info, tree, tree,
>                                              code_helper*, code_helper*,
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
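
The patch comment "we want EXTRACT_FIRST instead of EXTRACT_LAST. Emulate by reversing the vector and mask" can be checked with a small scalar model. The sketch below is an illustration only, not the vectorizer's implementation: `LANES`, `extract_last` and `extract_first_via_reverse` are hypothetical names, arrays stand in for vector registers, and the model assumes at least one mask lane is active.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 8 /* hypothetical fixed lane count */

/* Model of EXTRACT_LAST: the element in the last lane whose mask bit is
   set.  Assumption of this model: at least one lane is active.  */
static int
extract_last (const bool *mask, const int *vec)
{
  for (int lane = LANES - 1; lane >= 0; lane--)
    if (mask[lane])
      return vec[lane];
  assert (!"no active lane");
  return 0;
}

/* EXTRACT_FIRST emulated as in the patch comment: reverse both the mask
   and the vector, then apply EXTRACT_LAST to the reversed operands.  */
static int
extract_first_via_reverse (const bool *mask, const int *vec)
{
  bool rmask[LANES];
  int rvec[LANES];

  for (int lane = 0; lane < LANES; lane++)
    {
      rmask[lane] = mask[LANES - 1 - lane];
      rvec[lane] = vec[LANES - 1 - lane];
    }
  return extract_last (rmask, rvec);
}
```

With mask `{0,0,1,1,0,1,0,0}` and vector `{10,...,17}`, `extract_last` picks lane 5 and the reversed variant picks lane 2, the first active lane. Reversing only one of the two operands would pair mask bits with the wrong elements, which is why the patch permutes both.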