From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 4342A3858C33 for ; Tue, 10 Oct 2023 11:13:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4342A3858C33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 4114F211C5; Tue, 10 Oct 2023 11:13:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1696936438; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=3x2PddZonGBMzV41l+ASlSwO50s7x4Fwj1sxSAPs77w=; b=rJss3CSFmGSB1zi4OLTbk+48go5waLy6MA0wi10NSY8pc3ZAGwq/VsXTFNAL6/ohBn+mFr t+btJNK4y+TkMYfbcwPov/I3KUCI4AFzMTzPMc2uD9YtvxMoJChRBlvZ0EP7qWiyJy43hm mfmMrV1ihwGAbOwziC4tt+zD/kDa8Go= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1696936438; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=3x2PddZonGBMzV41l+ASlSwO50s7x4Fwj1sxSAPs77w=; b=tPH9EOMKEF3WVVA5c6O3aA6My/GniaQUm2fQ6IbCCzJS1ALAIUP376yFSPjLl8AUwVu7Rm NG3usLsmtgS/uwBw== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 09D6D2C417; Tue, 10 Oct 2023 11:13:57 +0000 (UTC) Date: Tue, 10 Oct 2023 11:13:58 +0000 (UTC) From: Richard Biener To: Tamar Christina cc: gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com Subject: Re: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits. In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_LOTSOFHASH,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 2 Oct 2023, Tamar Christina wrote: > Hi All, > > This second part updates niters analysis to be able to analyze any number of > exits. If we have multiple exits we determine the main exit by finding the > first counting IV. > > The change allows the vectorizer to pass analysis for multiple loops, but we > later gracefully reject them. It does however allow us to test if the exit > handling is using the right exit everywhere. > > Additionally since we analyze all exits, we now return all conditions for them > and determine which condition belongs to the main exit. > > The main condition is needed because the vectorizer needs to ignore the main IV > condition during vectorization as it will replace it during codegen. > > To track versioned loops we extend the contract between ifcvt and the vectorizer > to store the exit number in aux so that we can match it up again during peeling. > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu, and > no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-if-conv.cc (tree_if_conversion): Record exits in aux. > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use > it. > * tree-vect-loop.cc (vect_get_loop_niters): Determine main exit. > (vec_init_loop_exit_info): Extend analysis when multiple exits. > (vect_analyze_loop_form): Record conds and determine main cond. > (vect_create_loop_vinfo): Extend bookkeeping of conds. > (vect_analyze_loop): Release conds. > * tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS, > LOOP_VINFO_LOOP_IV_COND): New. > (struct vect_loop_form_info): Add conds, alt_loop_conds; > (struct loop_vec_info): Add conds, loop_iv_cond. > > --- inline copy of patch -- > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > index 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd 100644 > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec *preds) > } > if (need_to_ifcvt) > { > + /* Before we rewrite edges we'll record their original position in the > + edge map such that we can map the edges between the ifcvt and the > + non-ifcvt loop during peeling. */ > + uintptr_t idx = 0; > + for (edge exit : get_loop_exit_edges (loop)) > + exit->aux = (void*)idx++; > + > /* Now all statements are if-convertible. Combine all the basic > blocks into one huge basic block doing the if-conversion > on-the-fly. */ > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit, > scalar_loop = loop; > scalar_exit = loop_exit; > } > + else if (scalar_loop == loop) > + scalar_exit = loop_exit; > + else > + { > + /* Loop has been version, match exits up using the aux index. */ > + for (edge exit : get_loop_exit_edges (scalar_loop)) > + if (exit->aux == loop_exit->aux) > + { > + scalar_exit = exit; > + break; > + } > + } > > bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); > pbbs = bbs + 1; > @@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit, > exit = loop_exit; > basic_block new_preheader = new_bbs[0]; > > + /* Record the new loop exit information. new_loop doesn't have SCEV data and > + so we must initialize the exit information. */ > if (new_e) > *new_e = new_exit; > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo) > in NUMBER_OF_ITERATIONSM1. Place the condition under which the > niter information holds in ASSUMPTIONS. > > - Return the loop exit condition. */ > + Return the loop exit conditions. */ > > > -static gcond * > -vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions, > +static vec > +vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge main_exit, > tree *number_of_iterations, tree *number_of_iterationsm1) Any reason you swap exit and main_exit? IMHO the input better pairs with the other input 'loop'. > { > + auto_vec exits = get_loop_exit_edges (loop); > + vec conds; > + conds.create (exits.length ()); > class tree_niter_desc niter_desc; > tree niter_assumptions, niter, may_be_zero; > - gcond *cond = get_loop_exit_condition (loop); > > *assumptions = boolean_true_node; > *number_of_iterationsm1 = chrec_dont_know; > *number_of_iterations = chrec_dont_know; > + > DUMP_VECT_SCOPE ("get_loop_niters"); > > - if (!exit) > - return cond; > + if (exits.is_empty ()) > + return conds; > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n", > + exits.length ()); > + > + edge exit; > + unsigned int i; > + FOR_EACH_VEC_ELT (exits, i, exit) > + { > + gcond *cond = get_loop_exit_condition (exit); > + if (cond) > + conds.safe_push (cond); > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i); > > - may_be_zero = NULL_TREE; > - if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL) > - || chrec_contains_undetermined (niter_desc.niter)) > - return cond; > + may_be_zero = NULL_TREE; > + if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL) > + || chrec_contains_undetermined (niter_desc.niter)) > + continue; > > - niter_assumptions = niter_desc.assumptions; > - may_be_zero = niter_desc.may_be_zero; > - niter = niter_desc.niter; > + niter_assumptions = niter_desc.assumptions; > + may_be_zero = niter_desc.may_be_zero; > + niter = niter_desc.niter; > > - if (may_be_zero && integer_zerop (may_be_zero)) > - may_be_zero = NULL_TREE; > + if (may_be_zero && integer_zerop (may_be_zero)) > + may_be_zero = NULL_TREE; > > - if (may_be_zero) > - { > - if (COMPARISON_CLASS_P (may_be_zero)) > + if (may_be_zero) > { > - /* Try to combine may_be_zero with assumptions, this can simplify > - computation of niter expression. */ > - if (niter_assumptions && !integer_nonzerop (niter_assumptions)) > - niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, > - niter_assumptions, > - fold_build1 (TRUTH_NOT_EXPR, > - boolean_type_node, > - may_be_zero)); > + if (COMPARISON_CLASS_P (may_be_zero)) > + { > + /* Try to combine may_be_zero with assumptions, this can simplify > + computation of niter expression. */ > + if (niter_assumptions && !integer_nonzerop (niter_assumptions)) > + niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, > + niter_assumptions, > + fold_build1 (TRUTH_NOT_EXPR, > + boolean_type_node, > + may_be_zero)); > + else > + niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero, > + build_int_cst (TREE_TYPE (niter), 0), > + rewrite_to_non_trapping_overflow (niter)); > + > + may_be_zero = NULL_TREE; > + } > + else if (integer_nonzerop (may_be_zero) && exit == main_exit) > + { > + *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0); > + *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1); > + continue; > + } > else > - niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero, > - build_int_cst (TREE_TYPE (niter), 0), > - rewrite_to_non_trapping_overflow (niter)); > + continue; > + } > > - may_be_zero = NULL_TREE; > - } > - else if (integer_nonzerop (may_be_zero)) > + /* Loop assumptions are based off the normal exit. */ > + if (exit == main_exit) It's a bit hard to follow in patch form but I wonder why you even analyze the number of iterations of the non-main exits riskying possibly clobbering the *number_* outputs which we later assume to be for the main exit? > { > - *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0); > - *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1); > - return cond; > + *assumptions = niter_assumptions; > + *number_of_iterationsm1 = niter; > + > + /* We want the number of loop header executions which is the number > + of latch executions plus one. > + ??? For UINT_MAX latch executions this number overflows to zero > + for loops like do { n++; } while (n != 0); */ > + if (niter && !chrec_contains_undetermined (niter)) > + niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), > + unshare_expr (niter), > + build_int_cst (TREE_TYPE (niter), 1)); > + *number_of_iterations = niter; > } > - else > - return cond; > } > > - *assumptions = niter_assumptions; > - *number_of_iterationsm1 = niter; > - > - /* We want the number of loop header executions which is the number > - of latch executions plus one. > - ??? For UINT_MAX latch executions this number overflows to zero > - for loops like do { n++; } while (n != 0); */ > - if (niter && !chrec_contains_undetermined (niter)) > - niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter), > - build_int_cst (TREE_TYPE (niter), 1)); > - *number_of_iterations = niter; > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n"); > > - return cond; > + return conds; > } > > /* Determine the main loop exit for the vectorizer. */ > @@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop) > auto_vec exits = get_loop_exit_edges (loop); > if (exits.length () == 1) > return exits[0]; > - else > - return NULL; > + > + /* If we have multiple exits we only support counting IV at the moment. Analyze > + all exits and return one */ > + class tree_niter_desc niter_desc; > + edge candidate = NULL; > + for (edge exit : exits) > + { > + if (!get_loop_exit_condition (exit)) > + continue; > + > + if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL) > + && !chrec_contains_undetermined (niter_desc.niter)) > + { > + if (!niter_desc.may_be_zero || !candidate) > + candidate = exit; > + } > + } > + > + return candidate; > } > > /* Function bb_in_loop_p > @@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) > "not vectorized: latch block not empty.\n"); > > /* Make sure the exit is not abnormal. */ > - edge e = single_exit (loop); > - if (e->flags & EDGE_ABNORMAL) > + if (exit_e->flags & EDGE_ABNORMAL) > return opt_result::failure_at (vect_location, > "not vectorized:" > " abnormal loop exit edge.\n"); > > - info->loop_cond > - = vect_get_loop_niters (loop, e, &info->assumptions, > + info->conds > + = vect_get_loop_niters (loop, &info->assumptions, exit_e, > &info->number_of_iterations, > &info->number_of_iterationsm1); > - if (!info->loop_cond) > + > + if (info->conds.is_empty ()) > return opt_result::failure_at > (vect_location, > "not vectorized: complicated exit condition.\n"); > > + /* Determine what the primary and alternate exit conds are. */ > + info->alt_loop_conds.create (info->conds.length () - 1); > + for (gcond *cond : info->conds) > + { > + if (exit_e->src != gimple_bb (cond)) > + info->alt_loop_conds.quick_push (cond); > + else > + info->loop_cond = cond; > + } > + IMHO it would be simpler to have the primary exit condition in info->conds[0] and the rest after that? That avoids having two arrays and one scalar in vect_loop_form_info. > if (integer_zerop (info->assumptions) > || !info->number_of_iterations > || chrec_contains_undetermined (info->number_of_iterations)) > @@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared, > if (!integer_onep (info->assumptions) && !main_loop_info) > LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions; > > - stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond); > - STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type; > + for (gcond *cond : info->conds) > + { > + stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond); > + STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type; > + } > + LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds); > + LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond; > LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit; > > @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) > && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo) > && !loop->simduid); > if (!vect_epilogues) > - return first_loop_vinfo; > + { > + loop_form_info.conds.release (); > + loop_form_info.alt_loop_conds.release (); > + return first_loop_vinfo; > + } I think there's 'inner' where you leak these. Maybe use auto_vec<> in vect_loop_form_info instead? Otherwise looks OK. Thanks, Richard. > /* Now analyze first_loop_vinfo for epilogue vectorization. */ > poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo); > @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) > (first_loop_vinfo->epilogue_vinfos[0]->vector_mode)); > } > > + loop_form_info.conds.release (); > + loop_form_info.alt_loop_conds.release (); > + > return first_loop_vinfo; > } > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -882,6 +882,12 @@ public: > we need to peel off iterations at the end to form an epilogue loop. */ > bool peeling_for_niter; > > + /* List of loop additional IV conditionals found in the loop. */ > + auto_vec conds; > + > + /* Main loop IV cond. */ > + gcond* loop_iv_cond; > + > /* True if there are no loop carried data dependencies in the loop. > If loop->safelen <= 1, then this is always true, either the loop > didn't have any loop carried data dependencies, or the loop is being > @@ -984,6 +990,8 @@ public: > #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains > #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps > #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter > +#define LOOP_VINFO_LOOP_CONDS(L) (L)->conds > +#define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond > #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies > #define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop > #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling > @@ -2373,7 +2381,9 @@ struct vect_loop_form_info > tree number_of_iterations; > tree number_of_iterationsm1; > tree assumptions; > + vec conds; > gcond *loop_cond; > + vec alt_loop_conds; > gcond *inner_loop_cond; > edge loop_exit; > }; > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)