From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 269053858C3A for ; Wed, 11 Oct 2023 12:08:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 269053858C3A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 3C9C2216DA; Wed, 11 Oct 2023 12:08:24 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id F38CA2C6A1; Wed, 11 Oct 2023 12:08:23 +0000 (UTC) Date: Wed, 11 Oct 2023 12:08:24 +0000 (UTC) From: Richard Biener To: Tamar Christina cc: "gcc-patches@gcc.gnu.org" , nd , "jlaw@ventanamicro.com" Subject: RE: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits. In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Level: Authentication-Results: smtp-out1.suse.de; none X-Rspamd-Server: rspamd2 X-Spamd-Result: default: False [-3.00 / 50.00]; BAYES_HAM(-3.00)[100.00%] X-Spam-Score: -3.00 X-Rspamd-Queue-Id: 3C9C2216DA X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,KAM_LOTSOFHASH,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, 11 Oct 2023, Tamar Christina wrote: > > -----Original Message----- > > From: Richard Biener > > Sent: Tuesday, October 10, 2023 12:14 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; jlaw@ventanamicro.com > > Subject: Re: [PATCH 2/3]middle-end: updated niters analysis to handle > > multiple exits. > > > > On Mon, 2 Oct 2023, Tamar Christina wrote: > > > > > Hi All, > > > > > > This second part updates niters analysis to be able to analyze any > > > number of exits. If we have multiple exits we determine the main exit > > > by finding the first counting IV. > > > > > > The change allows the vectorizer to pass analysis for multiple loops, > > > but we later gracefully reject them. It does however allow us to test > > > if the exit handling is using the right exit everywhere. > > > > > > Additionally since we analyze all exits, we now return all conditions > > > for them and determine which condition belongs to the main exit. > > > > > > The main condition is needed because the vectorizer needs to ignore > > > the main IV condition during vectorization as it will replace it during codegen. > > > > > > To track versioned loops we extend the contract between ifcvt and the > > > vectorizer to store the exit number in aux so that we can match it up again > > during peeling. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu, > > > and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > * tree-if-conv.cc (tree_if_conversion): Record exits in aux. > > > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): > > Use > > > it. > > > * tree-vect-loop.cc (vect_get_loop_niters): Determine main exit. > > > (vec_init_loop_exit_info): Extend analysis when multiple exits. > > > (vect_analyze_loop_form): Record conds and determine main cond. > > > (vect_create_loop_vinfo): Extend bookkeeping of conds. > > > (vect_analyze_loop): Release conds. > > > * tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS, > > > LOOP_VINFO_LOOP_IV_COND): New. > > > (struct vect_loop_form_info): Add conds, alt_loop_conds; > > > (struct loop_vec_info): Add conds, loop_iv_cond. > > > > > > --- inline copy of patch -- > > > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index > > > > > 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55 > > 903531 > > > b22829f4fdbd 100644 > > > --- a/gcc/tree-if-conv.cc > > > +++ b/gcc/tree-if-conv.cc > > > @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, > > vec *preds) > > > } > > > if (need_to_ifcvt) > > > { > > > + /* Before we rewrite edges we'll record their original position in the > > > + edge map such that we can map the edges between the ifcvt and the > > > + non-ifcvt loop during peeling. */ > > > + uintptr_t idx = 0; > > > + for (edge exit : get_loop_exit_edges (loop)) > > > + exit->aux = (void*)idx++; > > > + > > > /* Now all statements are if-convertible. Combine all the basic > > > blocks into one huge basic block doing the if-conversion > > > on-the-fly. */ > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > > > index > > > > > e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4 > > 052e1b1 > > > 2e0d17300262 100644 > > > --- a/gcc/tree-vect-loop-manip.cc > > > +++ b/gcc/tree-vect-loop-manip.cc > > > @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class > > loop *loop, edge loop_exit, > > > scalar_loop = loop; > > > scalar_exit = loop_exit; > > > } > > > + else if (scalar_loop == loop) > > > + scalar_exit = loop_exit; > > > + else > > > + { > > > + /* Loop has been version, match exits up using the aux index. */ > > > + for (edge exit : get_loop_exit_edges (scalar_loop)) > > > + if (exit->aux == loop_exit->aux) > > > + { > > > + scalar_exit = exit; > > > + break; > > > + } > > > + } > > > > > > bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); > > > pbbs = bbs + 1; > > > @@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class > > loop *loop, edge loop_exit, > > > exit = loop_exit; > > > basic_block new_preheader = new_bbs[0]; > > > > > > + /* Record the new loop exit information. new_loop doesn't have SCEV > > data and > > > + so we must initialize the exit information. */ > > > if (new_e) > > > *new_e = new_exit; > > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index > > > > > 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3 > > a313b11 > > > d1ef98374349 100644 > > > --- a/gcc/tree-vect-loop.cc > > > +++ b/gcc/tree-vect-loop.cc > > > @@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns > > (loop_vec_info loop_vinfo) > > > in NUMBER_OF_ITERATIONSM1. Place the condition under which the > > > niter information holds in ASSUMPTIONS. > > > > > > - Return the loop exit condition. */ > > > + Return the loop exit conditions. */ > > > > > > > > > -static gcond * > > > -vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions, > > > +static vec > > > +vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge > > > +main_exit, > > > tree *number_of_iterations, tree > > *number_of_iterationsm1) > > > > Any reason you swap exit and main_exit? IMHO the input better pairs with > > the other input 'loop'. > > > > No, I think I was just rearranging thing to fit more on a line. I'll put them next > to their exits. > > > > > > { > > > + auto_vec exits = get_loop_exit_edges (loop); vec > > > + conds; conds.create (exits.length ()); > > > class tree_niter_desc niter_desc; > > > tree niter_assumptions, niter, may_be_zero; > > > - gcond *cond = get_loop_exit_condition (loop); > > > > > > *assumptions = boolean_true_node; > > > *number_of_iterationsm1 = chrec_dont_know; > > > *number_of_iterations = chrec_dont_know; > > > + > > > DUMP_VECT_SCOPE ("get_loop_niters"); > > > > > > - if (!exit) > > > - return cond; > > > + if (exits.is_empty ()) > > > + return conds; > > > + > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n", > > > + exits.length ()); > > > + > > > + edge exit; > > > + unsigned int i; > > > + FOR_EACH_VEC_ELT (exits, i, exit) > > > + { > > > + gcond *cond = get_loop_exit_condition (exit); > > > + if (cond) > > > + conds.safe_push (cond); > > > + > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", > > > +i); > > > > > > - may_be_zero = NULL_TREE; > > > - if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, > > NULL) > > > - || chrec_contains_undetermined (niter_desc.niter)) > > > - return cond; > > > + may_be_zero = NULL_TREE; > > > + if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, > > NULL) > > > + || chrec_contains_undetermined (niter_desc.niter)) > > > + continue; > > > > > > - niter_assumptions = niter_desc.assumptions; > > > - may_be_zero = niter_desc.may_be_zero; > > > - niter = niter_desc.niter; > > > + niter_assumptions = niter_desc.assumptions; > > > + may_be_zero = niter_desc.may_be_zero; > > > + niter = niter_desc.niter; > > > > > > - if (may_be_zero && integer_zerop (may_be_zero)) > > > - may_be_zero = NULL_TREE; > > > + if (may_be_zero && integer_zerop (may_be_zero)) > > > + may_be_zero = NULL_TREE; > > > > > > - if (may_be_zero) > > > - { > > > - if (COMPARISON_CLASS_P (may_be_zero)) > > > + if (may_be_zero) > > > { > > > - /* Try to combine may_be_zero with assumptions, this can simplify > > > - computation of niter expression. */ > > > - if (niter_assumptions && !integer_nonzerop (niter_assumptions)) > > > - niter_assumptions = fold_build2 (TRUTH_AND_EXPR, > > boolean_type_node, > > > - niter_assumptions, > > > - fold_build1 (TRUTH_NOT_EXPR, > > > - boolean_type_node, > > > - may_be_zero)); > > > + if (COMPARISON_CLASS_P (may_be_zero)) > > > + { > > > + /* Try to combine may_be_zero with assumptions, this can simplify > > > + computation of niter expression. */ > > > + if (niter_assumptions && !integer_nonzerop (niter_assumptions)) > > > + niter_assumptions = fold_build2 (TRUTH_AND_EXPR, > > boolean_type_node, > > > + niter_assumptions, > > > + fold_build1 > > (TRUTH_NOT_EXPR, > > > + > > boolean_type_node, > > > + may_be_zero)); > > > + else > > > + niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), > > may_be_zero, > > > + build_int_cst (TREE_TYPE (niter), 0), > > > + rewrite_to_non_trapping_overflow (niter)); > > > + > > > + may_be_zero = NULL_TREE; > > > + } > > > + else if (integer_nonzerop (may_be_zero) && exit == main_exit) > > > + { > > > + *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0); > > > + *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1); > > > + continue; > > > + } > > > else > > > - niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero, > > > - build_int_cst (TREE_TYPE (niter), 0), > > > - rewrite_to_non_trapping_overflow (niter)); > > > + continue; > > > + } > > > > > > - may_be_zero = NULL_TREE; > > > - } > > > - else if (integer_nonzerop (may_be_zero)) > > > + /* Loop assumptions are based off the normal exit. */ > > > + if (exit == main_exit) > > > > It's a bit hard to follow in patch form but I wonder why you even analyze the > > number of iterations of the non-main exits riskying possibly clobbering the > > *number_* outputs which we later assume to be for the main exit? > > > > My original goal here was that if we can't analyze the other exits, we probably > can't vectorize them. So I don't really need the results but I thought it useful to > check. I can skip them. Please. I don't think they need to be countable. Richard. > Thanks, > Tamar > > > > { > > > - *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0); > > > - *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1); > > > - return cond; > > > + *assumptions = niter_assumptions; > > > + *number_of_iterationsm1 = niter; > > > + > > > + /* We want the number of loop header executions which is the > > number > > > + of latch executions plus one. > > > + ??? For UINT_MAX latch executions this number overflows to zero > > > + for loops like do { n++; } while (n != 0); */ > > > + if (niter && !chrec_contains_undetermined (niter)) > > > + niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), > > > + unshare_expr (niter), > > > + build_int_cst (TREE_TYPE (niter), 1)); > > > + *number_of_iterations = niter; > > > } > > > - else > > > - return cond; > > > } > > > > > > - *assumptions = niter_assumptions; > > > - *number_of_iterationsm1 = niter; > > > - > > > - /* We want the number of loop header executions which is the number > > > - of latch executions plus one. > > > - ??? For UINT_MAX latch executions this number overflows to zero > > > - for loops like do { n++; } while (n != 0); */ > > > - if (niter && !chrec_contains_undetermined (niter)) > > > - niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter), > > > - build_int_cst (TREE_TYPE (niter), 1)); > > > - *number_of_iterations = niter; > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_NOTE, vect_location, "All loop exits > > > + successfully analyzed.\n"); > > > > > > - return cond; > > > + return conds; > > > } > > > > > > /* Determine the main loop exit for the vectorizer. */ @@ -936,8 > > > +963,25 @@ vec_init_loop_exit_info (class loop *loop) > > > auto_vec exits = get_loop_exit_edges (loop); > > > if (exits.length () == 1) > > > return exits[0]; > > > - else > > > - return NULL; > > > + > > > + /* If we have multiple exits we only support counting IV at the moment. > > Analyze > > > + all exits and return one */ > > > + class tree_niter_desc niter_desc; > > > + edge candidate = NULL; > > > + for (edge exit : exits) > > > + { > > > + if (!get_loop_exit_condition (exit)) > > > + continue; > > > + > > > + if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, > > NULL) > > > + && !chrec_contains_undetermined (niter_desc.niter)) > > > + { > > > + if (!niter_desc.may_be_zero || !candidate) > > > + candidate = exit; > > > + } > > > + } > > > + > > > + return candidate; > > > } > > > > > > /* Function bb_in_loop_p > > > @@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop, > > vect_loop_form_info *info) > > > "not vectorized: latch block not empty.\n"); > > > > > > /* Make sure the exit is not abnormal. */ > > > - edge e = single_exit (loop); > > > - if (e->flags & EDGE_ABNORMAL) > > > + if (exit_e->flags & EDGE_ABNORMAL) > > > return opt_result::failure_at (vect_location, > > > "not vectorized:" > > > " abnormal loop exit edge.\n"); > > > > > > - info->loop_cond > > > - = vect_get_loop_niters (loop, e, &info->assumptions, > > > + info->conds > > > + = vect_get_loop_niters (loop, &info->assumptions, exit_e, > > > &info->number_of_iterations, > > > &info->number_of_iterationsm1); > > > - if (!info->loop_cond) > > > + > > > + if (info->conds.is_empty ()) > > > return opt_result::failure_at > > > (vect_location, > > > "not vectorized: complicated exit condition.\n"); > > > > > > + /* Determine what the primary and alternate exit conds are. */ > > > + info->alt_loop_conds.create (info->conds.length () - 1); > > > + for (gcond *cond : info->conds) > > > + { > > > + if (exit_e->src != gimple_bb (cond)) > > > + info->alt_loop_conds.quick_push (cond); > > > + else > > > + info->loop_cond = cond; > > > + } > > > + > > > > IMHO it would be simpler to have the primary exit condition in > > info->conds[0] and the rest after that? That avoids having two > > arrays and one scalar in vect_loop_form_info. > > > > > if (integer_zerop (info->assumptions) > > > || !info->number_of_iterations > > > || chrec_contains_undetermined (info->number_of_iterations)) @@ > > > -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, > > vec_info_shared *shared, > > > if (!integer_onep (info->assumptions) && !main_loop_info) > > > LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions; > > > > > > - stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt > > > (info->loop_cond); > > > - STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type; > > > + for (gcond *cond : info->conds) > > > + { > > > + stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond); > > > + STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type; > > > + } > > > + LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice > > > + (info->alt_loop_conds); LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = > > > + info->loop_cond; > > > LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit; > > > > > > @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, > > vec_info_shared *shared) > > > && LOOP_VINFO_PEELING_FOR_NITER > > (first_loop_vinfo) > > > && !loop->simduid); > > > if (!vect_epilogues) > > > - return first_loop_vinfo; > > > + { > > > + loop_form_info.conds.release (); > > > + loop_form_info.alt_loop_conds.release (); > > > + return first_loop_vinfo; > > > + } > > > > I think there's 'inner' where you leak these. Maybe use auto_vec<> in > > vect_loop_form_info instead? > > > > Otherwise looks OK. > > > > Thanks, > > Richard. > > > > > /* Now analyze first_loop_vinfo for epilogue vectorization. */ > > > poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD > > > (first_loop_vinfo); @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop > > *loop, vec_info_shared *shared) > > > (first_loop_vinfo->epilogue_vinfos[0]- > > >vector_mode)); > > > } > > > > > > + loop_form_info.conds.release (); > > > + loop_form_info.alt_loop_conds.release (); > > > + > > > return first_loop_vinfo; > > > } > > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index > > > > > afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa132 > > 7d595e1 > > > dddb112cfdf6 100644 > > > --- a/gcc/tree-vectorizer.h > > > +++ b/gcc/tree-vectorizer.h > > > @@ -882,6 +882,12 @@ public: > > > we need to peel off iterations at the end to form an epilogue loop. */ > > > bool peeling_for_niter; > > > > > > + /* List of loop additional IV conditionals found in the loop. */ > > > + auto_vec conds; > > > + > > > + /* Main loop IV cond. */ > > > + gcond* loop_iv_cond; > > > + > > > /* True if there are no loop carried data dependencies in the loop. > > > If loop->safelen <= 1, then this is always true, either the loop > > > didn't have any loop carried data dependencies, or the loop is > > > being @@ -984,6 +990,8 @@ public: > > > #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains > > > #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps > > > #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter > > > +#define LOOP_VINFO_LOOP_CONDS(L) (L)->conds > > > +#define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond > > > #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)- > > >no_data_dependencies > > > #define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop > > > #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling > > > @@ -2373,7 +2381,9 @@ struct vect_loop_form_info > > > tree number_of_iterations; > > > tree number_of_iterationsm1; > > > tree assumptions; > > > + vec conds; > > > gcond *loop_cond; > > > + vec alt_loop_conds; > > > gcond *inner_loop_cond; > > > edge loop_exit; > > > }; > > > > > > > > > > > > > > > > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > Nuernberg) > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)