From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id C8F603858D28 for ; Mon, 17 Jul 2023 09:17:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C8F603858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 3870A1FDAC; Mon, 17 Jul 2023 09:17:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1689585465; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=jh+/fN/jlVL1ira60pKlu7eCZRHnM3l0/iM82t2mKag=; b=glEQmOPHEVi6y0mFbAgOuRHleSZmDP8qklV2D48ugYCn4M8DVhHY6n8yzaAOD6dHGGIIzz yBA49aGztNN/TczVOiLj+IEmZUlMazfj1XTqy1tRRNvboJl/RGZpJog/xcmQeC5eoE6cQ3 t607+pKrWnEpltd6BusRc7fiA+qFX0s= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1689585465; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=jh+/fN/jlVL1ira60pKlu7eCZRHnM3l0/iM82t2mKag=; b=iiC86+0CXllehEFhzwsNJ6G2RkfSR25eQe9m2/J1Sp6IBRep4l+q1ymVB7g2EzDbZiEUi0 YfVZ/C2ZXRFMyzAQ== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 2C3672C145; Mon, 17 Jul 2023 09:17:45 +0000 (UTC) Date: Mon, 17 Jul 2023 09:17:45 +0000 (UTC) From: Richard Biener To: Jan Hubicka cc: gcc-patches@gcc.gnu.org Subject: Re: Loop-ch improvements, part 3 In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_NUMSUBJECT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 14 Jul 2023, Jan Hubicka wrote: > Hi, > loop-ch currently does analysis using ranger for all loops to identify > candidates and then follows by phase where headers are duplicated (which > breaks SSA and ranger). The second stage does more analysis (to see how > many BBs we want to duplicate) but can't use ranger and thus misses > information about static conditionals. > > This patch pushes all analysis into the first stage. We record how many > BBs to duplicate and the second stage just duplicats as it is told so. > This makes it possible to also extend range query done also to basic > blocks that are not headers. This is easy to do, since we already do > path specific query so we only need to extend the path by headers we > decided to dulicate earlier. > > This makes it possible to track situations where exit that is always > false in the first iteration for tests not in the original loop header. > Doing so lets us to update profile better and do better heuristics. In > particular I changed logic as follows > 1) should_duplicate_loop_header_p counts size of duplicated region. When we > know that a given conditional will be constant true or constant false either > in the duplicated region, by range query, or in the loop body after > duplication (since it is loop invariant), we do not account it to code size > costs > 2) don't need account loop invariant compuations that will be duplicated > as they will become fully invariant > (maybe we want to have some cap for register pressure eventually?) > 3) optimize_size logic is now different. Originally we started duplicating > iff the first conditional was known to be true by ranger query, but then > we used same limits as for -O2. > > I now simply lower limits to 0. This means that every conditional > in duplicated sequence must be either loop invariant or constant when > duplicated and we only duplicate statements computing loop invariants > and those we account to 0 size anyway, > > This makes code IMO more streamlined (and hopefully will let us to merge > ibts with loop peeling logic), but makes little difference in practice. > The problem is that in loop: > > void test2(); > void test(int n) > { > for (int i = 0; n && i < 10; i++) > test2(); > } > > We produce: > [local count: 1073741824 freq: 9.090909]: > # i_4 = PHI <0(2), i_9(3)> > _1 = n_7(D) != 0; > _2 = i_4 <= 9; > _3 = _1 & _2; > if (_3 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > and do not understand that the final conditional is a combination of a conditional > that is always true in first iteration and a conditional that is loop invariant. > > This is also the case of > void test2(); > void test(int n) > { > for (int i = 0; n; i++) > { > if (i > 10) > break; > test2(); > } > } > Which we turn to the earlier case in ifcombine. > > With disabled ifcombine things however works as exepcted. This is something > I plan to handle incrementally. However extending loop-ch and peeling passes > to understand such combined conditionals is still not good enough: at the time ifcombine > merged the two conditionals we lost profile information on how often n is 0, > so we can't recover correct profile or know what is expected number of iterations > after the transofrm. > > Bootstrapped/regtested x86_64-linux, OK? OK. Thanks, Richard. > Honza > > > gcc/ChangeLog: > > * tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready > for queries not in headers. > (static_loop_exit): Add basic blck parameter; update use of > edge_range_query > (should_duplicate_loop_header_p): Add ranger and static_exits > parameter. Do not account statements that will be optimized > out after duplicaiton in overall size. Add ranger query to > find static exits. > (update_profile_after_ch): Take static_exits has set instead of > single eliminated_edge. > (ch_base::copy_headers): Do all analysis in the first pass; > remember invariant_exits and static_exits. > > diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc > index 24e7fbc805a..e0139cb432c 100644 > --- a/gcc/tree-ssa-loop-ch.cc > +++ b/gcc/tree-ssa-loop-ch.cc > @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3. If not see > the range of the solved conditional in R. */ > > static void > -edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger) > +edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger) > { > - auto_vec path (2); > - path.safe_push (e->dest); > - path.safe_push (e->src); > + auto_vec path; > + for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src) > + path.safe_push (bb); > + path.safe_push (loop->header); > + path.safe_push (loop_preheader_edge (loop)->src); > path_range_query query (ranger, path); > if (!query.range_of_stmt (r, cond)) > r.set_varying (boolean_type_node); > @@ -63,17 +65,16 @@ edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger) > and NULL otherwise. */ > > static edge > -static_loop_exit (class loop *l, gimple_ranger *ranger) > +static_loop_exit (class loop *l, basic_block bb, gimple_ranger *ranger) > { > - edge e = loop_preheader_edge (l); > - gcond *last = safe_dyn_cast (*gsi_last_bb (e->dest)); > + gcond *last = safe_dyn_cast (*gsi_last_bb (bb)); > edge ret_e; > > if (!last) > return NULL; > > edge true_e, false_e; > - extract_true_false_edges_from_block (e->dest, &true_e, &false_e); > + extract_true_false_edges_from_block (bb, &true_e, &false_e); > > /* If neither edge is the exit edge, this is not a case we'd like to > special-case. */ > @@ -93,7 +94,7 @@ static_loop_exit (class loop *l, gimple_ranger *ranger) > } > > int_range<2> r; > - edge_range_query (r, e, last, *ranger); > + edge_range_query (r, l, last, *ranger); > return r == desired_static_range ? ret_e : NULL; > } > > @@ -131,8 +132,10 @@ loop_iv_derived_p (class loop *loop, > > static bool > should_duplicate_loop_header_p (basic_block header, class loop *loop, > + gimple_ranger *ranger, > int *limit, > - hash_set *invariant_exits) > + hash_set *invariant_exits, > + hash_set *static_exits) > { > gimple_stmt_iterator bsi; > > @@ -209,6 +212,9 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop, > if (is_gimple_debug (last)) > continue; > > + if (gimple_code (last) == GIMPLE_COND) > + break; > + > if (gimple_code (last) == GIMPLE_CALL > && (!gimple_inexpensive_call_p (as_a (last)) > /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed > @@ -222,16 +228,6 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop, > return false; > } > > - *limit -= estimate_num_insns (last, &eni_size_weights); > - if (*limit < 0) > - { > - if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, > - " Not duplicating bb %i contains too many insns.\n", > - header->index); > - return false; > - } > - > /* Classify the stmt based on whether its computation is based > on a IV or whether it is invariant in the loop. */ > gimple_set_uid (last, 0); > @@ -257,9 +253,36 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop, > inv = false; > } > gimple_set_uid (last, (iv ? 1 : 0) | (inv ? 2 : 0)); > + /* Loop invariants will be optimized out in loop body after > + duplication; do not account invariant computation in code > + size costs. */ > + if (inv) > + continue; > + } > + > + *limit -= estimate_num_insns (last, &eni_size_weights); > + if (*limit < 0) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + " Not duplicating bb %i contains too many insns.\n", > + header->index); > + return false; > } > } > > + edge static_exit = static_loop_exit (loop, header, ranger); > + > + if (static_exit && static_exits) > + { > + static_exits->add (static_exit); > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + " Will eliminate peeled conditional in bb %d.\n", > + static_exit->src->index); > + /* Still look for invariant exits; exit may be both. */ > + } > + > /* If the condition tests a non-IV loop variant we do not want to rotate > the loop further. Unless this is the original loop header. */ > tree lhs = gimple_cond_lhs (last); > @@ -284,12 +307,28 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop, > } > return true; > } > + > + if (static_exit) > + return true; > + > + /* We was not able to prove that conditional will be eliminated. */ > + *limit -= estimate_num_insns (last, &eni_size_weights); > + if (*limit < 0) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + " Not duplicating bb %i contains too many insns.\n", > + header->index); > + return false; > + } > + > /* TODO: This is heuristics that claims that IV based ocnditionals will > likely be optimized out in duplicated header. We could use ranger > query instead to tell this more precisely. */ > - if (header != loop->header > - && ((!lhs_invariant && !loop_iv_derived_p (loop, lhs)) > - || (!rhs_invariant && !loop_iv_derived_p (loop, rhs)))) > + if ((lhs_invariant || loop_iv_derived_p (loop, lhs)) > + && (rhs_invariant || loop_iv_derived_p (loop, rhs))) > + return true; > + if (header != loop->header) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > @@ -386,8 +425,9 @@ do_while_loop_p (class loop *loop) > static void > update_profile_after_ch (class loop *loop, > basic_block *region, basic_block *region_copy, > - unsigned n_region, edge eliminated_edge, > + unsigned n_region, > hash_set *invariant_exits, > + hash_set *static_exits, > profile_count entry_count) > { > for (unsigned int i = 0; i < n_region; i++) > @@ -421,10 +461,13 @@ update_profile_after_ch (class loop *loop, > && e_copy->dest == region_copy[i + 1])); > region_copy[i]->count = entry_count; > profile_count exit_e_count = exit_e->count (); > - if (eliminated_edge == exit_e) > + bool was_static = false; > + if (static_exits->contains (exit_e)) > { > /* Update profile and the conditional. > CFG update is done by caller. */ > + static_exits->remove (exit_e); > + was_static = true; > e_copy->probability = profile_probability::always (); > exit_e_copy->probability = profile_probability::never (); > gcond *cond_stmt > @@ -437,7 +480,6 @@ update_profile_after_ch (class loop *loop, > /* Header copying is a special case of jump threading, so use > common code to update loop body exit condition. */ > update_bb_profile_for_threading (region[i], entry_count, e); > - eliminated_edge = NULL; > } > else > region[i]->count -= region_copy[i]->count; > @@ -448,7 +490,7 @@ update_profile_after_ch (class loop *loop, > loop, so increase probability accordingly. > If the edge is eliminated_edge we already corrected > profile above. */ > - if (entry_count.nonzero_p () && eliminated_edge != exit_e) > + if (entry_count.nonzero_p () && !was_static) > set_edge_probability_and_rescale_others > (exit_e_copy, exit_e_count.probability_in > (entry_count)); > @@ -465,7 +507,7 @@ update_profile_after_ch (class loop *loop, > entry_count = e_copy->count (); > } > /* Be sure that we seen all edges we are supposed to update. */ > - gcc_checking_assert (!eliminated_edge > + gcc_checking_assert (static_exits->is_empty () > && invariant_exits->is_empty ()); > } > > @@ -564,10 +606,7 @@ protected: > unsigned int > ch_base::copy_headers (function *fun) > { > - basic_block header; > - edge exit, nonexit, entry; > basic_block *bbs, *copied_bbs; > - unsigned n_bbs; > unsigned bbs_size; > bool changed = false; > > @@ -578,7 +617,12 @@ ch_base::copy_headers (function *fun) > copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun)); > bbs_size = n_basic_blocks_for_fn (fun); > > - struct candidate {class loop *loop; edge static_exit;}; > + struct candidate > + { > + class loop *loop; > + unsigned int nheaders; > + hash_set *invariant_exits, *static_exits; > + }; > auto_vec candidates; > auto_vec > copied; > auto_vec loops_to_unloop; > @@ -588,13 +632,14 @@ ch_base::copy_headers (function *fun) > gimple_ranger *ranger = new gimple_ranger; > for (auto loop : loops_list (cfun, 0)) > { > - int initial_limit = param_max_loop_header_insns; > + int initial_limit = optimize_loop_for_speed_p (loop) > + ? param_max_loop_header_insns : 0; > int remaining_limit = initial_limit; > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > "Analyzing loop %i\n", loop->num); > > - header = loop->header; > + basic_block header = loop->header; > if (!get_max_loop_iterations_int (loop)) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -613,24 +658,38 @@ ch_base::copy_headers (function *fun) > || !process_loop_p (loop)) > continue; > > - edge static_exit = static_loop_exit (loop, ranger); > - > - /* Avoid loop header copying when optimizing for size unless we can > - determine that the loop condition is static in the first > - iteration. */ > - if (optimize_loop_for_size_p (loop) > - && !loop->force_vectorize > - && !static_exit) > + /* Iterate the header copying up to limit; this takes care of the cases > + like while (a && b) {...}, where we want to have both of the conditions > + copied. TODO -- handle while (a || b) - like cases, by not requiring > + the header to have just a single successor and copying up to > + postdominator. */ > + int nheaders = 0; > + hash_set *invariant_exits = new hash_set ; > + hash_set *static_exits = new hash_set ; > + while (should_duplicate_loop_header_p (header, loop, ranger, > + &remaining_limit, > + invariant_exits, > + static_exits)) > { > + nheaders++; > if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, > - " Not duplicating bb %i: optimizing for size.\n", > - header->index); > - continue; > + fprintf (dump_file, " Will duplicate bb %i\n", header->index); > + > + /* Find a successor of header that is inside a loop; i.e. the new > + header after the condition is copied. */ > + if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest)) > + header = EDGE_SUCC (header, 0)->dest; > + else > + header = EDGE_SUCC (header, 1)->dest; > } > > - if (should_duplicate_loop_header_p (header, loop, &remaining_limit, NULL)) > - candidates.safe_push ({loop, static_exit}); > + if (nheaders) > + candidates.safe_push ({loop, nheaders, invariant_exits, static_exits}); > + else > + { > + delete invariant_exits; > + delete static_exits; > + } > } > /* Do not use ranger after we change the IL and not have updated SSA. */ > delete ranger; > @@ -638,37 +697,25 @@ ch_base::copy_headers (function *fun) > for (auto candidate : candidates) > { > class loop *loop = candidate.loop; > - int initial_limit = param_max_loop_header_insns; > - int remaining_limit = initial_limit; > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > "Copying headers of loop %i\n", loop->num); > > - header = loop->header; > - > - /* Iterate the header copying up to limit; this takes care of the cases > - like while (a && b) {...}, where we want to have both of the conditions > - copied. TODO -- handle while (a || b) - like cases, by not requiring > - the header to have just a single successor and copying up to > - postdominator. */ > - > - nonexit = NULL; > - n_bbs = 0; > + basic_block header = loop->header; > + edge nonexit = NULL; > + edge exit = NULL; > + unsigned int n_bbs = 0; > int nexits = 0; > profile_count exit_count = profile_count::zero (); > profile_count entry_count = profile_count::zero (); > edge e; > edge_iterator ei; > - hash_set invariant_exits; > + > FOR_EACH_EDGE (e, ei, loop->header->preds) > if (e->src != loop->latch) > entry_count += e->count (); > - while (should_duplicate_loop_header_p (header, loop, &remaining_limit, > - &invariant_exits)) > + while (n_bbs < candidate.nheaders) > { > - if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, " Will duplicate bb %i\n", header->index); > - > /* Find a successor of header that is inside a loop; i.e. the new > header after the condition is copied. */ > if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest)) > @@ -688,21 +735,10 @@ ch_base::copy_headers (function *fun) > header = nonexit->dest; > } > > - if (!nonexit) > - continue; > - > if (dump_file && (dump_flags & TDF_DETAILS)) > - { > - fprintf (dump_file, > - "Duplicating header of the loop %d up to edge %d->%d," > - " %i insns.\n", > - loop->num, exit->src->index, exit->dest->index, > - initial_limit - remaining_limit); > - if (candidate.static_exit) > - fprintf (dump_file, > - " Will eliminate peeled conditional in bb %d.\n", > - candidate.static_exit->src->index); > - } > + fprintf (dump_file, > + "Duplicating header of the loop %d up to edge %d->%d\n", > + loop->num, exit->src->index, exit->dest->index); > > /* Ensure that the header will have just the latch as a predecessor > inside the loop. */ > @@ -712,20 +748,25 @@ ch_base::copy_headers (function *fun) > exit = single_pred_edge (header); > } > > - entry = loop_preheader_edge (loop); > + edge entry = loop_preheader_edge (loop); > > propagate_threaded_block_debug_into (exit->dest, entry->dest); > if (!gimple_duplicate_seme_region (entry, exit, bbs, n_bbs, copied_bbs, > true)) > { > + delete candidate.static_exits; > + delete candidate.invariant_exits; > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "Duplication failed.\n"); > continue; > } > if (loop->header->count.initialized_p ()) > update_profile_after_ch (loop, bbs, copied_bbs, n_bbs, > - candidate.static_exit, &invariant_exits, > + candidate.invariant_exits, > + candidate.static_exits, > entry_count); > + delete candidate.static_exits; > + delete candidate.invariant_exits; > copied.safe_push (std::make_pair (entry, loop)); > > /* If the loop has the form "for (i = j; i < j + 10; i++)" then > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)