From: Hongtao Liu <crazylht@gmail.com>
To: Richard Biener <rguenther@suse.de>
Cc: Jan Hubicka <hubicka@ucw.cz>, gcc-patches@gcc.gnu.org
Subject: Re: Loop-ch improvements, part 3
Date: Tue, 22 Aug 2023 13:24:09 +0800 [thread overview]
Message-ID: <CAMZc-bxpkOmDttrXG200g4jb1aRs6pMOhZrcnGRmgdE6yrc3_g@mail.gmail.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2307170917330.4723@jbgna.fhfr.qr>
On Mon, Jul 17, 2023 at 5:18 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, 14 Jul 2023, Jan Hubicka wrote:
>
> > Hi,
> > loop-ch currently does analysis using ranger for all loops to identify
> > candidates and then follows by phase where headers are duplicated (which
> > breaks SSA and ranger). The second stage does more analysis (to see how
> > many BBs we want to duplicate) but can't use ranger and thus misses
> > information about static conditionals.
> >
> > This patch pushes all analysis into the first stage. We record how many
> > BBs to duplicate and the second stage just duplicats as it is told so.
> > This makes it possible to also extend range query done also to basic
> > blocks that are not headers. This is easy to do, since we already do
> > path specific query so we only need to extend the path by headers we
> > decided to dulicate earlier.
> >
> > This makes it possible to track situations where exit that is always
> > false in the first iteration for tests not in the original loop header.
> > Doing so lets us to update profile better and do better heuristics. In
> > particular I changed logic as follows
> > 1) should_duplicate_loop_header_p counts size of duplicated region. When we
> > know that a given conditional will be constant true or constant false either
> > in the duplicated region, by range query, or in the loop body after
> > duplication (since it is loop invariant), we do not account it to code size
> > costs
> > 2) don't need account loop invariant compuations that will be duplicated
> > as they will become fully invariant
> > (maybe we want to have some cap for register pressure eventually?)
> > 3) optimize_size logic is now different. Originally we started duplicating
> > iff the first conditional was known to be true by ranger query, but then
> > we used same limits as for -O2.
> >
> > I now simply lower limits to 0. This means that every conditional
> > in duplicated sequence must be either loop invariant or constant when
> > duplicated and we only duplicate statements computing loop invariants
> > and those we account to 0 size anyway,
> >
> > This makes code IMO more streamlined (and hopefully will let us to merge
> > ibts with loop peeling logic), but makes little difference in practice.
> > The problem is that in loop:
> >
> > void test2();
> > void test(int n)
> > {
> > for (int i = 0; n && i < 10; i++)
> > test2();
> > }
> >
> > We produce:
> > <bb 4> [local count: 1073741824 freq: 9.090909]:
> > # i_4 = PHI <0(2), i_9(3)>
> > _1 = n_7(D) != 0;
> > _2 = i_4 <= 9;
> > _3 = _1 & _2;
> > if (_3 != 0)
> > goto <bb 3>; [89.00%]
> > else
> > goto <bb 5>; [11.00%]
> >
> > and do not understand that the final conditional is a combination of a conditional
> > that is always true in first iteration and a conditional that is loop invariant.
> >
> > This is also the case of
> > void test2();
> > void test(int n)
> > {
> > for (int i = 0; n; i++)
> > {
> > if (i > 10)
> > break;
> > test2();
> > }
> > }
> > Which we turn to the earlier case in ifcombine.
> >
> > With disabled ifcombine things however works as exepcted. This is something
> > I plan to handle incrementally. However extending loop-ch and peeling passes
> > to understand such combined conditionals is still not good enough: at the time ifcombine
> > merged the two conditionals we lost profile information on how often n is 0,
> > so we can't recover correct profile or know what is expected number of iterations
> > after the transofrm.
This regressed
FAIL: gcc.target/i386/pr93089-2.c scan-assembler vmulps[^\n\r]*zmm
FAIL: gcc.target/i386/pr93089-3.c scan-assembler vmulps[^\n\r]*zmm
The testcase is quite simple, not sure why it's regressed.
1/* PR target/93089 */
2/* { dg-do compile } */
3/* { dg-options "-O2 -fopenmp-simd -mtune=znver1" } */
4/* { dg-final { scan-assembler "vmulps\[^\n\r]*zmm" } } */
5/* { dg-final { scan-assembler "vmulps\[^\n\r]*ymm" } } */
6
7#pragma omp declare simd notinbranch
8float
9foo (float x, float y)
10{
11 return x * y;
12}
> >
> > Bootstrapped/regtested x86_64-linux, OK?
>
> OK.
>
> Thanks,
> Richard.
>
> > Honza
> >
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready
> > for queries not in headers.
> > (static_loop_exit): Add basic blck parameter; update use of
> > edge_range_query
> > (should_duplicate_loop_header_p): Add ranger and static_exits
> > parameter. Do not account statements that will be optimized
> > out after duplicaiton in overall size. Add ranger query to
> > find static exits.
> > (update_profile_after_ch): Take static_exits has set instead of
> > single eliminated_edge.
> > (ch_base::copy_headers): Do all analysis in the first pass;
> > remember invariant_exits and static_exits.
> >
> > diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> > index 24e7fbc805a..e0139cb432c 100644
> > --- a/gcc/tree-ssa-loop-ch.cc
> > +++ b/gcc/tree-ssa-loop-ch.cc
> > @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3. If not see
> > the range of the solved conditional in R. */
> >
> > static void
> > -edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> > +edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger)
> > {
> > - auto_vec<basic_block> path (2);
> > - path.safe_push (e->dest);
> > - path.safe_push (e->src);
> > + auto_vec<basic_block, 8> path;
> > + for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src)
> > + path.safe_push (bb);
> > + path.safe_push (loop->header);
> > + path.safe_push (loop_preheader_edge (loop)->src);
> > path_range_query query (ranger, path);
> > if (!query.range_of_stmt (r, cond))
> > r.set_varying (boolean_type_node);
> > @@ -63,17 +65,16 @@ edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> > and NULL otherwise. */
> >
> > static edge
> > -static_loop_exit (class loop *l, gimple_ranger *ranger)
> > +static_loop_exit (class loop *l, basic_block bb, gimple_ranger *ranger)
> > {
> > - edge e = loop_preheader_edge (l);
> > - gcond *last = safe_dyn_cast <gcond *> (*gsi_last_bb (e->dest));
> > + gcond *last = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
> > edge ret_e;
> >
> > if (!last)
> > return NULL;
> >
> > edge true_e, false_e;
> > - extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> > + extract_true_false_edges_from_block (bb, &true_e, &false_e);
> >
> > /* If neither edge is the exit edge, this is not a case we'd like to
> > special-case. */
> > @@ -93,7 +94,7 @@ static_loop_exit (class loop *l, gimple_ranger *ranger)
> > }
> >
> > int_range<2> r;
> > - edge_range_query (r, e, last, *ranger);
> > + edge_range_query (r, l, last, *ranger);
> > return r == desired_static_range ? ret_e : NULL;
> > }
> >
> > @@ -131,8 +132,10 @@ loop_iv_derived_p (class loop *loop,
> >
> > static bool
> > should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > + gimple_ranger *ranger,
> > int *limit,
> > - hash_set <edge> *invariant_exits)
> > + hash_set <edge> *invariant_exits,
> > + hash_set <edge> *static_exits)
> > {
> > gimple_stmt_iterator bsi;
> >
> > @@ -209,6 +212,9 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > if (is_gimple_debug (last))
> > continue;
> >
> > + if (gimple_code (last) == GIMPLE_COND)
> > + break;
> > +
> > if (gimple_code (last) == GIMPLE_CALL
> > && (!gimple_inexpensive_call_p (as_a <gcall *> (last))
> > /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed
> > @@ -222,16 +228,6 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > return false;
> > }
> >
> > - *limit -= estimate_num_insns (last, &eni_size_weights);
> > - if (*limit < 0)
> > - {
> > - if (dump_file && (dump_flags & TDF_DETAILS))
> > - fprintf (dump_file,
> > - " Not duplicating bb %i contains too many insns.\n",
> > - header->index);
> > - return false;
> > - }
> > -
> > /* Classify the stmt based on whether its computation is based
> > on a IV or whether it is invariant in the loop. */
> > gimple_set_uid (last, 0);
> > @@ -257,9 +253,36 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > inv = false;
> > }
> > gimple_set_uid (last, (iv ? 1 : 0) | (inv ? 2 : 0));
> > + /* Loop invariants will be optimized out in loop body after
> > + duplication; do not account invariant computation in code
> > + size costs. */
> > + if (inv)
> > + continue;
> > + }
> > +
> > + *limit -= estimate_num_insns (last, &eni_size_weights);
> > + if (*limit < 0)
> > + {
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > + fprintf (dump_file,
> > + " Not duplicating bb %i contains too many insns.\n",
> > + header->index);
> > + return false;
> > }
> > }
> >
> > + edge static_exit = static_loop_exit (loop, header, ranger);
> > +
> > + if (static_exit && static_exits)
> > + {
> > + static_exits->add (static_exit);
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > + fprintf (dump_file,
> > + " Will eliminate peeled conditional in bb %d.\n",
> > + static_exit->src->index);
> > + /* Still look for invariant exits; exit may be both. */
> > + }
> > +
> > /* If the condition tests a non-IV loop variant we do not want to rotate
> > the loop further. Unless this is the original loop header. */
> > tree lhs = gimple_cond_lhs (last);
> > @@ -284,12 +307,28 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > }
> > return true;
> > }
> > +
> > + if (static_exit)
> > + return true;
> > +
> > + /* We was not able to prove that conditional will be eliminated. */
> > + *limit -= estimate_num_insns (last, &eni_size_weights);
> > + if (*limit < 0)
> > + {
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > + fprintf (dump_file,
> > + " Not duplicating bb %i contains too many insns.\n",
> > + header->index);
> > + return false;
> > + }
> > +
> > /* TODO: This is heuristics that claims that IV based ocnditionals will
> > likely be optimized out in duplicated header. We could use ranger
> > query instead to tell this more precisely. */
> > - if (header != loop->header
> > - && ((!lhs_invariant && !loop_iv_derived_p (loop, lhs))
> > - || (!rhs_invariant && !loop_iv_derived_p (loop, rhs))))
> > + if ((lhs_invariant || loop_iv_derived_p (loop, lhs))
> > + && (rhs_invariant || loop_iv_derived_p (loop, rhs)))
> > + return true;
> > + if (header != loop->header)
> > {
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file,
> > @@ -386,8 +425,9 @@ do_while_loop_p (class loop *loop)
> > static void
> > update_profile_after_ch (class loop *loop,
> > basic_block *region, basic_block *region_copy,
> > - unsigned n_region, edge eliminated_edge,
> > + unsigned n_region,
> > hash_set <edge> *invariant_exits,
> > + hash_set <edge> *static_exits,
> > profile_count entry_count)
> > {
> > for (unsigned int i = 0; i < n_region; i++)
> > @@ -421,10 +461,13 @@ update_profile_after_ch (class loop *loop,
> > && e_copy->dest == region_copy[i + 1]));
> > region_copy[i]->count = entry_count;
> > profile_count exit_e_count = exit_e->count ();
> > - if (eliminated_edge == exit_e)
> > + bool was_static = false;
> > + if (static_exits->contains (exit_e))
> > {
> > /* Update profile and the conditional.
> > CFG update is done by caller. */
> > + static_exits->remove (exit_e);
> > + was_static = true;
> > e_copy->probability = profile_probability::always ();
> > exit_e_copy->probability = profile_probability::never ();
> > gcond *cond_stmt
> > @@ -437,7 +480,6 @@ update_profile_after_ch (class loop *loop,
> > /* Header copying is a special case of jump threading, so use
> > common code to update loop body exit condition. */
> > update_bb_profile_for_threading (region[i], entry_count, e);
> > - eliminated_edge = NULL;
> > }
> > else
> > region[i]->count -= region_copy[i]->count;
> > @@ -448,7 +490,7 @@ update_profile_after_ch (class loop *loop,
> > loop, so increase probability accordingly.
> > If the edge is eliminated_edge we already corrected
> > profile above. */
> > - if (entry_count.nonzero_p () && eliminated_edge != exit_e)
> > + if (entry_count.nonzero_p () && !was_static)
> > set_edge_probability_and_rescale_others
> > (exit_e_copy, exit_e_count.probability_in
> > (entry_count));
> > @@ -465,7 +507,7 @@ update_profile_after_ch (class loop *loop,
> > entry_count = e_copy->count ();
> > }
> > /* Be sure that we seen all edges we are supposed to update. */
> > - gcc_checking_assert (!eliminated_edge
> > + gcc_checking_assert (static_exits->is_empty ()
> > && invariant_exits->is_empty ());
> > }
> >
> > @@ -564,10 +606,7 @@ protected:
> > unsigned int
> > ch_base::copy_headers (function *fun)
> > {
> > - basic_block header;
> > - edge exit, nonexit, entry;
> > basic_block *bbs, *copied_bbs;
> > - unsigned n_bbs;
> > unsigned bbs_size;
> > bool changed = false;
> >
> > @@ -578,7 +617,12 @@ ch_base::copy_headers (function *fun)
> > copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
> > bbs_size = n_basic_blocks_for_fn (fun);
> >
> > - struct candidate {class loop *loop; edge static_exit;};
> > + struct candidate
> > + {
> > + class loop *loop;
> > + unsigned int nheaders;
> > + hash_set <edge> *invariant_exits, *static_exits;
> > + };
> > auto_vec<struct candidate> candidates;
> > auto_vec<std::pair<edge, loop_p> > copied;
> > auto_vec<class loop *> loops_to_unloop;
> > @@ -588,13 +632,14 @@ ch_base::copy_headers (function *fun)
> > gimple_ranger *ranger = new gimple_ranger;
> > for (auto loop : loops_list (cfun, 0))
> > {
> > - int initial_limit = param_max_loop_header_insns;
> > + int initial_limit = optimize_loop_for_speed_p (loop)
> > + ? param_max_loop_header_insns : 0;
> > int remaining_limit = initial_limit;
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file,
> > "Analyzing loop %i\n", loop->num);
> >
> > - header = loop->header;
> > + basic_block header = loop->header;
> > if (!get_max_loop_iterations_int (loop))
> > {
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > @@ -613,24 +658,38 @@ ch_base::copy_headers (function *fun)
> > || !process_loop_p (loop))
> > continue;
> >
> > - edge static_exit = static_loop_exit (loop, ranger);
> > -
> > - /* Avoid loop header copying when optimizing for size unless we can
> > - determine that the loop condition is static in the first
> > - iteration. */
> > - if (optimize_loop_for_size_p (loop)
> > - && !loop->force_vectorize
> > - && !static_exit)
> > + /* Iterate the header copying up to limit; this takes care of the cases
> > + like while (a && b) {...}, where we want to have both of the conditions
> > + copied. TODO -- handle while (a || b) - like cases, by not requiring
> > + the header to have just a single successor and copying up to
> > + postdominator. */
> > + int nheaders = 0;
> > + hash_set <edge> *invariant_exits = new hash_set <edge>;
> > + hash_set <edge> *static_exits = new hash_set <edge>;
> > + while (should_duplicate_loop_header_p (header, loop, ranger,
> > + &remaining_limit,
> > + invariant_exits,
> > + static_exits))
> > {
> > + nheaders++;
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > - fprintf (dump_file,
> > - " Not duplicating bb %i: optimizing for size.\n",
> > - header->index);
> > - continue;
> > + fprintf (dump_file, " Will duplicate bb %i\n", header->index);
> > +
> > + /* Find a successor of header that is inside a loop; i.e. the new
> > + header after the condition is copied. */
> > + if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
> > + header = EDGE_SUCC (header, 0)->dest;
> > + else
> > + header = EDGE_SUCC (header, 1)->dest;
> > }
> >
> > - if (should_duplicate_loop_header_p (header, loop, &remaining_limit, NULL))
> > - candidates.safe_push ({loop, static_exit});
> > + if (nheaders)
> > + candidates.safe_push ({loop, nheaders, invariant_exits, static_exits});
> > + else
> > + {
> > + delete invariant_exits;
> > + delete static_exits;
> > + }
> > }
> > /* Do not use ranger after we change the IL and not have updated SSA. */
> > delete ranger;
> > @@ -638,37 +697,25 @@ ch_base::copy_headers (function *fun)
> > for (auto candidate : candidates)
> > {
> > class loop *loop = candidate.loop;
> > - int initial_limit = param_max_loop_header_insns;
> > - int remaining_limit = initial_limit;
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file,
> > "Copying headers of loop %i\n", loop->num);
> >
> > - header = loop->header;
> > -
> > - /* Iterate the header copying up to limit; this takes care of the cases
> > - like while (a && b) {...}, where we want to have both of the conditions
> > - copied. TODO -- handle while (a || b) - like cases, by not requiring
> > - the header to have just a single successor and copying up to
> > - postdominator. */
> > -
> > - nonexit = NULL;
> > - n_bbs = 0;
> > + basic_block header = loop->header;
> > + edge nonexit = NULL;
> > + edge exit = NULL;
> > + unsigned int n_bbs = 0;
> > int nexits = 0;
> > profile_count exit_count = profile_count::zero ();
> > profile_count entry_count = profile_count::zero ();
> > edge e;
> > edge_iterator ei;
> > - hash_set <edge> invariant_exits;
> > +
> > FOR_EACH_EDGE (e, ei, loop->header->preds)
> > if (e->src != loop->latch)
> > entry_count += e->count ();
> > - while (should_duplicate_loop_header_p (header, loop, &remaining_limit,
> > - &invariant_exits))
> > + while (n_bbs < candidate.nheaders)
> > {
> > - if (dump_file && (dump_flags & TDF_DETAILS))
> > - fprintf (dump_file, " Will duplicate bb %i\n", header->index);
> > -
> > /* Find a successor of header that is inside a loop; i.e. the new
> > header after the condition is copied. */
> > if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
> > @@ -688,21 +735,10 @@ ch_base::copy_headers (function *fun)
> > header = nonexit->dest;
> > }
> >
> > - if (!nonexit)
> > - continue;
> > -
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > - {
> > - fprintf (dump_file,
> > - "Duplicating header of the loop %d up to edge %d->%d,"
> > - " %i insns.\n",
> > - loop->num, exit->src->index, exit->dest->index,
> > - initial_limit - remaining_limit);
> > - if (candidate.static_exit)
> > - fprintf (dump_file,
> > - " Will eliminate peeled conditional in bb %d.\n",
> > - candidate.static_exit->src->index);
> > - }
> > + fprintf (dump_file,
> > + "Duplicating header of the loop %d up to edge %d->%d\n",
> > + loop->num, exit->src->index, exit->dest->index);
> >
> > /* Ensure that the header will have just the latch as a predecessor
> > inside the loop. */
> > @@ -712,20 +748,25 @@ ch_base::copy_headers (function *fun)
> > exit = single_pred_edge (header);
> > }
> >
> > - entry = loop_preheader_edge (loop);
> > + edge entry = loop_preheader_edge (loop);
> >
> > propagate_threaded_block_debug_into (exit->dest, entry->dest);
> > if (!gimple_duplicate_seme_region (entry, exit, bbs, n_bbs, copied_bbs,
> > true))
> > {
> > + delete candidate.static_exits;
> > + delete candidate.invariant_exits;
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file, "Duplication failed.\n");
> > continue;
> > }
> > if (loop->header->count.initialized_p ())
> > update_profile_after_ch (loop, bbs, copied_bbs, n_bbs,
> > - candidate.static_exit, &invariant_exits,
> > + candidate.invariant_exits,
> > + candidate.static_exits,
> > entry_count);
> > + delete candidate.static_exits;
> > + delete candidate.invariant_exits;
> > copied.safe_push (std::make_pair (entry, loop));
> >
> > /* If the loop has the form "for (i = j; i < j + 10; i++)" then
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
--
BR,
Hongtao
next prev parent reply other threads:[~2023-08-22 5:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-14 12:22 Jan Hubicka
2023-07-17 9:17 ` Richard Biener
2023-08-22 5:24 ` Hongtao Liu [this message]
2023-08-22 7:53 ` Richard Biener
2023-08-22 12:03 ` Jan Hubicka
2023-08-23 8:34 ` Jan Hubicka
2023-08-23 8:58 ` Richard Biener
2023-07-20 7:09 loop-ch " Jan Hubicka
2023-07-20 13:03 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMZc-bxpkOmDttrXG200g4jb1aRs6pMOhZrcnGRmgdE6yrc3_g@mail.gmail.com \
--to=crazylht@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hubicka@ucw.cz \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).