From: Richard Biener <rguenther@suse.de>
To: Xionghu Luo <luoxhu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
dje.gcc@gmail.com, wschmidt@linux.ibm.com,
guojiufu@linux.ibm.com, linkw@gcc.gnu.org
Subject: Re: [PATCH v2] Fix incomplete computation in fill_always_executed_in_1
Date: Tue, 24 Aug 2021 10:20:57 +0200 (CEST) [thread overview]
Message-ID: <nycvar.YFH.7.76.2108241009380.11781@zhemvz.fhfr.qr> (raw)
In-Reply-To: <016969d3-80e2-8556-a8ac-42f166f04c19@linux.ibm.com>
On Tue, 24 Aug 2021, Xionghu Luo wrote:
>
>
> On 2021/8/19 20:11, Richard Biener wrote:
> >> - class loop *inn_loop = loop;
> >>
> >> if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
> >> {
> >> @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
> >> to disprove this if possible). */
> >> if (bb->flags & BB_IRREDUCIBLE_LOOP)
> >> break;
> >> -
> >> - if (!flow_bb_inside_loop_p (inn_loop, bb))
> >> - break;
> >> -
> >> - if (bb->loop_father->header == bb)
> >> - {
> >> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >> - break;
> >> -
> >> - /* In a loop that is always entered we may proceed anyway.
> >> - But record that we entered it and stop once we leave it. */
> >> - inn_loop = bb->loop_father;
> >> - }
> >> }
> >>
> >> while (1)
> > I'm not sure this will work correct (I'm not sure how the existing
> > code makes it so either...). That said, I can't poke any hole
> > into the change. What I see is that definitely
> >
> > if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > last = bb;
> >
> > if (bitmap_bit_p (contains_call, bb->index))
> > break;
> >
> > doesn't work reliably since the DOM ordering will process blocks
> > A B and C in random order for
> >
> > for (;;)
> > {
> > if (cond)
> > {
> > A: foo ();
> > }
> > else B:;
> > C:;
> > }
> >
> > and thus we can end up setting 'last' to C_before_ processing
> > 'A' and thus arriving at the call foo () ...
> >
> > get_loop_body_in_dom_order does some "special sauce" but not
> > to address the above problem - but it might be that a subtle
> > issue like the above is the reason for the inner loop handling.
> > The inner loop block order does_not_ adhere to this "special sauce",
> > that is - the "Additionally, if a basic block s dominates
> > the latch, then only blocks dominated by s are be after it."
> > guarantee holds for the outer loop latch, not for the inner.
> >
> > Digging into the history of fill_always_executed_in_1 doesn't
> > reveal anything - the inner loop handling has been present
> > since introduction by Zdenek - but usually Zdenek has a reason
> > for doing things as he does;)
>
> Yes, this is really complicated usage, thanks for point it out. :)
> I constructed two cases to verify this with inner loop includes "If A; else B; C".
> Finding that fill_sons_in_loop in get_loop_body_in_dom_order will also checks
> whether the bb domintes outer loop’s latch, if C dominate outer loop’s latch,
> C is postponed, the access order is ABC, 'last' won’t be set to C if A or B contains call;
But it depends on the order of visiting ABC and that's hard to put into
a testcase since it depends on the order of edges and the processing
of the dominance computation. ABC are simply unordered with respect
to a dominator walk.
> Otherwise if C doesn’t dominate outer loop’s latch in fill_sons_in_loop,
> the access order is CAB, but 'last' also won’t be updated to C in fill_always_executed_in_1
> since there is also dominate check, then if A or B contains call, it could break
> successfully.
>
> C won't be set to ALWAYS EXECUTED for both circumstance.
>
> >
> > Note it might be simply a measure against quadratic complexity,
> > esp. since with your patch we also dive into not always executed
> > subloops as you remove the
> >
> > if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > break;
> >
> > check. I suggest to evaluate behavior of the patch on a testcase
> > like
> >
> > void foo (int n, int **k)
> > {
> > for (int i = 0; i < n; ++i)
> > if (k[0][i])
> > for (int j = 0; j < n; ++j)
> > if (k[1][j])
> > for (int l = 0; l < n; ++l)
> > if (k[2][l])
> > ...
> > }
>
> Theoretically the complexity is changing from L1(bbs) to L1(bbs)+L2(bbs)+L3(bbs)+…+Ln(bbs),
> so fill_always_executed_in_1's execution time is supposed to be increase from
> O(n) to O(n2)? The time should depend on loop depth and bb counts. I also drafted a
> test case has 73-depth loop function with 25 no-ipa function copies each compiled
> in lim2 and lim4 dependently. Total execution time of fill_always_executed_in_1 is
> increased from 32ms to 58ms, almost doubled but not quadratic?
It's more like n + (n-1) + (n-2) + ... + 1 which is n^2/2 but that's still
O(n^2).
> It seems reasonable to see compiling time getting longer since most bbs are checked
> more but a MUST to ensure early break correctly in every loop level...
> Though loop nodes could be huge, loop depth will never be so large in actual code?
The "in practice" argument is almost always defeated by automatic
program generators ;)
> >
> > I suspect you'll see quadratic behavior with your patch. You
> > should be at least able to preserve a check like
> >
> > /* Do not process not always executed subloops to avoid
> > quadratic behavior. */
> > if (bb->loop_father->header == bb
> > && !dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > break;
> >
> > which is of course not optimistic for cases like
> >
> > for (..)
> > {
> > if (cond)
> > for (..)
> > x = 1; // this is always executed if the inner loop is finite
> > }
> >
> > but we need to have an eye on the complexity of this function. I would
> > have suggested to do greedy visiting of the loop header successors,
> > processing further blocks if all entries (ignoring backedges) are
> > processed, setting SET_ALWAYS_EXECUTED_IN. When the worklist
> > is empty proceed to inner loops as the current code does. For
> > bookkeeping simply keep a to-visit-incoming-edges counter per BB.
> > Pseudo-code:
> >
> > bitmap_set_bit (worklist, loop-header-bb);
> > while (!bitmap_empty_p (worklist))
> > {
> > bb = pop (worklist);
>
> Need check whether bb dominates latch before SET_ALWAYS_EXECUTED_IN?
Ah, sure.
> > SET_ALWAYS_EXECUTED_IN (bb, loop);
> > if (bitmap_bit_p (contains_call, bb->index))
> > continue;
> > FOR_EACH_EDGE (e, ei, bb->succs)
> > {
> > if (!flow_bb_inside_loop_p (loop, e->dest))
> > continue;
> > if (incoming_count[e->dest->index]-- == 0)
> > push (worklist, e->dest);
> > }
> > }
>
> Sorry I don't fully understand your algorithm. worklist should be
> auto_vec<basic_block> don't support bitmap operations? Is incoming_count
> the bb's preds count, why need retain it since it it decreased to 0?
the worklist is a auto_bitmap, pop () would be
bitmap_first_set_bit/clear_bit. One could use a vector with the
caveat of eventually adding duplicate members to the worklist.
But as said, it's pseudo-code ;)
> >
> > iterate over inner loops (incoming_count can be retained,
> > we just force the inner loop header onto the worklist).
>
> Is this same to ?
>
> for (loop = loop->inner; loop; loop = loop->next)
> fill_always_executed_in_1 (loop, contains_call)
Yes, but I'd simply use for (loop : loops_list (cfun, 0)) which
should iterate over loops in PRE order.
Richard.
next prev parent reply other threads:[~2021-08-24 8:20 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-16 8:46 [PATCH] Fix incorrect " Xiong Hu Luo
2021-08-16 11:46 ` Richard Biener
2021-08-17 5:17 ` Xionghu Luo
2021-08-17 5:24 ` Xionghu Luo
2021-08-17 7:12 ` Richard Biener
2021-08-17 9:10 ` [PATCH v2] Fix incomplete " Xionghu Luo
2021-08-19 5:23 ` Xionghu Luo
2021-08-19 12:11 ` Richard Biener
2021-08-24 7:44 ` Xionghu Luo
2021-08-24 8:20 ` Richard Biener [this message]
2021-08-26 5:50 ` [PATCH v3] " Xionghu Luo
2021-08-27 7:45 ` Richard Biener
2021-08-30 8:49 ` Xionghu Luo
2021-08-30 9:19 ` Richard Biener
2021-08-31 7:43 ` Xionghu Luo
2021-08-31 11:29 ` Richard Biener
2021-08-17 20:59 ` [PATCH] Fix incorrect " Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.YFH.7.76.2108241009380.11781@zhemvz.fhfr.qr \
--to=rguenther@suse.de \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=guojiufu@linux.ibm.com \
--cc=linkw@gcc.gnu.org \
--cc=luoxhu@linux.ibm.com \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).