From: Richard Biener <richard.guenther@gmail.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: Michael Matz <matz@suse.de>, Jeff Law <jeffreyalaw@gmail.com>,
GCC Mailing List <gcc@gcc.gnu.org>,
Andrew MacLeod <amacleod@redhat.com>
Subject: Re: More aggressive threading causing loop-interchange-9.c regression
Date: Wed, 8 Sep 2021 15:13:38 +0200 [thread overview]
Message-ID: <CAFiYyc3ZBApnN3ks3YExLTLQ8tvEdFV-uuU2MXVVqXFYn+cdRw@mail.gmail.com> (raw)
In-Reply-To: <cb4191f1-0432-8740-b327-078c2aaff0fa@redhat.com>
On Wed, Sep 8, 2021 at 12:44 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> First of all. Thanks so much for this incredibly useful explanation.
> It helps a lot.
>
> I'm still trying to wrap my head around this, so please bear with me.
>
> On 9/7/21 4:45 PM, Michael Matz wrote:
> > Hello,
> >
> > On Tue, 7 Sep 2021, Aldy Hernandez via Gcc wrote:
> >
> >> The regression comes from the simple_reduc_1() function in
> >> tree-ssa/loop-interchange-9.c, and it involves the following path:
> >>
> >> =========== BB 5 ============
> >> Imports: n_10(D) j_19
> >> Exports: n_10(D) j_13 j_19
> >> j_13 : j_19(I)
> >> n_10(D) int [1, 257]
> >> j_19 int [0, 256]
> >> Relational : (j_13 > j_19)
> >> <bb 5> [local count: 118111600]:
> >> # sum_21 = PHI <sum_14(4), sum_11(3)>
> >> c[j_19] = sum_21;
> >> j_13 = j_19 + 1;
> >> if (n_10(D) > j_13)
> >> goto <bb 9>; [89.00%]
> >> else
> >> goto <bb 6>; [11.00%]
> >
> > So, this is the outer loops exit block ...
> >
> >> =========== BB 9 ============
> >> n_10(D) int [2, 257]
> >> j_13 int [1, 256]
> >> Relational : (n_10(D) > j_19)
> >> Relational : (n_10(D) > j_13)
> >> <bb 9> [local count: 105119324]:
> >> goto <bb 3>; [100.00%]
> >
> > ... this the outer loops latch block ...
> >
> >> =========== BB 3 ============
> >> Imports: n_10(D)
> >> Exports: n_10(D)
> >> n_10(D) int [1, +INF]
> >> <bb 3> [local count: 118111600]:
> >> # j_19 = PHI <j_13(9), 0(7)>
> >> sum_11 = c[j_19];
> >> if (n_10(D) > 0)
> >> goto <bb 8>; [89.00%]
> >> else
> >> goto <bb 5>; [11.00%]
> >
> > ... and this the outer loops header, as well as inner loops pre-header.
>
> Actually, the inner loop's pre-header seems to be BB8, which is empty,
> and leads into the BB4 which is the header of the inner loop.
>
> > The attempted thread hence involves a back-edge (of the outer loop) and a
> > loop-entry edge (bb3->bb8) of the inner loop. Doing that threading would
> > destroy some properties that our loop optimizers rely on, e.g. that the
> > loop header of a natural loop is only reached by two edges: entry edge and
> > back edge, and that the latch blocks are empty and that there's only a
> > single latch. (What exactly we require depends on several flags in
> > loops_state_satisfies_p).
>
> But threading 3->8 doesn't involve a loop-entry edge. It involves a new
> edge into the *pre-header* of the inner loop.
>
> I am attaching the IL right after threading and before we clean up the
> CFG (il-after-threading.txt).
>
> After threading we have have rewritten the 5->9 edge into 5->11:
>
> <bb 5> [local count: 118111600]:
> # sum_21 = PHI <sum_14(4), sum_11(3)>
> c[j_19] = sum_21;
> j_13 = j_19 + 1;
> if (n_10(D) > j_13)
> goto <bb 11>; [89.00%]
> else
> goto <bb 6>; [11.00%]
>
> <bb 11> [local count: 105119324]:
>
> <bb 12> [local count: 105119324]:
> # j_7 = PHI <j_13(11)>
> sum_6 = c[j_19];
> goto <bb 8>; [100.00%]
>
> <bb 9> [local count: 0]: ;; This becomes unreachable.
> goto <bb 3>; [100.00%]
>
> Notice BB9 becomes unreachable, so we're basically eliding the latch in BB9.
>
> Question: could BB12 not be considered the loop latch now and BB8 be the
> outer loop's entry? This would also mean, that BB3 is now outside of
> the outer loop, which means the threader peeled off the first iteration
> of the loop. Or is it a requirement that the latch be empty?
>
> >
> >> With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that
> >> the loop cannot legally iterate outside the size of c[256]. So, j_13
> >> lies within [1, 257] and n_10 is [2, +INF] at the end of the path.
> >> This allows us to thread BB3 to BB8.
> >
> > So, IIUC doing this threading would create a new entry to BB8: it would
> > then be entered by its natural entry (bb3), by its natural back edge
> > (whatever bb that is now) and the new entry from the threaded outer back
> > edge (bb9 or bb5?).
>
> As I mentioned, BB8 looks like the pre-header to the inner loop. But
> yes, it now has multiple entries: the edge from BB3, and the back edge
> from BB12 (which seems like it could be a new latch, since it's the only
> back edge).
It would be helpful to have the patch causing the issue to look at the IL.
But as Micha said, there needs to be a perfect loop nest for interchange
to work.
Richard.
> >
> > The inner loop wouldn't then be recognized as natural anymore and the
> > whole nest not as perfect, and hence loop interchange can't easily happen
> > anymore. Most other structural loop optimizers of us would have problems
> > with that situation as well.
>
> If I clean up the CFG and call loop_optimizer_init() again at this
> point, what is destroyed is the outer loop, not the inner loop. If you
> look at the IL at this point
> (il-after-cleanup-and-rerunning-loop-init.txt), we have a latch in BB7
> going back up to BB4, though the conditional is now in BB6 (eeech):
>
> <bb 4>
> ...
> ...
> ...
>
>
> <bb 6> [local count: 118111600]:
> # sum_21 = PHI <sum_14(5), sum_11(3)>
> # j_18 = PHI <j_17(5), j_19(3)>
> c[j_18] = sum_21;
> j_13 = j_18 + 1;
> if (n_10(D) > j_13)
> goto <bb 7>; [89.00%]
> else
> goto <bb 8>; [11.00%]
>
> <bb 7> [local count: 105119324]:
> # j_7 = PHI <j_13(6)>
> sum_6 = c[j_7];
> goto <bb 4>; [100.00%]
>
> Perhaps, this looks sufficiently different that the loop optimizer can't
> recognize it as a loop?
>
> >
> >> All the blocks lie within the same loop, and the path passes all the
> >> tests in path_profitable_p().
> >>
> >> Is there something about the shape of this path that should make it
> >> invalid in the backward threader, or should the test be marked with
> >> -fdisable-tree-threadN (or something else entirely)?
> >
> > This is a functionality test checking that the very necessary interchange
> > in this test does happen with default plus -floop-interchange (with the
> > intention of it being enabled by O3 or with profile info). So no
> > additional flags can be added without changing the meaning of this test.
>
> Agreed. We're shuffling things around. What I'm missing here is why
> we're unable to recognize this new form as a loop. That being said, I
> am not above figuring out what the magic incantation is to keep the
> threader from doing this transformation. However, I just want to make
> sure we're not papering over something else.
>
> >
> >> Note that the
> >> backward threader is allowed to peel through loop headers.
> >
> > Something needs to give way in the path threaders before loop
> > optimizations: either threading through back edges, through loop latches
> > or through loop headers needs to be disabled. I think traditionally at
> > least threading through latches should be disabled, because doing so
> > usually destroys simple loop structure. I see that profitable_path_p() of
> > the backward threader wants to be careful in some situations involving
> > loops and latches; possibly it's not careful enough yet for the additional
> > power brought by ranger.
> >
> > See also the additional tests tree-cfgcleanup.c:tree_forwarder_block_p is
> > doing when loops are active.
> >
> > After the structural loop optimizers the threader can go wild and thread
> > everything it finds.
>
> Thanks again. This is very helpful.
>
> Aldy
next prev parent reply other threads:[~2021-09-08 13:13 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-07 11:49 Aldy Hernandez
2021-09-07 14:45 ` Michael Matz
2021-09-08 10:44 ` Aldy Hernandez
2021-09-08 13:13 ` Richard Biener [this message]
2021-09-08 13:25 ` Aldy Hernandez
2021-09-08 13:49 ` Richard Biener
2021-09-08 16:19 ` Aldy Hernandez
2021-09-08 16:39 ` Michael Matz
2021-09-08 18:13 ` Michael Matz
2021-09-09 6:57 ` Richard Biener
2021-09-09 7:37 ` Aldy Hernandez
2021-09-09 7:45 ` Richard Biener
2021-09-09 8:36 ` Aldy Hernandez
2021-09-09 8:58 ` Richard Biener
2021-09-09 9:21 ` Aldy Hernandez
2021-09-09 10:15 ` Richard Biener
2021-09-09 11:28 ` Aldy Hernandez
2021-09-10 15:51 ` Jeff Law
2021-09-10 16:11 ` Aldy Hernandez
2021-09-10 15:43 ` Jeff Law
2021-09-10 16:05 ` Aldy Hernandez
2021-09-10 16:21 ` Jeff Law
2021-09-10 16:38 ` Aldy Hernandez
2021-09-09 16:59 ` Jeff Law
2021-09-09 12:47 ` Michael Matz
2021-09-09 8:14 ` Aldy Hernandez
2021-09-09 8:24 ` Richard Biener
2021-09-09 12:52 ` Michael Matz
2021-09-09 13:37 ` Aldy Hernandez
2021-09-09 14:44 ` Michael Matz
2021-09-09 15:07 ` Aldy Hernandez
2021-09-10 7:04 ` Aldy Hernandez
2021-09-09 16:54 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFiYyc3ZBApnN3ks3YExLTLQ8tvEdFV-uuU2MXVVqXFYn+cdRw@mail.gmail.com \
--to=richard.guenther@gmail.com \
--cc=aldyh@redhat.com \
--cc=amacleod@redhat.com \
--cc=gcc@gcc.gnu.org \
--cc=jeffreyalaw@gmail.com \
--cc=matz@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).