Re: More aggressive threading causing loop-interchange-9.c regression

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: Michael Matz <matz@suse.de>, Jeff Law <jeffreyalaw@gmail.com>,
	 GCC Mailing List <gcc@gcc.gnu.org>,
	Andrew MacLeod <amacleod@redhat.com>
Subject: Re: More aggressive threading causing loop-interchange-9.c regression
Date: Wed, 8 Sep 2021 15:13:38 +0200	[thread overview]
Message-ID: <CAFiYyc3ZBApnN3ks3YExLTLQ8tvEdFV-uuU2MXVVqXFYn+cdRw@mail.gmail.com> (raw)
In-Reply-To: <cb4191f1-0432-8740-b327-078c2aaff0fa@redhat.com>

On Wed, Sep 8, 2021 at 12:44 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> First of all.  Thanks so much for this incredibly useful explanation.
> It helps a lot.
>
> I'm still trying to wrap my head around this, so please bear with me.
>
> On 9/7/21 4:45 PM, Michael Matz wrote:
> > Hello,
> >
> > On Tue, 7 Sep 2021, Aldy Hernandez via Gcc wrote:
> >
> >> The regression comes from the simple_reduc_1() function in
> >> tree-ssa/loop-interchange-9.c, and it involves the following path:
> >>
> >> =========== BB 5 ============
> >> Imports: n_10(D)  j_19
> >> Exports: n_10(D)  j_13  j_19
> >>           j_13 : j_19(I)
> >> n_10(D)      int [1, 257]
> >> j_19 int [0, 256]
> >> Relational : (j_13 > j_19)
> >>      <bb 5> [local count: 118111600]:
> >>      # sum_21 = PHI <sum_14(4), sum_11(3)>
> >>      c[j_19] = sum_21;
> >>      j_13 = j_19 + 1;
> >>      if (n_10(D) > j_13)
> >>        goto <bb 9>; [89.00%]
> >>      else
> >>        goto <bb 6>; [11.00%]
> >
> > So, this is the outer loops exit block ...
> >
> >> =========== BB 9 ============
> >> n_10(D)      int [2, 257]
> >> j_13 int [1, 256]
> >> Relational : (n_10(D) > j_19)
> >> Relational : (n_10(D) > j_13)
> >>      <bb 9> [local count: 105119324]:
> >>      goto <bb 3>; [100.00%]
> >
> > ... this the outer loops latch block ...
> >
> >> =========== BB 3 ============
> >> Imports: n_10(D)
> >> Exports: n_10(D)
> >> n_10(D)      int [1, +INF]
> >>      <bb 3> [local count: 118111600]:
> >>      # j_19 = PHI <j_13(9), 0(7)>
> >>      sum_11 = c[j_19];
> >>      if (n_10(D) > 0)
> >>        goto <bb 8>; [89.00%]
> >>      else
> >>        goto <bb 5>; [11.00%]
> >
> > ... and this the outer loops header, as well as inner loops pre-header.
>
> Actually, the inner loop's pre-header seems to be BB8, which is empty,
> and leads into the BB4 which is the header of the inner loop.
>
> > The attempted thread hence involves a back-edge (of the outer loop) and a
> > loop-entry edge (bb3->bb8) of the inner loop.  Doing that threading would
> > destroy some properties that our loop optimizers rely on, e.g. that the
> > loop header of a natural loop is only reached by two edges: entry edge and
> > back edge, and that the latch blocks are empty and that there's only a
> > single latch.  (What exactly we require depends on several flags in
> > loops_state_satisfies_p).
>
> But threading 3->8 doesn't involve a loop-entry edge.  It involves a new
> edge into the *pre-header* of the inner loop.
>
> I am attaching the IL right after threading and before we clean up the
> CFG (il-after-threading.txt).
>
> After threading we have have rewritten the 5->9 edge into 5->11:
>
>    <bb 5> [local count: 118111600]:
>    # sum_21 = PHI <sum_14(4), sum_11(3)>
>    c[j_19] = sum_21;
>    j_13 = j_19 + 1;
>    if (n_10(D) > j_13)
>      goto <bb 11>; [89.00%]
>    else
>      goto <bb 6>; [11.00%]
>
>    <bb 11> [local count: 105119324]:
>
>    <bb 12> [local count: 105119324]:
>    # j_7 = PHI <j_13(11)>
>    sum_6 = c[j_19];
>    goto <bb 8>; [100.00%]
>
>    <bb 9> [local count: 0]:     ;; This becomes unreachable.
>    goto <bb 3>; [100.00%]
>
> Notice BB9 becomes unreachable, so we're basically eliding the latch in BB9.
>
> Question: could BB12 not be considered the loop latch now and BB8 be the
> outer loop's entry?  This would also mean, that BB3 is now outside of
> the outer loop, which means the threader peeled off the first iteration
> of the loop.  Or is it a requirement that the latch be empty?
>
> >
> >> With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that
> >> the loop cannot legally iterate outside the size of c[256].  So, j_13
> >> lies within [1, 257] and n_10 is [2, +INF] at the end of the path.
> >> This allows us to thread BB3 to BB8.
> >
> > So, IIUC doing this threading would create a new entry to BB8: it would
> > then be entered by its natural entry (bb3), by its natural back edge
> > (whatever bb that is now) and the new entry from the threaded outer back
> > edge (bb9 or bb5?).
>
> As I mentioned, BB8 looks like the pre-header to the inner loop.  But
> yes, it now has multiple entries: the edge from BB3, and the back edge
> from BB12 (which seems like it could be a new latch, since it's the only
> back edge).

It would be helpful to have the patch causing the issue to look at the IL.
But as Micha said, there needs to be a perfect loop nest for interchange
to work.

Richard.

> >
> > The inner loop wouldn't then be recognized as natural anymore and the
> > whole nest not as perfect, and hence loop interchange can't easily happen
> > anymore.  Most other structural loop optimizers of us would have problems
> > with that situation as well.
>
> If I clean up the CFG and call loop_optimizer_init() again at this
> point, what is destroyed is the outer loop, not the inner loop.  If you
> look at the IL at this point
> (il-after-cleanup-and-rerunning-loop-init.txt), we have a latch in BB7
> going back up to BB4, though the conditional is now in BB6 (eeech):
>
>    <bb 4>
> ...
> ...
> ...
>
>
>    <bb 6> [local count: 118111600]:
>    # sum_21 = PHI <sum_14(5), sum_11(3)>
>    # j_18 = PHI <j_17(5), j_19(3)>
>    c[j_18] = sum_21;
>    j_13 = j_18 + 1;
>    if (n_10(D) > j_13)
>      goto <bb 7>; [89.00%]
>    else
>      goto <bb 8>; [11.00%]
>
>    <bb 7> [local count: 105119324]:
>    # j_7 = PHI <j_13(6)>
>    sum_6 = c[j_7];
>    goto <bb 4>; [100.00%]
>
> Perhaps, this looks sufficiently different that the loop optimizer can't
> recognize it as a loop?
>
> >
> >> All the blocks lie within the same loop, and the path passes all the
> >> tests in path_profitable_p().
> >>
> >> Is there something about the shape of this path that should make it
> >> invalid in the backward threader, or should the test be marked with
> >> -fdisable-tree-threadN (or something else entirely)?
> >
> > This is a functionality test checking that the very necessary interchange
> > in this test does happen with default plus -floop-interchange (with the
> > intention of it being enabled by O3 or with profile info).  So no
> > additional flags can be added without changing the meaning of this test.
>
> Agreed.  We're shuffling things around.   What I'm missing here is why
> we're unable to recognize this new form as a loop.  That being said, I
> am not above figuring out what the magic incantation is to keep the
> threader from doing this transformation.  However, I just want to make
> sure we're not papering over something else.
>
> >
> >> Note that the
> >> backward threader is allowed to peel through loop headers.
> >
> > Something needs to give way in the path threaders before loop
> > optimizations: either threading through back edges, through loop latches
> > or through loop headers needs to be disabled.  I think traditionally at
> > least threading through latches should be disabled, because doing so
> > usually destroys simple loop structure.  I see that profitable_path_p() of
> > the backward threader wants to be careful in some situations involving
> > loops and latches; possibly it's not careful enough yet for the additional
> > power brought by ranger.
> >
> > See also the additional tests tree-cfgcleanup.c:tree_forwarder_block_p is
> > doing when loops are active.
> >
> > After the structural loop optimizers the threader can go wild and thread
> > everything it finds.
>
> Thanks again.  This is very helpful.
>
> Aldy

next prev parent reply	other threads:[~2021-09-08 13:13 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-07 11:49 Aldy Hernandez
2021-09-07 14:45 ` Michael Matz
2021-09-08 10:44   ` Aldy Hernandez
2021-09-08 13:13     ` Richard Biener [this message]
2021-09-08 13:25       ` Aldy Hernandez
2021-09-08 13:49         ` Richard Biener
2021-09-08 16:19           ` Aldy Hernandez
2021-09-08 16:39             ` Michael Matz
2021-09-08 18:13               ` Michael Matz
2021-09-09  6:57                 ` Richard Biener
2021-09-09  7:37                   ` Aldy Hernandez
2021-09-09  7:45                     ` Richard Biener
2021-09-09  8:36                       ` Aldy Hernandez
2021-09-09  8:58                         ` Richard Biener
2021-09-09  9:21                           ` Aldy Hernandez
2021-09-09 10:15                             ` Richard Biener
2021-09-09 11:28                               ` Aldy Hernandez
2021-09-10 15:51                               ` Jeff Law
2021-09-10 16:11                                 ` Aldy Hernandez
2021-09-10 15:43                             ` Jeff Law
2021-09-10 16:05                               ` Aldy Hernandez
2021-09-10 16:21                                 ` Jeff Law
2021-09-10 16:38                                   ` Aldy Hernandez
2021-09-09 16:59                           ` Jeff Law
2021-09-09 12:47                   ` Michael Matz
2021-09-09  8:14                 ` Aldy Hernandez
2021-09-09  8:24                   ` Richard Biener
2021-09-09 12:52                   ` Michael Matz
2021-09-09 13:37                     ` Aldy Hernandez
2021-09-09 14:44                       ` Michael Matz
2021-09-09 15:07                         ` Aldy Hernandez
2021-09-10  7:04                         ` Aldy Hernandez
2021-09-09 16:54                   ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc3ZBApnN3ks3YExLTLQ8tvEdFV-uuU2MXVVqXFYn+cdRw@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=matz@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).