public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Aldy Hernandez <aldyh@redhat.com>
To: Michael Matz <matz@suse.de>
Cc: Jeff Law <jeffreyalaw@gmail.com>,
	Richard Biener <richard.guenther@gmail.com>,
	 Andrew MacLeod <amacleod@redhat.com>,
	GCC patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] More jump threading restrictions in the presence of loops.
Date: Tue, 5 Oct 2021 15:33:48 +0200	[thread overview]
Message-ID: <CAGm3qMVpbV417bWxST_c5TPbyC-fMrG4dgpdoZ+47dUOCFOTJA@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.20.2110041536250.17485@wotan.suse.de>

[-- Attachment #1: Type: text/plain, Size: 5156 bytes --]

On Mon, Oct 4, 2021 at 6:29 PM Michael Matz <matz@suse.de> wrote:
>
> Hello,
>
> On Mon, 4 Oct 2021, Jeff Law wrote:
>
> > And just in case it got lost.  Here's the analysis on 960218-1 on visium:
> >
> > We've got this going into ethread:
> >
> > int f (int x)
> > {
> >   int D.1420;
> >   int a;
> >
> > ;;   basic block 2, loop depth 0, maybe hot
> > ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
> >   a_4 = ~x_3(D);
> >   goto <bb 4>; [INV]
> > ;;    succ:       4 (FALLTHRU,EXECUTABLE)
> >
> > ;;   basic block 3, loop depth 1, maybe hot
> > ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       4 (TRUE_VALUE,EXECUTABLE)
> >   gl = a_1;
> > ;;    succ:       4 (FALLTHRU,DFS_BACK,EXECUTABLE)
> >
> > ;;   basic block 4, loop depth 1, maybe hot
> > ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       2 (FALLTHRU,EXECUTABLE)
> > ;;                3 (FALLTHRU,DFS_BACK,EXECUTABLE)
> >   # a_1 = PHI <a_4(2), 0(3)>
> >   if (a_1 != 0)
> >     goto <bb 3>; [INV]
> >   else
> >     goto <bb 5>; [INV]
> > ;;    succ:       3 (TRUE_VALUE,EXECUTABLE)
> > ;;                5 (FALSE_VALUE,EXECUTABLE)
> >
> > ;;   basic block 5, loop depth 0, maybe hot
> > ;;    prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       4 (FALSE_VALUE,EXECUTABLE)
> >   return;
> > ;;    succ:       EXIT
> >
> > }
>
> First notice that this doesn't have an empty latch block to start with
> (in fact, there is no separate latch block at all), so the loop optimizers
> haven't been initialized for simple latches at this point.  Still, I would
> say that even then we want to be careful with the latch edge, as putting
> code onto it will most probably create a problem downstream once we _do_
> want to intialize the loop optimizers for simple latches.  So, that we are
> careful here is okay, we are just too careful in this specific case.
>
> > There's a pretty obvious jump thread in there.  3->4->5.
> >
> > We used to allow this, but it's currently being rejected and I'm not
> > sure it should.
> >
> > In simplest terms we're threading from inside the loop, through the
> > latch to a point outside the loop.  At first glance, that seems safe.
>
> Is at least the unrestricted jump threader (the one after loop optimizers)
> picking this up?
>
> Independend of that, I agree, that this specific instance of threading
> through the latch block, even though followed by to-be-copied
> instructions, is okay.  We need to determine precisely when that is okay,
> though.  In this case there are several reasons I could see why this is
> so:
> (a) while the thread is through the latch block, it's _not_ through the
>     latch edge (which is 4->3).  That would create the problem, so here
>     we're fine.

It seems we all agree Jeff's finding should be allowed, so let's
attack this one first, since it gets almost all of his visium
failures.  I can submit the rest of the cases separately.

The attached patch catches the IL discussed, and adds a relevant
gimple FE test others can use for experimenting :).

Tested on x86-64 and by visual inspection on visium-elf on the
regressions Jeff pointed me at.

OK?

BTW Jeff, this fixes all the regressions you mention except:

1. pr68911.c

The path not being threaded here is 7->10->11->12.  It crosses loops
multiple times, so I believe the restriction code is correct.

7, 10, 12 are in loop1.
11 is in loop 2.

So we have a path going from loop1 -> loop2 -> loop1.  I can't
conceive any scenario where this is ok, but I can attach the entire IL
if I'm missing something.

2. 961125-1.c

This one is a bit trickier.  Here we're trying to thread the following
conditional, which I mentioned when I contributed this work, we don't
handle (and happens infrequently enough not to matter):

+  // Loop 4
+  <bb 6> [local count: 114863531]:
+  # ptr_8 = PHI <ptr_2(4), ptr_2(5)>
+  if (ptr_8 < &MEM <char[4]> [(void *)":ab" + 3B])
+    goto <bb 7>; [50.00%]
+  else
+    goto <bb 17>; [50.00%]

The hybrid threader doesn't handle &MEM in the final conditional.  As
I mentioned earlier, if this becomes an issue, we can adapt class
pointer_equiv_analyzer like we did for evrp.  I have a gimple FE test
I will contribute as an XFAIL with an associated PR to keep us honest.

That being said... in this particular test, this is all irrelevant
because the path will be disallowed for two reasons:

a) The path crosses loops, and the reason we didn't realize it in the
old code was because the ASSERT_EXPR had pulled the SSA outside the
loop, so it looks like the entire path is l in the same loop.  If you
look at the original IL, it's not.

b) Now the path actually fits the pattern being discussed in this
patch, where there's an early exit out of a loop, so it looks like we
should handle it.  But...in this case, we would fill a presently empty
latch.  Interestingly, the old code didn't catch it, because....you
guessed it...there was an ASSERT_EXPR in the latch.

So I argue that even in the absence of MEM_REF handling, we shouldn't
thread this path.

[-- Attachment #2: 0001-Loosen-loop-crossing-restriction-in-threader.patch --]
[-- Type: text/x-patch, Size: 5004 bytes --]

From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Tue, 5 Oct 2021 15:03:34 +0200
Subject: [PATCH] Loosen loop crossing restriction in threader.

Crossing loops is generally discouraged from the threader, but we can
make an exception when we don't cross the latch or enter another loop,
since this is just an early exit out of the loop.

In fact, the whole threaded path is logically outside the loop.  This
has nice secondary effects.  For example, objects on the threaded path
will no longer necessarily be live throughout the loop, so we can get
register allocation improvements.  The threaded path can physically
move outside the loop resulting in better icache efficiency, etc.

Tested on x86-64 Linux, and on a visium-elf cross making sure that the
following tests do not have an abort in the final assembly:

gcc.c-torture/execute/960218-1.c
gcc.c-torture/execute/visium-pending-4.c
gcc.c-torture/execute/pr58209.c

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-thread-valid.c: New test.

Co-authored-by: Jeff Law <jeffrealaw@gmail.com>
---
 .../gcc.dg/tree-ssa/ssa-thread-valid.c        | 39 ++++++++++++++++++
 gcc/tree-ssa-threadupdate.c                   | 40 ++++++++++++++-----
 2 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
new file mode 100644
index 00000000000..7adca97cc2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
@@ -0,0 +1,39 @@
+// { dg-do compile }
+// { dg-options "-O2 -fgimple -fdump-statistics" }
+
+// This is a collection of threadable paths.  To simplify maintenance,
+// there should only be one threadable path per function.
+
+int global;
+
+// The thread from 3->4->5 crosses loops but is allowed because it
+// never crosses the latch (BB3) and is just an early exit out of the
+// loop.
+int __GIMPLE (ssa)
+foo1 (int x)
+{
+  int D_1420;
+  int a;
+
+  __BB(2):
+  a_4 = ~x_3(D);
+  goto __BB4;
+
+  // Latch.
+  __BB(3):
+  global = a_1;
+  goto __BB4;
+
+  __BB(4,loop_header(1)):
+  a_1 = __PHI (__BB2: a_4, __BB3: 0);
+  if (a_1 != 0)
+    goto __BB3;
+  else
+    goto __BB5;
+
+  __BB(5):
+  return;
+
+}
+
+// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index dcabfdb30d2..32ce1e3af40 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2766,10 +2766,17 @@ bool
 jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 {
   gcc_checking_assert (!path.is_empty ());
-  edge taken_edge = path[path.length () - 1]->e;
-  loop_p loop = taken_edge->src->loop_father;
+  edge entry = path[0]->e;
+  edge exit = path[path.length () - 1]->e;
   bool seen_latch = false;
-  bool path_crosses_loops = false;
+  int loops_crossed = 0;
+  bool crossed_latch = false;
+  // Use ->dest here instead of ->src to ignore the first block.  The
+  // first block is allowed to be in a different loop, since it'll be
+  // redirected.  See similar comment in profitable_path_p: "we don't
+  // care about that block...".
+  loop_p loop = entry->dest->loop_father;
+  loop_p curr_loop = loop;
 
   for (unsigned int i = 0; i < path.length (); i++)
     {
@@ -2784,19 +2791,30 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 	}
 
       if (loop->latch == e->src || loop->latch == e->dest)
-	seen_latch = true;
+	{
+	  seen_latch = true;
+	  // Like seen_latch, but excludes the first block.
+	  if (e->src != entry->src)
+	    crossed_latch = true;
+	}
 
-      // The first entry represents the block with an outgoing edge
-      // that we will redirect to the jump threading path.  Thus we
-      // don't care about that block's loop father.
-      if ((i > 0 && e->src->loop_father != loop)
-	  || e->dest->loop_father != loop)
-	path_crosses_loops = true;
+      if (e->dest->loop_father != curr_loop)
+	{
+	  curr_loop = e->dest->loop_father;
+	  ++loops_crossed;
+	}
 
       if (flag_checking && !m_backedge_threads)
 	gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
 
+  // If we crossed a loop into an outer loop without crossing the
+  // latch, this is just an early exit from the loop.
+  if (loops_crossed == 1
+      && !crossed_latch
+      && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
+    return false;
+
   if (cfun->curr_properties & PROP_loop_opts_done)
     return false;
 
@@ -2806,7 +2824,7 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 		     "would create non-empty latch");
       return true;
     }
-  if (path_crosses_loops)
+  if (loops_crossed)
     {
       cancel_thread (&path, "Path crosses loops");
       return true;
-- 
2.31.1


  parent reply	other threads:[~2021-10-05 13:34 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-04  9:43 Aldy Hernandez
2021-10-04 13:31 ` Jeff Law
2021-10-04 13:36   ` Aldy Hernandez
2021-10-04 15:30     ` Jeff Law
2021-10-04 16:29       ` Michael Matz
2021-10-05 11:22         ` Richard Biener
2021-10-05 12:43           ` Michael Matz
2021-10-05 14:56           ` Jeff Law
2021-10-05 13:33         ` Aldy Hernandez [this message]
2021-10-05 15:10           ` Jeff Law
2021-10-05 16:08           ` Jeff Law
2021-10-05 16:22             ` Aldy Hernandez
2021-10-06 13:15           ` Andreas Schwab
2021-10-06 13:47             ` Aldy Hernandez
2021-10-06 15:03               ` Martin Sebor
2021-10-06 16:22                 ` Aldy Hernandez
2021-10-06 17:03                   ` Aldy Hernandez
2021-10-06 19:11                     ` Martin Sebor
2021-10-06 23:00                   ` Jeff Law
2021-10-06 23:06               ` Jeff Law
2021-10-07  8:15                 ` Aldy Hernandez
2021-10-07 13:35                   ` Jeff Law
2021-10-06 10:22 ` Aldy Hernandez
2021-10-14  9:25   ` Aldy Hernandez
2021-10-14 14:23     ` Jeff Law
2021-10-17  1:32   ` Jeff Law
2021-10-18 10:14     ` Aldy Hernandez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGm3qMVpbV417bWxST_c5TPbyC-fMrG4dgpdoZ+47dUOCFOTJA@mail.gmail.com \
    --to=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=matz@suse.de \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).