From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aldyh@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTP id F36363858C39
 for <gcc@gcc.gnu.org>; Fri, 10 Sep 2021 07:05:03 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F36363858C39
Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com
 [209.85.221.72]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-104-xafTyFNUN9C89HMRO0W4zA-1; Fri, 10 Sep 2021 03:05:01 -0400
X-MC-Unique: xafTyFNUN9C89HMRO0W4zA-1
Received: by mail-wr1-f72.google.com with SMTP id
 z1-20020adfec81000000b0015b085dbde3so152535wrn.14
 for <gcc@gcc.gnu.org>; Fri, 10 Sep 2021 00:05:01 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:subject:to:cc:references:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-language;
 bh=612zvAZ6oNrvtzgVlaVshAi+atdBJ25PBE3jyHQagXY=;
 b=phh+VvrBn0HNfcm4vw6YXCNxDgwxYyenjMMG04KHpOJcaxSgbOHQVL6smphfWrMRSr
 M6/JQR6KgzYUGuw3IWrq5NhyJzGOTNo9LB9KVfkAdOu3XxF5OJ6ZJl5UPhPXzio74JVD
 x1enpB6dafvQMB2L7r0nyoUq2G/15GzBuuYiW9ndFLUSo0x3P3aZytsPhOpNZGqWh7DB
 TeuFwFoIgp/2c4/2jmx88CikWLHsX9w4opt2iWeHjnh+rmcrETP49B6gZoEkpCrkx/TR
 1/87f1OwcYg197TLNK7GPjEVHRGL1AX67rGQUSRPQ5BfD/v5S6kU9eZ2V7MhCOL8Rew1
 x/og==
X-Gm-Message-State: AOAM5301wHVqSgP+VrYZ/nm/Pg8orHTBifp9TYS4soEsnEIX6esWNC/R
 RtA/oM+9km+MO3w1vZ9IWEX5ZQb+3gTdEPOEQgqFkPlnjYG3okdUhSVgtWiQjMz48clwS6Iycf6
 RVbnJhvQ=
X-Received: by 2002:a1c:7f83:: with SMTP id a125mr6835710wmd.166.1631257499919; 
 Fri, 10 Sep 2021 00:04:59 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJx9iNp+S9AHPfNvKot5pKSSaNx1mtBCIvjkq26Jgp+tEKmJR5h90PEYuDXy8uWMdGgwfe8lTg==
X-Received: by 2002:a1c:7f83:: with SMTP id a125mr6835682wmd.166.1631257499557; 
 Fri, 10 Sep 2021 00:04:59 -0700 (PDT)
Received: from abulafia.quesejoda.com ([139.47.33.227])
 by smtp.gmail.com with ESMTPSA id l7sm3806448wmj.9.2021.09.10.00.04.58
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Fri, 10 Sep 2021 00:04:58 -0700 (PDT)
Subject: Re: More aggressive threading causing loop-interchange-9.c regression
To: Michael Matz <matz@suse.de>
Cc: Richard Biener <richard.guenther@gmail.com>,
 Jeff Law <jeffreyalaw@gmail.com>, GCC Mailing List <gcc@gcc.gnu.org>,
 Andrew MacLeod <amacleod@redhat.com>, gcc-patches <gcc-patches@gcc.gnu.org>
References: <09e48b82-bc51-405e-7680-89a5f08e4e8f@redhat.com>
 <alpine.LSU.2.20.2109071406070.12583@wotan.suse.de>
 <cb4191f1-0432-8740-b327-078c2aaff0fa@redhat.com>
 <CAFiYyc3ZBApnN3ks3YExLTLQ8tvEdFV-uuU2MXVVqXFYn+cdRw@mail.gmail.com>
 <d8234eed-6c60-c5fb-8800-5e9c5b932c58@redhat.com>
 <CAFiYyc2KWNEMD31AdYuNJ-dP7ixMsWtTCtokQpcbRrZctTUqzA@mail.gmail.com>
 <6d5695e4-e4eb-14a5-46a6-f425d1514008@redhat.com>
 <alpine.LSU.2.20.2109081627081.12583@wotan.suse.de>
 <alpine.LSU.2.20.2109081736530.12583@wotan.suse.de>
 <56bd6a6c-0416-7123-c792-521495d69654@redhat.com>
 <alpine.LSU.2.20.2109091249420.12583@wotan.suse.de>
 <7dad1f1f-98e3-f6c7-8cbd-d01122b72260@redhat.com>
 <alpine.LSU.2.20.2109091342460.12583@wotan.suse.de>
From: Aldy Hernandez <aldyh@redhat.com>
Message-ID: <b957b468-bf17-7d5f-42f1-fc46bfa71fc8@redhat.com>
Date: Fri, 10 Sep 2021 09:04:57 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <alpine.LSU.2.20.2109091342460.12583@wotan.suse.de>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: multipart/mixed; boundary="------------DED459D991C7EF32778A26AD"
Content-Language: en-US
X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Sep 2021 07:05:06 -0000

This is a multi-part message in MIME format.
--------------DED459D991C7EF32778A26AD
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

I think things are clear enough to propose a patch.  Thanks for 
everyone's input.

I have added some comments and tweaked Michael's patch to include the 
final BB where we're threading to, in the check.  Without this last 
check, we fail in tree-ssa/cunroll-15.c because the latch becomes 
polluted with PHI nodes.

OK for trunk?
Aldy

commit e735a2cd01485773635bd66c97af21059992d5a7 (HEAD -> 
pending/global-ranges)
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Thu Sep 9 20:30:28 2021 +0200

     Disable threading through latches until after loop optimizations.

     The motivation for this patch was enabling the use of global ranges in
     the path solver, but this caused certain properties of loops being
     destroyed which made subsequent loop optimizations to fail.
     Consequently, this patch's mail goal is to disable jump threading
     involving the latch until after loop optimizations have run.

     I have added a bit in cfun to indicate when the "loopdone" optimization
     has completed.  This helps the path threader determine when it's 
safe to
     thread through loop structures.  I can adapt the patch if there is an
     alternate way of determining this.

     As can be seen in the test adjustments, we mostly shift the threading
     from the early threaders (ethread, thread[12] to the late threaders
     thread[34]).  I have nuked some of the early notes in the testcases
     that came as part of the jump threader rewrite.  They're mostly noise
     now.

     Note that we could probably relax some other restrictions in
     profitable_path_p when loop optimizations have completed, but it would
     require more testing, and I'm hesitant to touch more things than needed
     at this point.  I have added a reminder to the function to keep this
     in mind.

     Finally, perhaps as a follow-up, we should apply the same 
restrictions to
     the forward threader.  At some point I'd like to combine the cost 
models.

     Tested on x86-64 Linux.

     p.s. There is a thorough discussion involving the limitations of jump
     threading involving loops here:

             https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

--------------DED459D991C7EF32778A26AD
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-Disable-threading-through-latches-until-after-loop-o.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-Disable-threading-through-latches-until-after-loop-o.pa";
 filename*1="tch"

>From e735a2cd01485773635bd66c97af21059992d5a7 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Thu, 9 Sep 2021 20:30:28 +0200
Subject: [PATCH] Disable threading through latches until after loop
 optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops being
destroyed which made subsequent loop optimizations to fail.
Consequently, this patch's mail goal is to disable jump threading
involving the latch until after loop optimizations have run.

I have added a bit in cfun to indicate when the "loopdone" optimization
has completed.  This helps the path threader determine when it's safe to
thread through loop structures.  I can adapt the patch if there is an
alternate way of determining this.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12] to the late threaders
thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

	https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

	* function.h (struct function): Add loop_optimizers_done.
	(set_loop_optimizers_done): New.
	(loop_optimizers_done_p): New.
	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
	Intersect with global range.
	* tree-ssa-loop.c (tree_ssa_loop_done): Call set_loop_optimizers_done.
	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Disable
	threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
	threading through latches.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz <matz@suse.de>
---
 gcc/function.h                                | 15 ++++++++
 gcc/gimple-range-path.cc                      |  3 ++
 .../gcc.dg/tree-ssa/ssa-dom-thread-2b.c       |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c        | 37 +------------------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 17 +--------
 gcc/tree-ssa-loop.c                           |  1 +
 gcc/tree-ssa-threadbackward.c                 | 28 +++++++++++++-
 7 files changed, 50 insertions(+), 55 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 36003e7576a..fb99b6a44fc 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -438,6 +438,9 @@ struct GTY(()) function {
 
   /* Set if there are any OMP_TARGET regions in the function.  */
   unsigned int has_omp_target : 1;
+
+  /* Set if SSA loop optimizers have completed.  */
+  unsigned int loop_optimizers_done : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
@@ -514,6 +517,18 @@ set_loops_for_fn (struct function *fn, struct loops *loops)
   fn->x_current_loops = loops;
 }
 
+inline void
+set_loop_optimizers_done (struct function *fn)
+{
+  fn->loop_optimizers_done = 1;
+}
+
+inline bool
+loop_optimizers_done_p (struct function *fn)
+{
+  return fn->loop_optimizers_done;
+}
+
 /* For backward compatibility... eventually these should all go away.  */
 #define current_function_funcdef_no (cfun->funcdef_no)
 
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
   basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
   if (stmt && range_defined_in_block (r, name, bb))
     {
+      if (TREE_CODE (name) == SSA_NAME)
+	r.intersect (gimple_range_global (name));
+
       set_cache (r, name);
       return true;
     }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
 
 void foo();
 void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
    case.  And we want to thread through the header as well.  These
    are both caught by threading in DOM.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
 
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
-   be seen in the dump:
-
-     Registering FSM jump thread: (x, 12) incoming edge; ...
-     etc
-     etc
-
-   Before the new evrp, we were threading paths that started at the
-   following edges:
-
-      Registering FSM jump thread: (10, 12) incoming edge
-      Registering FSM jump thread:  (6, 12) incoming edge
-      Registering FSM jump thread:  (9, 12) incoming edge
-
-   This was because the PHI at BB12 had constant values coming in from
-   BB10, BB6, and BB9:
-
-   # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
-   Now with the new evrp, we get:
-
-   # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
-   Thus, we have 3 more paths that are known to be constant and can be
-   threaded.  Which means that by the second threading pass, we can
-   only find one profitable path.
-
-   For the record, all these extra constants are better paths coming
-   out of switches.  For example:
-
-     SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
-   We now know the value of the switch index at PHI.  */
 /* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
 
 int sum0, sum1, sum2, sum3;
 int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
-   The PHI coming into the threader has a lot more constants, so the
-   threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2 
-252c252
-<   # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
->   # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
-  I spot checked a few and they all have the same pattern.  We are
-  basically tracking the switch index better through multiple
-  paths.  */
-
 /* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..c827a51d3ce 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -528,6 +528,7 @@ tree_ssa_loop_done (void)
   free_numbers_of_iterations_estimates (cfun);
   scev_finalize ();
   loop_optimizer_finalize (cfun, true);
+  set_loop_optimizers_done (cfun);
   return 0;
 }
 
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..ded31a9e2de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ssa.h"
 #include "tree-cfgcleanup.h"
 #include "tree-pretty-print.h"
+#include "cfghooks.h"
 
 // Path registry for the backwards threader.  After all paths have been
 // registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
    TAKEN_EDGE, otherwise it is NULL.
 
    CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
-   would create an irreducible loop.  */
+   would create an irreducible loop.
+
+   ?? It seems we should be able to loosen some of the restrictions in
+   this function after loop optimizations have run.  */
 
 bool
 back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	 the last entry in the array when determining if we thread
 	 through the loop latch.  */
       if (loop->latch == bb)
-	threaded_through_latch = true;
+	{
+	  threaded_through_latch = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, " (latch)");
+	}
     }
 
   gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 		 "a multiway branch.\n");
       return false;
     }
+
+  /* Threading through a non-empty latch would cause code to be added
+     to the latch.  This could alter the loop form sufficiently to
+     cause loop optimizations to fail.  Disable these threads until
+     after loop optimizations have run.  */
+  if ((threaded_through_latch
+       || (taken_edge && taken_edge->dest == loop->latch))
+      && !loop_optimizers_done_p (cfun)
+      && empty_block_p (loop->latch))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "  FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+      return false;
+
+    }
   return true;
 }
 
-- 
2.31.1


--------------DED459D991C7EF32778A26AD--