From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
 by sourceware.org (Postfix) with ESMTPS id E19F93857C49
 for <gcc-patches@gcc.gnu.org>; Tue,  2 Aug 2022 08:41:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E19F93857C49
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out2.suse.de (Postfix) with ESMTP id 702082056D;
 Tue,  2 Aug 2022 08:41:06 +0000 (UTC)
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id 5C2112C141;
 Tue,  2 Aug 2022 08:41:06 +0000 (UTC)
Date: Tue, 2 Aug 2022 08:41:06 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards
 threader
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, MISSING_MID,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 02 Aug 2022 08:41:13 -0000
Message-ID: <20220802084106.Vgw5KaJDv4l0RFKq7cCtG7MkalQWNZiWbMcoFvoeQpA@z>

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

I have my reservations about how we compute hot (contains_hot_bb
in particular), but the following first refactors the above to apply
the size constraints always and then _not_ threading if the path
is not considered hot (but allow threading if n_insns <= 1 as with
the !m_speed_p case).

As for contains_hot_bb - it might be that this consciously captures
the case where we separate a cold from a hot path even though the
threaded path itself is cold.  Consider

   A
  / \ (unlikely)
 B   C
  \ /
   D
  / \
 ..  abort()

when we want to thread A -> B -> D -> abort () and A (or D)
has a hot BB count then we have contains_hot_bb even though
the counts on the path itself are small.  In fact when we
thread the only relevant count for the resulting threaded
path is the count of A with the A->C probability applied
(that should also the count to subtract from the blocks
we copied - sth missing for the backwards threader as well).

So I'm wondering how the logic computing contains_hot_bb
relates to the above comment before the costing block.
Anyone remembers?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths.  Do not thread cold paths.
---
 gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..a887568032b 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	*creates_irreducible_loop = true;
     }
 
-  /* Threading is profitable if the path duplicated is hot but also
+  const int max_cold_insns = 1;
+  if (!m_speed_p && n_insns > max_cold_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "duplication of %i insns is needed and optimizing for size.\n",
+		 n_insns);
+      return false;
+    }
+  else if (n_insns >= param_max_fsm_thread_path_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "the number of instructions on the path "
+		 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+      return false;
+    }
+
+  /* Threading is profitable if the path duplicated is small or hot but also
      in a case we separate cold path from hot path and permit optimization
      of the hot path later.  Be on the agressive side here. In some testcases,
      as in PR 78407 this leads to noticeable improvements.  */
-  if (m_speed_p
-      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
-	  || contains_hot_bb))
+  if (!(n_insns <= max_cold_insns
+	|| contains_hot_bb
+	|| (taken_edge && optimize_edge_for_speed_p (taken_edge))))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "path is not profitable to thread.\n");
+      return false;
+    }
+
+  /* If the path is not small to duplicate and either the entry or
+     the final destination is probably never executed avoid separating
+     the cold path since that can lead to spurious diagnostics.  */
+  if (n_insns > max_cold_insns)
     {
-      if (n_insns >= param_max_fsm_thread_path_insns)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		     "the number of instructions on the path "
-		     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
-	  return false;
-	}
       if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		 "duplication of %i insns is needed and optimizing for size.\n",
-		 n_insns);
-      return false;
-    }
 
   /* We avoid creating irreducible inner loops unless we thread through
      a multiway branch, in which case we have deemed it worth losing
-- 
2.35.3