From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id E19F93857C49 for ; Tue, 2 Aug 2022 08:41:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E19F93857C49 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 702082056D; Tue, 2 Aug 2022 08:41:06 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 5C2112C141; Tue, 2 Aug 2022 08:41:06 +0000 (UTC) Date: Tue, 2 Aug 2022 08:41:06 +0000 (UTC) From: Richard Biener To: gcc-patches@gcc.gnu.org Subject: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, MISSING_MID, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2022 08:41:13 -0000 Message-ID: <20220802084106.Vgw5KaJDv4l0RFKq7cCtG7MkalQWNZiWbMcoFvoeQpA@z> I am trying to make sense of back_threader_profitability::profitable_path_p and the first thing I notice is that we do /* Threading is profitable if the path duplicated is hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ if (m_speed_p && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) || contains_hot_bb)) { if (n_insns >= param_max_fsm_thread_path_insns) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "the number of instructions on the path " "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); return false; } ... } else if (!m_speed_p && n_insns > 1) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "duplication of %i insns is needed and optimizing for size.\n", n_insns); return false; } ... return true; thus we apply the n_insns >= param_max_fsm_thread_path_insns only to "hot paths". The comment above this isn't entirely clear whether this is by design ("Be on the aggressive side here ...") but I think this is a mistake. In fact the "hot path" check seems entirely useless since if the path is not hot we simply continue threading it. I have my reservations about how we compute hot (contains_hot_bb in particular), but the following first refactors the above to apply the size constraints always and then _not_ threading if the path is not considered hot (but allow threading if n_insns <= 1 as with the !m_speed_p case). As for contains_hot_bb - it might be that this consciously captures the case where we separate a cold from a hot path even though the threaded path itself is cold. Consider A / \ (unlikely) B C \ / D / \ .. abort() when we want to thread A -> B -> D -> abort () and A (or D) has a hot BB count then we have contains_hot_bb even though the counts on the path itself are small. In fact when we thread the only relevant count for the resulting threaded path is the count of A with the A->C probability applied (that should also the count to subtract from the blocks we copied - sth missing for the backwards threader as well). So I'm wondering how the logic computing contains_hot_bb relates to the above comment before the costing block. Anyone remembers? Bootstrap & regtest running on x86_64-unknown-linux-gnu. * tree-ssa-threadbackwards.cc (back_threader_profitability::profitable_path_p): Apply size constraints to all paths. Do not thread cold paths. --- gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++------------- 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 0519f2a8c4b..a887568032b 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec &m_path, *creates_irreducible_loop = true; } - /* Threading is profitable if the path duplicated is hot but also + const int max_cold_insns = 1; + if (!m_speed_p && n_insns > max_cold_insns) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " FAIL: Jump-thread path not considered: " + "duplication of %i insns is needed and optimizing for size.\n", + n_insns); + return false; + } + else if (n_insns >= param_max_fsm_thread_path_insns) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " FAIL: Jump-thread path not considered: " + "the number of instructions on the path " + "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); + return false; + } + + /* Threading is profitable if the path duplicated is small or hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ - if (m_speed_p - && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) - || contains_hot_bb)) + if (!(n_insns <= max_cold_insns + || contains_hot_bb + || (taken_edge && optimize_edge_for_speed_p (taken_edge)))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " FAIL: Jump-thread path not considered: " + "path is not profitable to thread.\n"); + return false; + } + + /* If the path is not small to duplicate and either the entry or + the final destination is probably never executed avoid separating + the cold path since that can lead to spurious diagnostics. */ + if (n_insns > max_cold_insns) { - if (n_insns >= param_max_fsm_thread_path_insns) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, " FAIL: Jump-thread path not considered: " - "the number of instructions on the path " - "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); - return false; - } if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge)) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec &m_path, return false; } } - else if (!m_speed_p && n_insns > 1) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, " FAIL: Jump-thread path not considered: " - "duplication of %i insns is needed and optimizing for size.\n", - n_insns); - return false; - } /* We avoid creating irreducible inner loops unless we thread through a multiway branch, in which case we have deemed it worth losing -- 2.35.3