From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id B7D283858C20 for ; Tue, 16 Aug 2022 09:18:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B7D283858C20 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id BC5D61F912; Tue, 16 Aug 2022 09:18:31 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id AD48C2C143; Tue, 16 Aug 2022 09:18:31 +0000 (UTC) Date: Tue, 16 Aug 2022 09:18:31 +0000 (UTC) From: Richard Biener To: Aldy Hernandez cc: Andrew MacLeod , Jeff Law , gcc-patches Subject: Re: [PATCH] Support threading of just the exit edge In-Reply-To: Message-ID: References: <20220812120145.91C6513AAE@imap2.suse-dmz.suse.de> <3b6265b1-a0ab-f0b7-2d1f-aaf6277eec14@redhat.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2022 09:18:34 -0000 On Mon, 15 Aug 2022, Aldy Hernandez wrote: > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod wrote: > > > > heh. or just > > > > > > + int_range<2> r; > > + if (!fold_range (r, const_cast (cond_stmt)) > > + || !r.singleton_p (&val)) > > > > > > if you do not provide a range_query to any of the fold_using_range code, > > it defaults to: > > > > fur_source::fur_source (range_query *q) > > { > > if (q) > > m_query = q; > > else if (cfun) > > m_query = get_range_query (cfun); > > else > > m_query = get_global_range_query (); > > m_gori = NULL; > > } > > > > Sweet. Even better! So when I do the following incremental change ontop of the posted patch then I see that the path query is able to simplify more "single BB paths" than the global range folding. diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 669098e4ec3..777e778037f 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const vec &path, { int_range_max r; + int_range<2> rf; + if (path.length () == 1) + { + fold_range (rf, cond); + } + m_solver->compute_ranges (path, m_imports); m_solver->range_of_stmt (r, cond); @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const vec &path, if (r == true_range || r == false_range) { + if (path.length () == 1) + gcc_assert (r == rf); edge e_true, e_false; basic_block bb = gimple_bb (cond); extract_true_false_edges_from_block (bb, &e_true, &e_false); Even doing the following (not sure what's the difference and in particular expense over the path range query) results in missed simplifications (checking my set of cc1files). diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 669098e4ec3..1d43a179d08 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -99,6 +99,7 @@ private: back_threader_registry m_registry; back_threader_profitability m_profit; + gimple_ranger *m_ranger; path_range_query *m_solver; // Current path being analyzed. @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, unsigned flags, bool first) // The path solver needs EDGE_DFS_BACK in resolving mode. if (flags & BT_RESOLVE) mark_dfs_back_edges (); - m_solver = new path_range_query (flags & BT_RESOLVE); + m_ranger = new gimple_ranger; + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); } back_threader::~back_threader () { delete m_solver; + delete m_ranger; loop_optimizer_finalize (); } @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const vec &path, { int_range_max r; + int_range<2> rf; + if (path.length () == 1) + { + fold_range (rf, cond, m_ranger); + } + m_solver->compute_ranges (path, m_imports); m_solver->range_of_stmt (r, cond); @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const vec &path, if (r == true_range || r == false_range) { + if (path.length () == 1) + gcc_assert (r == rf); edge e_true, e_false; basic_block bb = gimple_bb (cond); extract_true_false_edges_from_block (bb, &e_true, &e_false); one example is [local count: 14414059]: _128 = node_177(D)->typed.type; pretmp_413 = MEM[(const union tree_node *)_128].base.code; _431 = pretmp_413 + 65519; if (_128 == 0B) goto ; [18.09%] else goto ; [81.91%] where m_imports for the path is just _128 and the range computed is false while the ranger query returns VARYING. But path_range_query::range_defined_in_block does if (bb && POINTER_TYPE_P (TREE_TYPE (name))) m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); which adjusts the range to ~[0, 0], probably because of the dereference in the following stmt. Why does fold_range not do this when folding the exit test? Is there a way to make it do so? It looks like the only routine using this in gimple-range.cc is range_on_edge and there it's used for e->src after calling range_on_exit for e->src (why's it not done in range_on_exit?). A testcase for this is int foo (int **p, int i) { int *q = *p; int res = *q + i; if (q) return res; return -1; } which we "thread" with the path and with the above ICEs because fold_range doesn't get that if (q) is always true. Without the patch ethread doesn't want to duplicate the block (it's too large) but threadfull will if you disable evrp (if you remove the increment by 'i' it again won't since nothing interesting prevails and it won't go to BB 0 and fails to pick up a thread of length > 1): Checking profitability of path (backwards): bb:2 (6 insns) bb:0 Control statement insns: 2 Overall: 4 insns [1] Registering jump thread: (0, 2) incoming edge; (2, 3) nocopy; path: 0->2->3 SUCCESS Removing basic block 2 ;; basic block 2, loop depth 0 ;; pred: _1 = *p_6(D); _2 = (long unsigned int) n_7(D); _3 = _2 * 4; q_8 = _1 + _3; res_9 = *q_8; if (q_8 != 0B) goto ; [98.33%] else goto ; [1.67%] ;; succ: 3 ;; 4 ... (it copied BB 2) ... int foo (int * * p, int n) { int res; int * q; int * _10; long unsigned int _11; long unsigned int _12; [local count: 1073741824]: _10 = *p_6(D); _11 = (long unsigned int) n_7(D); _12 = _11 * 4; q_13 = _10 + _12; res_14 = *q_13; return res_14;