From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id E6C743857823 for ; Thu, 2 Sep 2021 08:50:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E6C743857823 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id BDEE2225D3; Thu, 2 Sep 2021 08:50:13 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id A21C8A3B87; Thu, 2 Sep 2021 08:50:13 +0000 (UTC) Date: Thu, 2 Sep 2021 10:50:13 +0200 (CEST) From: Richard Biener To: Xionghu Luo cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk In-Reply-To: Message-ID: References: <47so1129-4rp-93pp-op88-4p649p42po80@fhfr.qr> <4a364fc3-7390-d3ca-5e71-6520061cf738@linux.ibm.com> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Sep 2021 08:50:25 -0000 On Thu, 2 Sep 2021, Richard Biener wrote: > On Thu, 2 Sep 2021, Xionghu Luo wrote: > > > > > > > On 2021/9/1 17:58, Richard Biener wrote: > > > This fixes the CFG walk order of fill_always_executed_in to use > > > RPO oder rather than the dominator based order computed by > > > get_loop_body_in_dom_order. That fixes correctness issues with > > > unordered dominator children. > > > > > > The RPO order computed by rev_post_order_and_mark_dfs_back_seme in > > > its for-iteration mode is a good match for the algorithm. > > > > > > Xionghu, I've tried to only fix the CFG walk order issue and not > > > change anything else with this so we have a more correct base > > > to work against. The code still walks inner loop bodies > > > up to loop depth times and thus is quadratic in the loop depth. > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't > > > have any comments I plan to push this and then revisit what we > > > were circling around. > > > > LGTM, thanks. > > I pushed it, thought again in the attempt to build a testcase and > concluded I was wrong with the appearant mishandling of > contains_call - get_loop_body_in_dom_order seems to be exactly > correct for this specific case. So I reverted the commit again. And I figured what the /* In a loop that is always entered we may proceed anyway. But record that we entered it and stop once we leave it. */ comment was about. The code was present before the fix for PR78185 and it was supposed to catch the case where the entered inner loop is not finite. Just as the testcase from PR78185 shows the stopping was done too late when the exit block was already marked as to be always executed. A simpler fix for PR78185 would have been to move if (!flow_bb_inside_loop_p (inn_loop, bb)) break; before setting of last = bb. In fact the installed fix was more pessimistic than that given it terminated already when entering a possibly infinite loop. So we can improve that by doing sth like which should also improve the situation for some of the cases you were looking at? What remains is that we continue to stop when entering a not always executed loop: if (bb->loop_father->header == bb) { if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) break; that I can at this point only explain by possible efficiency concerns? Any better idea on that one? I'm going to test the patch below which improves the situation for volatile int flag, bar; double foo (double *valp) { double sum = 0; for (int i = 0; i < 256; ++i) { for (int j = 0; j < 256; ++j) bar = flag; if (flag) sum += 1.; sum += *valp; } return sum; } Thanks, Richard. diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index d9f75d5025e..f0c93d6a882 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -3044,23 +3044,27 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) edge_iterator ei; bb = bbs[i]; + if (!flow_bb_inside_loop_p (inn_loop, bb)) + { + /* When we are leaving a possibly infinite inner loop + we have to stop processing. */ + if (!finite_loop_p (inn_loop)) + break; + /* If the loop was finite we can continue with processing + the loop we exited to. */ + inn_loop = bb->loop_father; + } + if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) last = bb; if (bitmap_bit_p (contains_call, bb->index)) break; + /* If LOOP exits from this BB stop processing. */ FOR_EACH_EDGE (e, ei, bb->succs) - { - /* If there is an exit from this BB. */ - if (!flow_bb_inside_loop_p (loop, e->dest)) - break; - /* Or we enter a possibly non-finite loop. */ - if (flow_loop_nested_p (bb->loop_father, - e->dest->loop_father) - && ! finite_loop_p (e->dest->loop_father)) - break; - } + if (!flow_bb_inside_loop_p (loop, e->dest)) + break; if (e) break; @@ -3069,16 +3073,14 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) if (bb->flags & BB_IRREDUCIBLE_LOOP) break; - if (!flow_bb_inside_loop_p (inn_loop, bb)) - break; - if (bb->loop_father->header == bb) { if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) break; /* In a loop that is always entered we may proceed anyway. - But record that we entered it and stop once we leave it. */ + But record that we entered it and stop once we leave it + since it might not be finite. */ inn_loop = bb->loop_father; } }