From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id A07953858C60 for ; Thu, 2 Sep 2021 10:00:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A07953858C60 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 1829YEag042971; Thu, 2 Sep 2021 06:00:16 -0400 Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com with ESMTP id 3atv158mwn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Sep 2021 06:00:16 -0400 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1829wuKj001394; Thu, 2 Sep 2021 10:00:14 GMT Received: from b06avi18878370.portsmouth.uk.ibm.com (b06avi18878370.portsmouth.uk.ibm.com [9.149.26.194]) by ppma02fra.de.ibm.com with ESMTP id 3atdxd80w4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Sep 2021 10:00:13 +0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06avi18878370.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 1829u4sm50528668 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 2 Sep 2021 09:56:04 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 190D752071; Thu, 2 Sep 2021 10:00:08 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.197.227.160]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTPS id 8761A52073; Thu, 2 Sep 2021 10:00:06 +0000 (GMT) Subject: Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk To: Richard Biener Cc: gcc-patches@gcc.gnu.org References: <47so1129-4rp-93pp-op88-4p649p42po80@fhfr.qr> <4a364fc3-7390-d3ca-5e71-6520061cf738@linux.ibm.com> From: Xionghu Luo Message-ID: <566e5d26-2c0e-8181-2249-211ebf369b73@linux.ibm.com> Date: Thu, 2 Sep 2021 18:00:02 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.0; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: Y9tcAbqvE5MYKc-I-MZFbXEQlWQG6LmD X-Proofpoint-ORIG-GUID: Y9tcAbqvE5MYKc-I-MZFbXEQlWQG6LmD X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-09-02_03:2021-09-01, 2021-09-02 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 phishscore=0 mlxlogscore=999 bulkscore=0 priorityscore=1501 spamscore=0 impostorscore=0 mlxscore=0 suspectscore=0 malwarescore=0 lowpriorityscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2108310000 definitions=main-2109020059 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Sep 2021 10:00:19 -0000 On 2021/9/2 16:50, Richard Biener wrote: > On Thu, 2 Sep 2021, Richard Biener wrote: > >> On Thu, 2 Sep 2021, Xionghu Luo wrote: >> >>> >>> >>> On 2021/9/1 17:58, Richard Biener wrote: >>>> This fixes the CFG walk order of fill_always_executed_in to use >>>> RPO oder rather than the dominator based order computed by >>>> get_loop_body_in_dom_order. That fixes correctness issues with >>>> unordered dominator children. >>>> >>>> The RPO order computed by rev_post_order_and_mark_dfs_back_seme in >>>> its for-iteration mode is a good match for the algorithm. >>>> >>>> Xionghu, I've tried to only fix the CFG walk order issue and not >>>> change anything else with this so we have a more correct base >>>> to work against. The code still walks inner loop bodies >>>> up to loop depth times and thus is quadratic in the loop depth. >>>> >>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't >>>> have any comments I plan to push this and then revisit what we >>>> were circling around. >>> >>> LGTM, thanks. >> >> I pushed it, thought again in the attempt to build a testcase and >> concluded I was wrong with the appearant mishandling of >> contains_call - get_loop_body_in_dom_order seems to be exactly >> correct for this specific case. So I reverted the commit again. > > And I figured what the > > /* In a loop that is always entered we may proceed anyway. > But record that we entered it and stop once we leave it. > */ > > comment was about. The code was present before the fix for PR78185 > and it was supposed to catch the case where the entered inner loop > is not finite. Just as the testcase from PR78185 shows the > stopping was done too late when the exit block was already marked > as to be always executed. A simpler fix for PR78185 would have been > to move > > if (!flow_bb_inside_loop_p (inn_loop, bb)) > break; > > before setting of last = bb. In fact the installed fix was more > pessimistic than that given it terminated already when entering > a possibly infinite loop. So we can improve that by doing > sth like which should also improve the situation for some of > the cases you were looking at? > > What remains is that we continue to stop when entering a > not always executed loop: > > if (bb->loop_father->header == bb) > { > if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > break; Yes. This will cause blocks after inner loop missed to be check if they are actually ALWAYS_EXECUTED. I am afraid O(N^2) is inevitable here... > > that I can at this point only explain by possible efficiency > concerns? Any better idea on that one? >From experiment, early break from inner loop seems not cost shorter time than full inner loop walk. I will take more precise measurement and larger data set on the function fill_always_executed_in_1 if necessary. My previous v2 patch also tried to update inn_loop level by level when exiting from inn_loops, but it is proved to be unnecessary but you worried about the dominance order by get_loop_body_in_dom_order. > > I'm going to test the patch below which improves the situation for > > volatile int flag, bar; > double foo (double *valp) > { > double sum = 0; > for (int i = 0; i < 256; ++i) > { > for (int j = 0; j < 256; ++j) > bar = flag; > if (flag) > sum += 1.; > sum += *valp; > } > return sum; > } The patch still fails to handle cases like this: struct X { int i; int j; int k;}; volatile int m; void bar (struct X *x, int n, int l, int k) { for (int i = 0; i < l; i++) { if (k) for (int j = 0; j < l; j++) { if (m) x->i = m; else x->i = 1 - m; int *r = &x->k; int tem2 = *r; x->k += tem2 * j; } x->i = m; } } x->i is still not marked ALWAYS_EXECUTED for outer loop. > > Thanks, > Richard. > > diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c > index d9f75d5025e..f0c93d6a882 100644 > --- a/gcc/tree-ssa-loop-im.c > +++ b/gcc/tree-ssa-loop-im.c > @@ -3044,23 +3044,27 @@ fill_always_executed_in_1 (class loop *loop, > sbitmap contains_call) > edge_iterator ei; > bb = bbs[i]; > > + if (!flow_bb_inside_loop_p (inn_loop, bb)) > + { > + /* When we are leaving a possibly infinite inner loop > + we have to stop processing. */ > + if (!finite_loop_p (inn_loop)) > + break; > + /* If the loop was finite we can continue with processing > + the loop we exited to. */ > + inn_loop = bb->loop_father; > + } > + > if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > last = bb; > > if (bitmap_bit_p (contains_call, bb->index)) > break; > > + /* If LOOP exits from this BB stop processing. */ > FOR_EACH_EDGE (e, ei, bb->succs) > - { > - /* If there is an exit from this BB. */ > - if (!flow_bb_inside_loop_p (loop, e->dest)) > - break; > - /* Or we enter a possibly non-finite loop. */ > - if (flow_loop_nested_p (bb->loop_father, > - e->dest->loop_father) > - && ! finite_loop_p (e->dest->loop_father)) > - break; > - } > + if (!flow_bb_inside_loop_p (loop, e->dest)) > + break; > if (e) > break; > > @@ -3069,16 +3073,14 @@ fill_always_executed_in_1 (class loop *loop, > sbitmap contains_call) > if (bb->flags & BB_IRREDUCIBLE_LOOP) > break; > > - if (!flow_bb_inside_loop_p (inn_loop, bb)) > - break; > - > if (bb->loop_father->header == bb) > { > if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > break; > > /* In a loop that is always entered we may proceed anyway. > - But record that we entered it and stop once we leave it. > */ > + But record that we entered it and stop once we leave it > + since it might not be finite. */ > inn_loop = bb->loop_father; > } > } > -- Thanks, Xionghu