From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 68CB83857828 for ; Thu, 5 Aug 2021 08:50:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 68CB83857828 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 1758nvMk062532; Thu, 5 Aug 2021 04:50:30 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3a88sne229-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 05 Aug 2021 04:50:30 -0400 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 1758oU6q064370; Thu, 5 Aug 2021 04:50:30 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 3a88sne21h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 05 Aug 2021 04:50:30 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1758lf9a001894; Thu, 5 Aug 2021 08:50:28 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma04ams.nl.ibm.com with ESMTP id 3a4x592w04-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 05 Aug 2021 08:50:28 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 1758oQ2a56623430 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 5 Aug 2021 08:50:26 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 12BD611C050; Thu, 5 Aug 2021 08:50:26 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EC1BC11C04C; Thu, 5 Aug 2021 08:50:23 +0000 (GMT) Received: from KewenLins-MacBook-Pro.local (unknown [9.200.63.169]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 5 Aug 2021 08:50:23 +0000 (GMT) Subject: Re: [PATCH v3] Make loops_list support an optional loop_p root To: Richard Biener Cc: GCC Patches , Segher Boessenkool , Martin Sebor , Bill Schmidt References: <0a8b77ba-1d54-1eff-b54d-d2cb1e769e09@linux.ibm.com> <61ac669c-7293-f53a-20c7-158b5a813cee@linux.ibm.com> <221d8a67-264a-b6a9-e705-bfb4a45f14bb@linux.ibm.com> <963b4fca-8ce6-c9d7-0b08-8431fa433322@linux.ibm.com> From: "Kewen.Lin" Message-ID: <4aa7d9dc-d284-16ed-36a0-6999c9e9a3ee@linux.ibm.com> Date: Thu, 5 Aug 2021 16:50:22 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: ZHaLcI0ov7a0tVOo-XmT8uFxB77G80rT X-Proofpoint-GUID: div433XUM0ZVB4Cvrct4p00N5_he0vIo X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-05_03:2021-08-05, 2021-08-05 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 phishscore=0 clxscore=1015 mlxscore=0 spamscore=0 impostorscore=0 malwarescore=0 bulkscore=0 lowpriorityscore=0 adultscore=0 mlxlogscore=999 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108050050 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Aug 2021 08:50:33 -0000 on 2021/8/4 下午8:04, Richard Biener wrote: > On Wed, Aug 4, 2021 at 12:47 PM Kewen.Lin wrote: >> >> on 2021/8/4 下午6:01, Richard Biener wrote: >>> On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin wrote: >>>> >>>> on 2021/8/3 下午8:08, Richard Biener wrote: >>>>> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin wrote: >>>>>> >>>>>> on 2021/7/29 下午4:01, Richard Biener wrote: >>>>>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote: >>>>>>>> >>>>>>>> on 2021/7/22 下午8:56, Richard Biener wrote: >>>>>>>>> On Tue, Jul 20, 2021 at 4:37 >>>>>>>>> PM Kewen.Lin wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> This v2 has addressed some review comments/suggestions: >>>>>>>>>> >>>>>>>>>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) >>>>>>>>>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) >>>>>>>>>> to support loop hierarchy tree rather than just a function, >>>>>>>>>> and adjust to use loops* accordingly. >>>>>>>>> >>>>>>>>> I actually meant struct loop *, not struct loops * ;) At the point >>>>>>>>> we pondered to make loop invariant motion work on single >>>>>>>>> loop nests we gave up not only but also because it iterates >>>>>>>>> over the loop nest but all the iterators only ever can process >>>>>>>>> all loops, not say, all loops inside a specific 'loop' (and >>>>>>>>> including that 'loop' if LI_INCLUDE_ROOT). So the >>>>>>>>> CTOR would take the 'root' of the loop tree as argument. >>>>>>>>> >>>>>>>>> I see that doesn't trivially fit how loops_list works, at least >>>>>>>>> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST >>>>>>>>> could be adjusted to do ONLY_INNERMOST as well? >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks for the clarification! I just realized that the previous >>>>>>>> version with struct loops* is problematic, all traversal is >>>>>>>> still bounded with outer_loop == NULL. I think what you expect >>>>>>>> is to respect the given loop_p root boundary. Since we just >>>>>>>> record the loops' nums, I think we still need the function* fn? >>>>>>> >>>>>>> Would it simplify things if we recorded the actual loop *? >>>>>>> >>>>>> >>>>>> I'm afraid it's unsafe to record the loop*. I had the same >>>>>> question why the loop iterator uses index rather than loop* when >>>>>> I read this at the first time. I guess the design of processing >>>>>> loops allows its user to update or even delete the folllowing >>>>>> loops to be visited. For example, when the user does some tricks >>>>>> on one loop, then it duplicates the loop and its children to >>>>>> somewhere and then removes the loop and its children, when >>>>>> iterating onto its children later, the "index" way will check its >>>>>> validity by get_loop at that point, but the "loop *" way will >>>>>> have some recorded pointers to become dangling, can't do the >>>>>> validity check on itself, seems to need a side linear search to >>>>>> ensure the validity. >>>>>> >>>>>>> There's still the to_visit reserve which needs a bound on >>>>>>> the number of loops for efficiency reasons. >>>>>>> >>>>>> >>>>>> Yes, I still keep the fn in the updated version. >>>>>> >>>>>>>> So I add one optional argument loop_p root and update the >>>>>>>> visiting codes accordingly. Before this change, the previous >>>>>>>> visiting uses the outer_loop == NULL as the termination condition, >>>>>>>> it perfectly includes the root itself, but with this given root, >>>>>>>> we have to use it as the termination condition to avoid to iterate >>>>>>>> onto its possible existing next. >>>>>>>> >>>>>>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the >>>>>>>> code like: >>>>>>>> >>>>>>>> struct loops *fn_loops = loops_for_fn (fn)->larray; >>>>>>>> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) >>>>>>>> if (aloop != NULL >>>>>>>> && aloop->inner == NULL >>>>>>>> && flow_loop_nested_p (tree_root, aloop)) >>>>>>>> this->to_visit.quick_push (aloop->num); >>>>>>>> >>>>>>>> it has the stable bound, but if the given root only has several >>>>>>>> child loops, it can be much worse if there are many loops in fn. >>>>>>>> It seems impossible to predict the given root loop hierarchy size, >>>>>>>> maybe we can still use the original linear searching for the case >>>>>>>> loops_for_fn (fn) == root? But since this visiting seems not so >>>>>>>> performance critical, I chose to share the code originally used >>>>>>>> for FROM_INNERMOST, hope it can have better readability and >>>>>>>> maintainability. >>>>>>> >>>>>>> I was indeed looking for something that has execution/storage >>>>>>> bound on the subtree we're interested in. If we pull the CTOR >>>>>>> out-of-line we can probably keep the linear search for >>>>>>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>>>>>> >>>>>> >>>>>> OK, I've moved the suggested single loop tree walker out-of-line >>>>>> to cfgloop.c, and brought the linear search back for >>>>>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>>>>> >>>>>>> It just seemed to me that we can eventually re-use a >>>>>>> single loop tree walker for all orders, just adjusting the >>>>>>> places we push. >>>>>>> >>>>>> >>>>>> Wow, good point! Indeed, I have further unified all orders >>>>>> handlings into a single function walk_loop_tree. >>>>>> >>>>>>>> >>>>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9, >>>>>>>> x86_64-redhat-linux and aarch64-linux-gnu, also >>>>>>>> bootstrapped on ppc64le P9 with bootstrap-O3 config. >>>>>>>> >>>>>>>> Does the attached patch meet what you expect? >>>>>>> >>>>>>> So yeah, it's probably close to what is sensible. Not sure >>>>>>> whether optimizing the loops for the !only_push_innermost_p >>>>>>> case is important - if we manage to produce a single >>>>>>> walker with conditionals based on 'flags' then IPA-CP should >>>>>>> produce optimal clones as well I guess. >>>>>>> >>>>>> >>>>>> Thanks for the comments, the updated v2 is attached. >>>>>> Comparing with v1, it does: >>>>>> >>>>>> - Unify one single loop tree walker for all orders. >>>>>> - Move walk_loop_tree out-of-line to cfgloop.c. >>>>>> - Keep the linear search for LI_ONLY_INNERMOST with >>>>>> tree_root of fn loops. >>>>>> - Use class loop * instead of loop_p. >>>>>> >>>>>> Bootstrapped & regtested on powerpc64le-linux-gnu Power9 >>>>>> (with/without the hunk for LI_ONLY_INNERMOST linear search, >>>>>> it can have the coverage to exercise LI_ONLY_INNERMOST >>>>>> in walk_loop_tree when "without"). >>>>>> >>>>>> Is it ok for trunk? >>>>> >>>>> Looks good to me. I think that the 'mn' was an optimization >>>>> for the linear walk and it's cheaper to pointer test against >>>>> the actual 'root' loop (no need to dereference). Thus >>>>> >>>>> + if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root) >>>>> { >>>>> - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++) >>>>> + class loop *aloop; >>>>> + unsigned int i; >>>>> + for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++) >>>>> if (aloop != NULL >>>>> && aloop->inner == NULL >>>>> - && aloop->num >= mn) >>>>> + && aloop->num != mn) >>>>> this->to_visit.quick_push (aloop->num); >>>>> >>>>> could elide the aloop->num != mn check and start iterating from 1, >>>>> since loops->tree_root->num == 0 >>>>> >>>>> and the walk_loop_tree could simply do >>>>> >>>>> class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root; >>>>> >>>>> and pointer test aloop against exclude. That avoids the idea that >>>>> 'mn' is a vehicle to exclude one random loop from the iteration. >>>>> >>>> >>>> Good idea! Thanks for the comments! The attached v3 has addressed >>>> the review comments on "mn". >>>> >>>> Bootstrapped & regtested again on powerpc64le-linux-gnu Power9 >>>> (with/without the hunk for LI_ONLY_INNERMOST linear search). >>>> >>>> Is it ok for trunk? >>> >>> + /* Early handle root without any inner loops, make later >>> + processing simpler, that is all loops processed in the >>> + following while loop are impossible to be root. */ >>> + if (!root->inner) >>> + { >>> + if (root != exclude) >>> + this->to_visit.quick_push (root->num); >>> + return; >>> + } >>> >>> could be >>> >>> if (!root->inner) >>> { >>> if (flags & LI_INCLUDE_ROOT) >>> this->to_visit.quick_push (root->num); >>> } >>> >> >> OK, I thought wrongly that all places with "exclude" might be >> more consistent, so gave up to use flags directly. :) >> >>> + class loop *aloop; >>> + for (aloop = root; >>> + aloop->inner != NULL; >>> + aloop = aloop->inner) >>> + { >>> + if (preorder_p && aloop != exclude) >>> + this->to_visit.quick_push (aloop->num); >>> + continue; >>> + } >>> >>> could be >>> >>> + class loop *aloop; >>> + for (aloop = root->inner; >>> + aloop->inner != NULL; >>> + aloop = aloop->inner) >>> + { >>> + if (preorder_p) >>> + this->to_visit.quick_push (aloop->num); >>> + continue; >>> + } >>> >> >> This seems wrong? For preorder_p, we might miss to push root >> when root->inner isn't NULL. The below "else if" makes it safe. > > oops, yes. > >> @@ -2125,17 +2125,19 @@ loops_list::walk_loop_tree (class loop *root, unsigned flags) >> following while loop are impossible to be root. */ >> if (!root->inner) >> { >> - if (root != exclude) >> + if (flags & LI_INCLUDE_ROOT) >> this->to_visit.quick_push (root->num); >> return; >> } >> + else if (preorder_p && flags & LI_INCLUDE_ROOT) >> + this->to_visit.quick_push (root->num); >> >>> + /* When visiting from innermost, we need to consider root here >>> + since the previous while loop doesn't handle it. */ >>> + if (from_innermost_p && root != exclude) >>> + this->to_visit.quick_push (root->num); >>> >>> could be like the first. I think that's more clear even. Sorry for >>> finding a better solution again. >>> >> >> It's totally fine, thanks for all the nice suggestions! :) >> >>> OK with that change >>> >> >> Thanks, the attached diff is the delta against v3, excepting for >> the "else if", the other changes follow the suggestion above. >> >> Could you have another look to confirm? > > I'm missing the line that removes 'exclude', other than that it looks > OK. > Thanks! Bootstrapped & regress-tested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. Committed in r12-2756. BR, Kewen