From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id DA1373857007 for ; Wed, 4 Aug 2021 10:47:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DA1373857007 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 174AXAge059857; Wed, 4 Aug 2021 06:47:31 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3a6ffkxsfw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 06:47:31 -0400 Received: from m0098394.ppops.net (m0098394.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 174AcHGI084686; Wed, 4 Aug 2021 06:47:30 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com with ESMTP id 3a6ffkxse6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 06:47:30 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 174AhXc1021557; Wed, 4 Aug 2021 10:47:26 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma04ams.nl.ibm.com with ESMTP id 3a4x5911nq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 10:47:26 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 174AlOIw57737722 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 4 Aug 2021 10:47:24 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DD2B5A406F; Wed, 4 Aug 2021 10:47:23 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0093BA407B; Wed, 4 Aug 2021 10:47:22 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.147.34]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 4 Aug 2021 10:47:21 +0000 (GMT) From: "Kewen.Lin" Subject: Re: [PATCH v3] Make loops_list support an optional loop_p root To: Richard Biener Cc: GCC Patches , Segher Boessenkool , Martin Sebor , Bill Schmidt References: <0a8b77ba-1d54-1eff-b54d-d2cb1e769e09@linux.ibm.com> <61ac669c-7293-f53a-20c7-158b5a813cee@linux.ibm.com> <221d8a67-264a-b6a9-e705-bfb4a45f14bb@linux.ibm.com> <963b4fca-8ce6-c9d7-0b08-8431fa433322@linux.ibm.com> Message-ID: Date: Wed, 4 Aug 2021 18:47:20 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------3825604DEB8C3B84B716C310" Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: DSmuugZzJWWNkACjcPz6TtU6bP4jcBpA X-Proofpoint-GUID: LpuXJ7anNJuHw-N5fZwNVuA3BFcqdKBQ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-04_03:2021-08-04, 2021-08-04 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 mlxscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 phishscore=0 adultscore=0 impostorscore=0 bulkscore=0 lowpriorityscore=0 priorityscore=1501 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108040053 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Aug 2021 10:47:36 -0000 This is a multi-part message in MIME format. --------------3825604DEB8C3B84B716C310 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit on 2021/8/4 下午6:01, Richard Biener wrote: > On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin wrote: >> >> on 2021/8/3 下午8:08, Richard Biener wrote: >>> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin wrote: >>>> >>>> on 2021/7/29 下午4:01, Richard Biener wrote: >>>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote: >>>>>> >>>>>> on 2021/7/22 下午8:56, Richard Biener wrote: >>>>>>> On Tue, Jul 20, 2021 at 4:37 >>>>>>> PM Kewen.Lin wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> This v2 has addressed some review comments/suggestions: >>>>>>>> >>>>>>>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) >>>>>>>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) >>>>>>>> to support loop hierarchy tree rather than just a function, >>>>>>>> and adjust to use loops* accordingly. >>>>>>> >>>>>>> I actually meant struct loop *, not struct loops * ;) At the point >>>>>>> we pondered to make loop invariant motion work on single >>>>>>> loop nests we gave up not only but also because it iterates >>>>>>> over the loop nest but all the iterators only ever can process >>>>>>> all loops, not say, all loops inside a specific 'loop' (and >>>>>>> including that 'loop' if LI_INCLUDE_ROOT). So the >>>>>>> CTOR would take the 'root' of the loop tree as argument. >>>>>>> >>>>>>> I see that doesn't trivially fit how loops_list works, at least >>>>>>> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST >>>>>>> could be adjusted to do ONLY_INNERMOST as well? >>>>>>> >>>>>> >>>>>> >>>>>> Thanks for the clarification! I just realized that the previous >>>>>> version with struct loops* is problematic, all traversal is >>>>>> still bounded with outer_loop == NULL. I think what you expect >>>>>> is to respect the given loop_p root boundary. Since we just >>>>>> record the loops' nums, I think we still need the function* fn? >>>>> >>>>> Would it simplify things if we recorded the actual loop *? >>>>> >>>> >>>> I'm afraid it's unsafe to record the loop*. I had the same >>>> question why the loop iterator uses index rather than loop* when >>>> I read this at the first time. I guess the design of processing >>>> loops allows its user to update or even delete the folllowing >>>> loops to be visited. For example, when the user does some tricks >>>> on one loop, then it duplicates the loop and its children to >>>> somewhere and then removes the loop and its children, when >>>> iterating onto its children later, the "index" way will check its >>>> validity by get_loop at that point, but the "loop *" way will >>>> have some recorded pointers to become dangling, can't do the >>>> validity check on itself, seems to need a side linear search to >>>> ensure the validity. >>>> >>>>> There's still the to_visit reserve which needs a bound on >>>>> the number of loops for efficiency reasons. >>>>> >>>> >>>> Yes, I still keep the fn in the updated version. >>>> >>>>>> So I add one optional argument loop_p root and update the >>>>>> visiting codes accordingly. Before this change, the previous >>>>>> visiting uses the outer_loop == NULL as the termination condition, >>>>>> it perfectly includes the root itself, but with this given root, >>>>>> we have to use it as the termination condition to avoid to iterate >>>>>> onto its possible existing next. >>>>>> >>>>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the >>>>>> code like: >>>>>> >>>>>> struct loops *fn_loops = loops_for_fn (fn)->larray; >>>>>> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) >>>>>> if (aloop != NULL >>>>>> && aloop->inner == NULL >>>>>> && flow_loop_nested_p (tree_root, aloop)) >>>>>> this->to_visit.quick_push (aloop->num); >>>>>> >>>>>> it has the stable bound, but if the given root only has several >>>>>> child loops, it can be much worse if there are many loops in fn. >>>>>> It seems impossible to predict the given root loop hierarchy size, >>>>>> maybe we can still use the original linear searching for the case >>>>>> loops_for_fn (fn) == root? But since this visiting seems not so >>>>>> performance critical, I chose to share the code originally used >>>>>> for FROM_INNERMOST, hope it can have better readability and >>>>>> maintainability. >>>>> >>>>> I was indeed looking for something that has execution/storage >>>>> bound on the subtree we're interested in. If we pull the CTOR >>>>> out-of-line we can probably keep the linear search for >>>>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>>>> >>>> >>>> OK, I've moved the suggested single loop tree walker out-of-line >>>> to cfgloop.c, and brought the linear search back for >>>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>>> >>>>> It just seemed to me that we can eventually re-use a >>>>> single loop tree walker for all orders, just adjusting the >>>>> places we push. >>>>> >>>> >>>> Wow, good point! Indeed, I have further unified all orders >>>> handlings into a single function walk_loop_tree. >>>> >>>>>> >>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9, >>>>>> x86_64-redhat-linux and aarch64-linux-gnu, also >>>>>> bootstrapped on ppc64le P9 with bootstrap-O3 config. >>>>>> >>>>>> Does the attached patch meet what you expect? >>>>> >>>>> So yeah, it's probably close to what is sensible. Not sure >>>>> whether optimizing the loops for the !only_push_innermost_p >>>>> case is important - if we manage to produce a single >>>>> walker with conditionals based on 'flags' then IPA-CP should >>>>> produce optimal clones as well I guess. >>>>> >>>> >>>> Thanks for the comments, the updated v2 is attached. >>>> Comparing with v1, it does: >>>> >>>> - Unify one single loop tree walker for all orders. >>>> - Move walk_loop_tree out-of-line to cfgloop.c. >>>> - Keep the linear search for LI_ONLY_INNERMOST with >>>> tree_root of fn loops. >>>> - Use class loop * instead of loop_p. >>>> >>>> Bootstrapped & regtested on powerpc64le-linux-gnu Power9 >>>> (with/without the hunk for LI_ONLY_INNERMOST linear search, >>>> it can have the coverage to exercise LI_ONLY_INNERMOST >>>> in walk_loop_tree when "without"). >>>> >>>> Is it ok for trunk? >>> >>> Looks good to me. I think that the 'mn' was an optimization >>> for the linear walk and it's cheaper to pointer test against >>> the actual 'root' loop (no need to dereference). Thus >>> >>> + if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root) >>> { >>> - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++) >>> + class loop *aloop; >>> + unsigned int i; >>> + for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++) >>> if (aloop != NULL >>> && aloop->inner == NULL >>> - && aloop->num >= mn) >>> + && aloop->num != mn) >>> this->to_visit.quick_push (aloop->num); >>> >>> could elide the aloop->num != mn check and start iterating from 1, >>> since loops->tree_root->num == 0 >>> >>> and the walk_loop_tree could simply do >>> >>> class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root; >>> >>> and pointer test aloop against exclude. That avoids the idea that >>> 'mn' is a vehicle to exclude one random loop from the iteration. >>> >> >> Good idea! Thanks for the comments! The attached v3 has addressed >> the review comments on "mn". >> >> Bootstrapped & regtested again on powerpc64le-linux-gnu Power9 >> (with/without the hunk for LI_ONLY_INNERMOST linear search). >> >> Is it ok for trunk? > > + /* Early handle root without any inner loops, make later > + processing simpler, that is all loops processed in the > + following while loop are impossible to be root. */ > + if (!root->inner) > + { > + if (root != exclude) > + this->to_visit.quick_push (root->num); > + return; > + } > > could be > > if (!root->inner) > { > if (flags & LI_INCLUDE_ROOT) > this->to_visit.quick_push (root->num); > } > OK, I thought wrongly that all places with "exclude" might be more consistent, so gave up to use flags directly. :) > + class loop *aloop; > + for (aloop = root; > + aloop->inner != NULL; > + aloop = aloop->inner) > + { > + if (preorder_p && aloop != exclude) > + this->to_visit.quick_push (aloop->num); > + continue; > + } > > could be > > + class loop *aloop; > + for (aloop = root->inner; > + aloop->inner != NULL; > + aloop = aloop->inner) > + { > + if (preorder_p) > + this->to_visit.quick_push (aloop->num); > + continue; > + } > This seems wrong? For preorder_p, we might miss to push root when root->inner isn't NULL. The below "else if" makes it safe. @@ -2125,17 +2125,19 @@ loops_list::walk_loop_tree (class loop *root, unsigned flags) following while loop are impossible to be root. */ if (!root->inner) { - if (root != exclude) + if (flags & LI_INCLUDE_ROOT) this->to_visit.quick_push (root->num); return; } + else if (preorder_p && flags & LI_INCLUDE_ROOT) + this->to_visit.quick_push (root->num); > + /* When visiting from innermost, we need to consider root here > + since the previous while loop doesn't handle it. */ > + if (from_innermost_p && root != exclude) > + this->to_visit.quick_push (root->num); > > could be like the first. I think that's more clear even. Sorry for > finding a better solution again. > It's totally fine, thanks for all the nice suggestions! :) > OK with that change > Thanks, the attached diff is the delta against v3, excepting for the "else if", the other changes follow the suggestion above. Could you have another look to confirm? I'll do the full testing again before committing. BR, Kewen --------------3825604DEB8C3B84B716C310 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="delta.diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="delta.diff" ZGlmZiAtLWdpdCBhL2djYy9jZmdsb29wLmMgYi9nY2MvY2ZnbG9vcC5jCmluZGV4IGFmYmFh MjE2Y2U1Li40YzZkOGVkOTBkMiAxMDA2NDQKLS0tIGEvZ2NjL2NmZ2xvb3AuYworKysgYi9n Y2MvY2ZnbG9vcC5jCkBAIC0yMTI1LDE3ICsyMTI1LDE5IEBAIGxvb3BzX2xpc3Q6OndhbGtf bG9vcF90cmVlIChjbGFzcyBsb29wICpyb290LCB1bnNpZ25lZCBmbGFncykKICAgICAgZm9s bG93aW5nIHdoaWxlIGxvb3AgYXJlIGltcG9zc2libGUgdG8gYmUgcm9vdC4gICovCiAgIGlm ICghcm9vdC0+aW5uZXIpCiAgICAgewotICAgICAgaWYgKHJvb3QgIT0gZXhjbHVkZSkKKyAg ICAgIGlmIChmbGFncyAmIExJX0lOQ0xVREVfUk9PVCkKIAl0aGlzLT50b192aXNpdC5xdWlj a19wdXNoIChyb290LT5udW0pOwogICAgICAgcmV0dXJuOwogICAgIH0KKyAgZWxzZSBpZiAo cHJlb3JkZXJfcCAmJiBmbGFncyAmIExJX0lOQ0xVREVfUk9PVCkKKyAgICB0aGlzLT50b192 aXNpdC5xdWlja19wdXNoIChyb290LT5udW0pOwogCiAgIGNsYXNzIGxvb3AgKmFsb29wOwot ICBmb3IgKGFsb29wID0gcm9vdDsKKyAgZm9yIChhbG9vcCA9IHJvb3QtPmlubmVyOwogICAg ICAgIGFsb29wLT5pbm5lciAhPSBOVUxMOwogICAgICAgIGFsb29wID0gYWxvb3AtPmlubmVy KQogICAgIHsKLSAgICAgIGlmIChwcmVvcmRlcl9wICYmIGFsb29wICE9IGV4Y2x1ZGUpCisg ICAgICBpZiAocHJlb3JkZXJfcCkKIAl0aGlzLT50b192aXNpdC5xdWlja19wdXNoIChhbG9v cC0+bnVtKTsKICAgICAgIGNvbnRpbnVlOwogICAgIH0KQEAgLTIxNjUsNyArMjE2Nyw3IEBA IGxvb3BzX2xpc3Q6OndhbGtfbG9vcF90cmVlIChjbGFzcyBsb29wICpyb290LCB1bnNpZ25l ZCBmbGFncykKIAogICAvKiBXaGVuIHZpc2l0aW5nIGZyb20gaW5uZXJtb3N0LCB3ZSBuZWVk IHRvIGNvbnNpZGVyIHJvb3QgaGVyZQogICAgICBzaW5jZSB0aGUgcHJldmlvdXMgd2hpbGUg bG9vcCBkb2Vzbid0IGhhbmRsZSBpdC4gICovCi0gIGlmIChmcm9tX2lubmVybW9zdF9wICYm IHJvb3QgIT0gZXhjbHVkZSkKKyAgaWYgKGZyb21faW5uZXJtb3N0X3AgJiYgZmxhZ3MgJiBM SV9JTkNMVURFX1JPT1QpCiAgICAgdGhpcy0+dG9fdmlzaXQucXVpY2tfcHVzaCAocm9vdC0+ bnVtKTsKIH0KIAo= --------------3825604DEB8C3B84B716C310--