From: Richard Biener
Date: Wed, 4 Aug 2021 14:04:17 +0200
Subject: Re: [PATCH v3] Make loops_list support an optional loop_p root
To: "Kewen.Lin"
Cc: GCC Patches, Segher Boessenkool, Martin Sebor, Bill Schmidt
List-Id: Gcc-patches mailing list

On Wed, Aug 4, 2021 at 12:47 PM Kewen.Lin wrote:
>
> on 2021/8/4 6:01 PM, Richard Biener wrote:
> > On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin wrote:
> >>
> >> on 2021/8/3 8:08 PM, Richard Biener wrote:
> >>> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin wrote:
> >>>>
> >>>> on 2021/7/29 4:01 PM, Richard Biener wrote:
> >>>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote:
> >>>>>>
> >>>>>> on 2021/7/22 8:56 PM, Richard Biener wrote:
> >>>>>>> On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> This v2 has addressed some review comments/suggestions:
> >>>>>>>>
> >>>>>>>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
> >>>>>>>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >>>>>>>>     to support loop hierarchy tree rather than just a function,
> >>>>>>>>     and adjust to use loops* accordingly.
> >>>>>>>
> >>>>>>> I actually meant struct loop *, not struct loops * ;)  At the point
> >>>>>>> we pondered to make loop invariant motion work on single
> >>>>>>> loop nests we gave up not only but also because it iterates
> >>>>>>> over the loop nest but all the iterators only ever can process
> >>>>>>> all loops, not say, all loops inside a specific 'loop' (and
> >>>>>>> including that 'loop' if LI_INCLUDE_ROOT).  So the
> >>>>>>> CTOR would take the 'root' of the loop tree as argument.
> >>>>>>>
> >>>>>>> I see that doesn't trivially fit how loops_list works, at least
> >>>>>>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> >>>>>>> could be adjusted to do ONLY_INNERMOST as well?
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Thanks for the clarification!  I just realized that the previous
> >>>>>> version with struct loops* is problematic, all traversal is
> >>>>>> still bounded with outer_loop == NULL.  I think what you expect
> >>>>>> is to respect the given loop_p root boundary.  Since we just
> >>>>>> record the loops' nums, I think we still need the function* fn?
> >>>>>
> >>>>> Would it simplify things if we recorded the actual loop *?
> >>>>>
> >>>>
> >>>> I'm afraid it's unsafe to record the loop*.  I had the same
> >>>> question why the loop iterator uses index rather than loop* when
> >>>> I read this for the first time.  I guess the design of processing
> >>>> loops allows its user to update or even delete the following
> >>>> loops to be visited.  For example, when the user does some tricks
> >>>> on one loop, then it duplicates the loop and its children to
> >>>> somewhere and then removes the loop and its children, when
> >>>> iterating onto its children later, the "index" way will check its
> >>>> validity by get_loop at that point, but the "loop *" way will
> >>>> leave some recorded pointers dangling, can't do the
> >>>> validity check on itself, seems to need a side linear search to
> >>>> ensure the validity.
> >>>>
> >>>>> There's still the to_visit reserve which needs a bound on
> >>>>> the number of loops for efficiency reasons.
> >>>>>
> >>>>
> >>>> Yes, I still keep the fn in the updated version.
> >>>>
> >>>>>> So I add one optional argument loop_p root and update the
> >>>>>> visiting codes accordingly.
> >>>>>> Before this change, the previous
> >>>>>> visiting uses the outer_loop == NULL as the termination condition,
> >>>>>> it perfectly includes the root itself, but with this given root,
> >>>>>> we have to use it as the termination condition to avoid iterating
> >>>>>> onto its possible existing next.
> >>>>>>
> >>>>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> >>>>>> code like:
> >>>>>>
> >>>>>>     struct loops *fn_loops = loops_for_fn (fn)->larray;
> >>>>>>     for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
> >>>>>>         if (aloop != NULL
> >>>>>>             && aloop->inner == NULL
> >>>>>>             && flow_loop_nested_p (tree_root, aloop))
> >>>>>>              this->to_visit.quick_push (aloop->num);
> >>>>>>
> >>>>>> it has the stable bound, but if the given root only has several
> >>>>>> child loops, it can be much worse if there are many loops in fn.
> >>>>>> It seems impossible to predict the given root loop hierarchy size,
> >>>>>> maybe we can still use the original linear searching for the case
> >>>>>> loops_for_fn (fn) == root?  But since this visiting seems not so
> >>>>>> performance critical, I chose to share the code originally used
> >>>>>> for FROM_INNERMOST, hope it can have better readability and
> >>>>>> maintainability.
> >>>>>
> >>>>> I was indeed looking for something that has execution/storage
> >>>>> bound on the subtree we're interested in.  If we pull the CTOR
> >>>>> out-of-line we can probably keep the linear search for
> >>>>> LI_ONLY_INNERMOST when looking at the whole loop tree.
> >>>>>
> >>>>
> >>>> OK, I've moved the suggested single loop tree walker out-of-line
> >>>> to cfgloop.c, and brought the linear search back for
> >>>> LI_ONLY_INNERMOST when looking at the whole loop tree.
> >>>>
> >>>>> It just seemed to me that we can eventually re-use a
> >>>>> single loop tree walker for all orders, just adjusting the
> >>>>> places we push.
> >>>>>
> >>>>
> >>>> Wow, good point!
> >>>> Indeed, I have further unified the handling of all
> >>>> orders into a single function walk_loop_tree.
> >>>>
> >>>>>>
> >>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> >>>>>> x86_64-redhat-linux and aarch64-linux-gnu, also
> >>>>>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
> >>>>>>
> >>>>>> Does the attached patch meet what you expect?
> >>>>>
> >>>>> So yeah, it's probably close to what is sensible.  Not sure
> >>>>> whether optimizing the loops for the !only_push_innermost_p
> >>>>> case is important - if we manage to produce a single
> >>>>> walker with conditionals based on 'flags' then IPA-CP should
> >>>>> produce optimal clones as well I guess.
> >>>>>
> >>>>
> >>>> Thanks for the comments, the updated v2 is attached.
> >>>> Comparing with v1, it does:
> >>>>
> >>>>   - Unify one single loop tree walker for all orders.
> >>>>   - Move walk_loop_tree out-of-line to cfgloop.c.
> >>>>   - Keep the linear search for LI_ONLY_INNERMOST with
> >>>>     tree_root of fn loops.
> >>>>   - Use class loop * instead of loop_p.
> >>>>
> >>>> Bootstrapped & regtested on powerpc64le-linux-gnu Power9
> >>>> (with/without the hunk for LI_ONLY_INNERMOST linear search,
> >>>> it can have the coverage to exercise LI_ONLY_INNERMOST
> >>>> in walk_loop_tree when "without").
> >>>>
> >>>> Is it ok for trunk?
> >>>
> >>> Looks good to me.  I think that the 'mn' was an optimization
> >>> for the linear walk and it's cheaper to pointer test against
> >>> the actual 'root' loop (no need to dereference).
> >>> Thus
> >>>
> >>> +  if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root)
> >>>      {
> >>> -      for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++)
> >>> +      class loop *aloop;
> >>> +      unsigned int i;
> >>> +      for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++)
> >>>         if (aloop != NULL
> >>>             && aloop->inner == NULL
> >>> -           && aloop->num >= mn)
> >>> +           && aloop->num != mn)
> >>>           this->to_visit.quick_push (aloop->num);
> >>>
> >>> could elide the aloop->num != mn check and start iterating from 1,
> >>> since loops->tree_root->num == 0
> >>>
> >>> and the walk_loop_tree could simply do
> >>>
> >>>   class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root;
> >>>
> >>> and pointer test aloop against exclude.  That avoids the idea that
> >>> 'mn' is a vehicle to exclude one random loop from the iteration.
> >>>
> >>
> >> Good idea!  Thanks for the comments!  The attached v3 has addressed
> >> the review comments on "mn".
> >>
> >> Bootstrapped & regtested again on powerpc64le-linux-gnu Power9
> >> (with/without the hunk for LI_ONLY_INNERMOST linear search).
> >>
> >> Is it ok for trunk?
> >
> > +  /* Early handle root without any inner loops, make later
> > +     processing simpler, that is all loops processed in the
> > +     following while loop are impossible to be root.  */
> > +  if (!root->inner)
> > +    {
> > +      if (root != exclude)
> > +       this->to_visit.quick_push (root->num);
> > +      return;
> > +    }
> >
> > could be
> >
> >    if (!root->inner)
> >      {
> >        if (flags & LI_INCLUDE_ROOT)
> >          this->to_visit.quick_push (root->num);
> >      }
> >
>
> OK, I wrongly thought that using "exclude" in all places might be
> more consistent, so I gave up on using flags directly.
> :)
>
> > +  class loop *aloop;
> > +  for (aloop = root;
> > +       aloop->inner != NULL;
> > +       aloop = aloop->inner)
> > +    {
> > +      if (preorder_p && aloop != exclude)
> > +       this->to_visit.quick_push (aloop->num);
> > +      continue;
> > +    }
> >
> > could be
> >
> > +  class loop *aloop;
> > +  for (aloop = root->inner;
> > +       aloop->inner != NULL;
> > +       aloop = aloop->inner)
> > +    {
> > +      if (preorder_p)
> > +       this->to_visit.quick_push (aloop->num);
> > +      continue;
> > +    }
> >
>
> This seems wrong?  For preorder_p, we might fail to push root
> when root->inner isn't NULL.  The below "else if" makes it safe.

oops, yes.

> @@ -2125,17 +2125,19 @@ loops_list::walk_loop_tree (class loop *root, unsigned flags)
>       following while loop are impossible to be root.  */
>    if (!root->inner)
>      {
> -      if (root != exclude)
> +      if (flags & LI_INCLUDE_ROOT)
>         this->to_visit.quick_push (root->num);
>        return;
>      }
> +  else if (preorder_p && flags & LI_INCLUDE_ROOT)
> +    this->to_visit.quick_push (root->num);
>
> > +  /* When visiting from innermost, we need to consider root here
> > +     since the previous while loop doesn't handle it.  */
> > +  if (from_innermost_p && root != exclude)
> > +    this->to_visit.quick_push (root->num);
> >
> > could be like the first.  I think that's more clear even.  Sorry for
> > finding a better solution again.
> >
>
> It's totally fine, thanks for all the nice suggestions!  :)
>
> > OK with that change
> >
>
> Thanks, the attached diff is the delta against v3, excepting for
> the "else if", the other changes follow the suggestion above.
>
> Could you have another look to confirm?

I'm missing the line that removes 'exclude', other than that it looks OK.

Thanks,
Richard.

> I'll do the full testing again before committing.
>
> BR,
> Kewen
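
For readers following the thread, here is a minimal, self-contained sketch of the idea discussed above: one walker serves all three orders, only the places where it pushes differ, and the root is pushed only when LI_INCLUDE_ROOT is set. It is not the code from the patch. The `Loop` struct, the flag values, and `collect_loop_nums` are stand-ins invented for illustration, and the walk is written recursively for brevity, whereas the walk_loop_tree hunks quoted above are iterative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's class loop; only the fields this walk needs.
struct Loop {
  int num;
  Loop *inner;  // first child loop
  Loop *next;   // next sibling loop
  explicit Loop(int n) : num(n), inner(nullptr), next(nullptr) {}
};

// Flag names mirror the thread; the values are arbitrary for this sketch.
enum : unsigned {
  LI_INCLUDE_ROOT = 1,
  LI_FROM_INNERMOST = 2,
  LI_ONLY_INNERMOST = 4
};

// Single walker for all orders: the traversal is always the same,
// the flags only decide where (and whether) a loop number is pushed.
static void walk(const Loop *l, const Loop *root, unsigned flags,
                 std::vector<int> &to_visit) {
  const bool only_innermost_p = (flags & LI_ONLY_INNERMOST) != 0;
  const bool from_innermost_p = (flags & LI_FROM_INNERMOST) != 0;
  const bool preorder_p = !(only_innermost_p || from_innermost_p);
  // The root is reported only on request; every other loop always qualifies.
  const bool wanted = l != root || (flags & LI_INCLUDE_ROOT) != 0;

  if (wanted && preorder_p)
    to_visit.push_back(l->num);
  // Only children are followed, so the root's own siblings are never visited.
  for (const Loop *c = l->inner; c != nullptr; c = c->next)
    walk(c, root, flags, to_visit);
  if (wanted && !preorder_p && (from_innermost_p || l->inner == nullptr))
    to_visit.push_back(l->num);
}

// Hypothetical entry point: collect loop numbers of the subtree at root.
std::vector<int> collect_loop_nums(const Loop *root, unsigned flags) {
  assert(root != nullptr);
  std::vector<int> to_visit;
  walk(root, root, flags, to_visit);
  return to_visit;
}
```

With a tree where loop 0 contains loops 1 and 2, and loop 1 contains loops 3 and 4, calling `collect_loop_nums` on loop 1 with `LI_FROM_INNERMOST | LI_INCLUDE_ROOT` yields 3, 4, 1 and never touches loop 2, which is the subtree bound asked for earlier in the thread. Storing numbers rather than pointers follows the reasoning given above about loops being removed during iteration.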