From: Richard Biener
Date: Tue, 3 Aug 2021 14:08:29 +0200
Subject: Re: [PATCH v2] Make loops_list support an optional loop_p root
To: "Kewen.Lin"
Cc: GCC Patches, Segher Boessenkool, Martin Sebor, Bill Schmidt
In-Reply-To: <221d8a67-264a-b6a9-e705-bfb4a45f14bb@linux.ibm.com>
List-Id: Gcc-patches mailing list

On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin wrote:
>
> on 2021/7/29 4:01 PM, Richard Biener wrote:
> > On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote:
> >>
> >> on 2021/7/22 8:56 PM, Richard Biener wrote:
> >>> On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> This v2 has addressed some review comments/suggestions:
> >>>>
> >>>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
> >>>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >>>>     to support loop hierarchy tree rather than just a function,
> >>>>     and adjust to use loops* accordingly.
> >>>
> >>> I actually meant struct loop *, not struct loops * ;)  At the point
> >>> we pondered to make loop invariant motion work on single
> >>> loop nests we gave up not only but also because it iterates
> >>> over the loop nest but all the iterators only ever can process
> >>> all loops, not say, all loops inside a specific 'loop' (and
> >>> including that 'loop' if LI_INCLUDE_ROOT).  So the
> >>> CTOR would take the 'root' of the loop tree as argument.
> >>>
> >>> I see that doesn't trivially fit how loops_list works, at least
> >>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> >>> could be adjusted to do ONLY_INNERMOST as well?
> >>>
> >>
> >>
> >> Thanks for the clarification!
> >> I just realized that the previous
> >> version with struct loops* is problematic, all traversal is
> >> still bounded with outer_loop == NULL.  I think what you expect
> >> is to respect the given loop_p root boundary.  Since we just
> >> record the loops' nums, I think we still need the function* fn?
> >
> > Would it simplify things if we recorded the actual loop *?
> >
>
> I'm afraid it's unsafe to record the loop*.  I had the same
> question why the loop iterator uses index rather than loop* when
> I first read this.  I guess the design of processing
> loops allows its user to update or even delete the following
> loops to be visited.  For example, when the user does some tricks
> on one loop, then it duplicates the loop and its children to
> somewhere and then removes the loop and its children, when
> iterating onto its children later, the "index" way will check its
> validity by get_loop at that point, but the "loop *" way will
> have some recorded pointers to become dangling, can't do the
> validity check on itself, seems to need a side linear search to
> ensure the validity.
>
> > There's still the to_visit reserve which needs a bound on
> > the number of loops for efficiency reasons.
> >
>
> Yes, I still keep the fn in the updated version.
>
> >> So I add one optional argument loop_p root and update the
> >> visiting codes accordingly.  Before this change, the previous
> >> visiting uses the outer_loop == NULL as the termination condition,
> >> it perfectly includes the root itself, but with this given root,
> >> we have to use it as the termination condition to avoid iterating
> >> onto its possible existing next.
> >>
> >> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> >> code like:
> >>
> >>     struct loops *fn_loops = loops_for_fn (fn)->larray;
> >>     for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
> >>         if (aloop != NULL
> >>             && aloop->inner == NULL
> >>             && flow_loop_nested_p (tree_root, aloop))
> >>           this->to_visit.quick_push (aloop->num);
> >>
> >> it has the stable bound, but if the given root only has several
> >> child loops, it can be much worse if there are many loops in fn.
> >> It seems impossible to predict the given root loop hierarchy size,
> >> maybe we can still use the original linear searching for the case
> >> loops_for_fn (fn) == root?  But since this visiting seems not so
> >> performance critical, I chose to share the code originally used
> >> for FROM_INNERMOST, hope it can have better readability and
> >> maintainability.
> >
> > I was indeed looking for something that has execution/storage
> > bound on the subtree we're interested in.  If we pull the CTOR
> > out-of-line we can probably keep the linear search for
> > LI_ONLY_INNERMOST when looking at the whole loop tree.
> >
>
> OK, I've moved the suggested single loop tree walker out-of-line
> to cfgloop.c, and brought the linear search back for
> LI_ONLY_INNERMOST when looking at the whole loop tree.
>
> > It just seemed to me that we can eventually re-use a
> > single loop tree walker for all orders, just adjusting the
> > places we push.
> >
>
> Wow, good point!  Indeed, I have further unified all orders
> handlings into a single function walk_loop_tree.
>
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu, also
> >> bootstrapped on ppc64le P9 with bootstrap-O3 config.
> >>
> >> Does the attached patch meet what you expect?
> >
> > So yeah, it's probably close to what is sensible.
> > Not sure
> > whether optimizing the loops for the !only_push_innermost_p
> > case is important - if we manage to produce a single
> > walker with conditionals based on 'flags' then IPA-CP should
> > produce optimal clones as well I guess.
> >
>
> Thanks for the comments, the updated v2 is attached.
> Comparing with v1, it does:
>
>   - Unify one single loop tree walker for all orders.
>   - Move walk_loop_tree out-of-line to cfgloop.c.
>   - Keep the linear search for LI_ONLY_INNERMOST with
>     tree_root of fn loops.
>   - Use class loop * instead of loop_p.
>
> Bootstrapped & regtested on powerpc64le-linux-gnu Power9
> (with/without the hunk for LI_ONLY_INNERMOST linear search,
> it can have the coverage to exercise LI_ONLY_INNERMOST
> in walk_loop_tree when "without").
>
> Is it ok for trunk?

Looks good to me.  I think that the 'mn' was an optimization
for the linear walk and it's cheaper to pointer test against
the actual 'root' loop (no need to dereference).  Thus

+  if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root)
     {
-      for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++)
+      class loop *aloop;
+      unsigned int i;
+      for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++)
        if (aloop != NULL
            && aloop->inner == NULL
-           && aloop->num >= mn)
+           && aloop->num != mn)
          this->to_visit.quick_push (aloop->num);

could elide the aloop->num != mn check and start iterating from 1,
since loops->tree_root->num == 0

and the walk_loop_tree could simply do

  class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root;

and pointer test aloop against exclude.  That avoids the idea that
'mn' is a vehicle to exclude one random loop from the iteration.

Richard.

> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * cfgloop.h (loops_list::loops_list): Add one optional argument root
>         and adjust accordingly, update loop tree walking and factor out
>         to ...
>         * cfgloop.c (loops_list::walk_loop_tree): ...this.  New function.
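
For readers following the thread, the two ideas discussed above (a single
tree walker whose push position depends on 'flags', and a pointer test
against an 'exclude' loop instead of a loop-number bound) can be sketched
roughly as below.  This is only an illustration under stated assumptions,
not the code from the patch: the struct loop here is a hypothetical
stand-in with just num/inner/next, the flag values are illustrative, the
names walk_subtree and collect_loops are made up, and the recursion is
chosen for brevity rather than to mirror how cfgloop.c walks the tree.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's `class loop`; only the fields this
// sketch needs.  Not the real cfgloop.h definition.
struct loop
{
  explicit loop (int n) : num (n) {}
  int num;                // index of the loop in the function's loop array
  loop *inner = nullptr;  // first child in the loop tree
  loop *next = nullptr;   // next sibling at the same nesting level
};

// Flag names follow the thread; the values here are illustrative.
enum li_flags : unsigned
{
  LI_INCLUDE_ROOT = 1,    // also visit the root of the walked subtree
  LI_FROM_INNERMOST = 2,  // visit children before their parent
  LI_ONLY_INNERMOST = 4   // visit only loops without children
};

// One walker for all orders: the traversal is always the same, only the
// place where a loop number is pushed depends on `flags`.
static void
walk_subtree (const loop *l, const loop *exclude, unsigned flags,
              std::vector<int> &to_visit)
{
  const bool postorder = flags & (LI_FROM_INNERMOST | LI_ONLY_INNERMOST);
  const bool wanted
    = l != exclude && (!(flags & LI_ONLY_INNERMOST) || l->inner == nullptr);

  if (wanted && !postorder)
    to_visit.push_back (l->num);        // outermost-first push
  for (const loop *c = l->inner; c != nullptr; c = c->next)
    walk_subtree (c, exclude, flags, to_visit);
  if (wanted && postorder)
    to_visit.push_back (l->num);        // innermost-first push
}

// Collect the loop numbers under `root`.  Siblings of `root` (root->next)
// are never followed, so the walk stays inside the requested subtree.
std::vector<int>
collect_loops (const loop *root, unsigned flags)
{
  assert (root != nullptr);
  // Pointer test instead of comparing loop numbers against a minimum.
  const loop *exclude = (flags & LI_INCLUDE_ROOT) ? nullptr : root;
  std::vector<int> to_visit;
  walk_subtree (root, exclude, flags, to_visit);
  return to_visit;
}
```

Like the iterator discussed in the thread, the sketch records loop numbers
rather than pointers, so a consumer could still re-validate each entry at
visit time.  Passing a non-root loop as 'root' limits both the work and the
storage to that subtree, which is the bound asked for earlier in the thread.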