From: Richard Biener
Date: Thu, 29 Jul 2021 10:01:00 +0200
Subject: Re: [PATCH] Make loops_list support an optional loop_p root
To: "Kewen.Lin"
Cc: GCC Patches, Jakub Jelinek, Jonathan Wakely, Segher Boessenkool, Richard Sandiford, Trevor Saunders, Martin Sebor, Bill Schmidt
List-Id: Gcc-patches mailing list

On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote:
>
> on 2021/7/22 8:56 PM, Richard Biener wrote:
> > On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin wrote:
> >>
> >> Hi,
> >>
> >> This v2 has addressed some review comments/suggestions:
> >>
> >>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
> >>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >>     to support loop hierarchy tree rather than just a function,
> >>     and adjust to use loops* accordingly.
> >
> > I actually meant struct loop *, not struct loops * ;) At the point
> > we pondered to make loop invariant motion work on single
> > loop nests we gave up not only but also because it iterates
> > over the loop nest but all the iterators only ever can process
> > all loops, not say, all loops inside a specific 'loop' (and
> > including that 'loop' if LI_INCLUDE_ROOT). So the
> > CTOR would take the 'root' of the loop tree as argument.
> >
> > I see that doesn't trivially fit how loops_list works, at least
> > not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST
> > could be adjusted to do ONLY_INNERMOST as well?
> >
>
>
> Thanks for the clarification! I just realized that the previous
> version with struct loops* is problematic, all traversal is
> still bounded with outer_loop == NULL. I think what you expect
> is to respect the given loop_p root boundary.
> Since we just record the loops' nums, I think we still need the
> function* fn?

Would it simplify things if we recorded the actual loop *?

There's still the to_visit reserve which needs a bound on
the number of loops for efficiency reasons.

> So I add one optional argument loop_p root and update the
> visiting codes accordingly. Before this change, the previous
> visiting uses the outer_loop == NULL as the termination condition,
> it perfectly includes the root itself, but with this given root,
> we have to use it as the termination condition to avoid to iterate
> onto its possible existing next.
>
> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> code like:
>
>     struct loops *fn_loops = loops_for_fn (fn)->larray;
>     for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>       if (aloop != NULL
>           && aloop->inner == NULL
>           && flow_loop_nested_p (tree_root, aloop))
>         this->to_visit.quick_push (aloop->num);
>
> it has the stable bound, but if the given root only has several
> child loops, it can be much worse if there are many loops in fn.
> It seems impossible to predict the given root loop hierarchy size,
> maybe we can still use the original linear searching for the case
> loops_for_fn (fn) == root? But since this visiting seems not so
> performance critical, I chose to share the code originally used
> for FROM_INNERMOST, hope it can have better readability and
> maintainability.

I was indeed looking for something that has execution/storage
bound on the subtree we're interested in. If we pull the CTOR
out-of-line we can probably keep the linear search for
LI_ONLY_INNERMOST when looking at the whole loop tree.

It just seemed to me that we can eventually re-use a
single loop tree walker for all orders, just adjusting the
places we push.

>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Does the attached patch meet what you expect?

So yeah, it's probably close to what is sensible. Not sure
whether optimizing the loops for the !only_push_innermost_p
case is important - if we manage to produce a single
walker with conditionals based on 'flags' then IPA-CP should
produce optimal clones as well I guess.

Richard.

>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * cfgloop.h (loops_list::loops_list): Add one optional argument root
>         and adjust accordingly.
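
The "single loop tree walker for all orders" idea from the reply can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the attached patch: Loop, add_child, collect_loops and
the W_* flags are hypothetical stand-ins and are not GCC's class loop,
loops_list or LI_* flags. The walk is bounded by the given root (it never
follows the root's next sibling or climbs above the root), and the flags
only change the place where a loop number is pushed.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a loop tree node (NOT GCC's class loop).
struct Loop {
  int num;
  Loop *inner = nullptr;  // first child
  Loop *next = nullptr;   // next sibling
  Loop *outer = nullptr;  // parent
  explicit Loop(int n) : num(n) {}
};

// Hypothetical flags, named to avoid confusion with GCC's LI_* values.
enum WalkFlags : unsigned {
  W_INCLUDE_ROOT = 1u << 0,    // also report the root itself
  W_FROM_INNERMOST = 1u << 1,  // post-order: children before parents
  W_ONLY_INNERMOST = 1u << 2   // report only loops without children
};

// Helper for building test trees: append CHILD as the last child of PARENT.
void add_child(Loop &parent, Loop &child) {
  child.outer = &parent;
  Loop **slot = &parent.inner;
  while (*slot)
    slot = &(*slot)->next;
  *slot = &child;
}

// Walk only the subtree rooted at ROOT and return loop numbers in the
// order selected by FLAGS.  One walker serves all orders; FLAGS decide
// whether a loop is pushed when it is entered or when it is left.
std::vector<int> collect_loops(const Loop *root, unsigned flags) {
  std::vector<int> order;
  if (root == nullptr)
    return order;

  auto visit = [&](const Loop *l, bool leaving) {
    if (l == root && !(flags & W_INCLUDE_ROOT))
      return;
    if (flags & W_ONLY_INNERMOST) {
      if (leaving && l->inner == nullptr)
        order.push_back(l->num);
    } else if (flags & W_FROM_INNERMOST) {
      if (leaving)
        order.push_back(l->num);
    } else if (!leaving) {
      order.push_back(l->num);  // default: pre-order
    }
  };

  const Loop *l = root;
  bool entering = true;
  for (;;) {
    if (entering) {
      visit(l, false);
      if (l->inner) {  // descend to the first child
        l = l->inner;
        continue;
      }
    }
    visit(l, true);
    if (l == root)  // termination: never step to root->next or above root
      break;
    if (l->next) {
      l = l->next;
      entering = true;
    } else {
      l = l->outer;
      entering = false;
    }
  }
  return order;
}
```

In this sketch both the work done and the size of the result depend only
on the subtree under root, not on the total number of loops in the
function, which is the bound asked for above. It does not address the
to_visit reserve question, since the subtree size is still not known
before the walk.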