From: Richard Biener
Date: Fri, 25 Jun 2021 09:58:09 +0200
Subject: Re: replacing the backwards threader and more
To: Jeff Law
Cc: Aldy Hernandez, GCC Mailing List, Andrew MacLeod

On Fri, Jun 25, 2021 at 9:54 AM Richard Biener wrote:
>
> On Thu, Jun 24, 2021 at 6:14 PM Jeff Law wrote:
> >
> > On 6/21/2021 8:40 AM, Aldy Hernandez wrote:
> > >
> > > On 6/9/21 2:09 PM, Richard Biener wrote:
> > >> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc wrote:
> > >>>
> > >>> Hi Jeff.  Hi folks.
> > >>>
> > >>> What started as a foray into severing the old (forward) threader's
> > >>> dependency on evrp turned into a rewrite of the backwards threader
> > >>> code.  I'd like to discuss the possibility of replacing the current
> > >>> backwards threader with a new one that gets far more threads and
> > >>> can potentially subsume all threaders in the future.
> > >>>
> > >>> I won't include code here, as it would just detract from the
> > >>> high-level discussion.  But if it helps, I could post what I have,
> > >>> which just needs some cleanups and porting to the latest trunk
> > >>> changes Andrew has made.
> > >>>
> > >>> Currently the backwards threader works by traversing DEF chains
> > >>> through PHIs, looking for paths that start in a constant.  When
> > >>> such a path is found, it is checked for profitability, and if
> > >>> profitable, the constant path is threaded.  The current
> > >>> implementation is rather limited, since backwards paths must end
> > >>> in a constant.  For example, the backwards threader can't get any
> > >>> of the tests in gcc.dg/tree-ssa/ssa-thread-14.c:
> > >>>
> > >>>   if (a && b)
> > >>>     foo ();
> > >>>   if (!b && c)
> > >>>     bar ();
> > >>>
> > >>> etc.
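(For contrast, the shape the current backwards threader does handle is a
test whose DEF chain reaches constants through a PHI.  A minimal sketch
of my own, not a testsuite case:

  extern void foo (void);
  extern void bar (void);

  void
  f (int a)
  {
    int x;
    if (a)
      x = 0;
    else
      x = 1;
    /* At the join, x = PHI <0, 1>.  Walking x's DEF chain backwards
       from the test below finds two paths, each starting in a
       constant, so both branches can be threaded directly to foo ()
       and bar ().  */
    if (x == 0)
      foo ();
    else
      bar ();
  }

Both PHI arguments are constants, so the path requirement is met.)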
> > >>> After my refactoring patches to the threading code, it is now
> > >>> possible to drop in an alternate implementation that shares the
> > >>> profitability code (is this path profitable?), the jump registry,
> > >>> and the actual jump threading code.  I have leveraged this to
> > >>> write a ranger-based threader that gets every single thread the
> > >>> current code gets, plus 90-130% more.
> > >>>
> > >>> Here are the details from the branch, which should be very similar
> > >>> to trunk.  I'm presenting the branch numbers because they contain
> > >>> Andrew's upcoming relational query, which significantly juices up
> > >>> the results.
> > >>>
> > >>> New threader:
> > >>>   ethread:65043 (+3.06%)
> > >>>   dom:32450 (-13.3%)
> > >>>   backwards threader:72482 (+89.6%)
> > >>>   vrp:40532 (-30.7%)
> > >>>   Total threaded: 210507 (+6.70%)
> > >>>
> > >>> This means that the new code gets 89.6% more jump threading
> > >>> opportunities than the code I want to replace.  In doing so, it
> > >>> reduces the number of DOM threading opportunities by 13.3%, and
> > >>> those of the VRP jump threader by 30.7%.  The total improvement
> > >>> across all jump threading opportunities in the compiler is 6.70%.
> > >>>
> > >>> However, these are pessimistic numbers...
> > >>>
> > >>> I have noticed that some of the threading opportunities that DOM
> > >>> and VRP now get are not because they're smarter, but because
> > >>> they're picking up opportunities that the new code exposes.  I
> > >>> experimented with running an iterative threader, and then seeing
> > >>> what VRP and DOM could actually get.  This is too expensive to do
> > >>> in real life, but it at least shows the effect of the new code on
> > >>> DOM/VRP's abilities:
> > >>>
> > >>> Iterative threader:
> > >>>   ethread:65043 (+3.06%)
> > >>>   dom:31170 (-16.7%)
> > >>>   thread:86717 (+127%)
> > >>>   vrp:33851 (-42.2%)
> > >>>   Total threaded: 216781 (+9.90%)
> > >>>
> > >>> This means that the new code not only gets 127% more cases, but it
> > >>> also reduces the DOM and VRP opportunities considerably (16.7% and
> > >>> 42.2% respectively).  The end result is that we have the
> > >>> possibility of getting almost 10% more jump threading
> > >>> opportunities in the entire compilation run.
> > >>
> > >> Yeah, DOM once was iterating ...
> > >>
> > >> You probably have noticed that we have very many (way too many)
> > >> 'thread' passes, often in close succession with each other or DOM
> > >> or VRP.  So in the above numbers I wonder if you can break down the
> > >> numbers individually for the actual passes (in their order)?
> > >
> > > As promised.
> > >
> > > *** LEGACY:
> > > ethread42:61152 30.1369%  (61152 threads, 30.1% of the total)
> > > thread117:29646 14.6101%
> > > vrp118:62088 30.5982%
> > > thread132:2232 1.09997%
> > > dom133:31116 15.3346%
> > > thread197:1950 0.960998%
> > > dom198:10661 5.25395%
> > > thread200:587 0.289285%
> > > vrp201:3482 1.716%
> > > Total: 202914
> > >
> > > The above is from current trunk with my patches applied, defaulting
> > > to legacy mode.  It follows the pass number nomenclature in the
> > > *.statistics files.
> > >
> > > New threader code (this is what I envision current trunk looking
> > > like with my patchset):
> > >
> > > *** RANGER:
> > > ethread42:64389 30.2242%
> > > thread117:49449 23.2114%
> > > vrp118:46118 21.6478%
> > > thread132:8153 3.82702%
> > > dom133:27168 12.7527%
> > > thread197:5542 2.60141%
> > > dom198:8191 3.84485%
> > > thread200:1038 0.487237%
> > > vrp201:2990 1.40351%
> > > Total: 213038
> >
> > So this makes me think we should focus on dropping thread197,
> > thread200, & vrp201, and I'd probably focus on vrp201 first since we
> > know we want to get rid of it anyway, and that may change the data
> > for thread???.  Then I'd be looking at thread200 and thread197, in
> > that order.  I suspect that at least some of the cases in thread200
> > and vrp201 are exposed by dom198.
>
> It might be interesting to replace vrp201 with evrp and run thread200
> after that, or alternatively to move dom198 after vrp201, then replace
> vrp201 with evrp and kill off thread200 completely.  Presumably we will
> keep the forward threading in DOM for now.  Since we have FRE right
> before DOM, it's likely only threading that DOM performs there now.
> (But of course testcases will need adjustment.)

So instead of

      NEXT_PASS (pass_fre, false /* may_iterate */);
      /* After late FRE we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
      /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
      NEXT_PASS (pass_ccp, true /* nonzero_p */);

do

      NEXT_PASS (pass_fre, false /* may_iterate */);
      NEXT_PASS (pass_early_vrp);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_strlen);  /// ???
      /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
      NEXT_PASS (pass_ccp, true /* nonzero_p */);

where I only wonder about the pass_strlen placement and how much it
depends on earlier threading (I'd just run it after FRE?).
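(To make the const/copy comment concrete, here is my own sketch of the
kind of residue meant; it is not taken from the GCC sources:

  extern int g (void);

  int
  f (int a)
  {
    int x;
    if (a)
      x = 0;
    else
      x = g ();
    /* Threading duplicates the test below onto the "a != 0" path,
       where x is known to be 0 and the condition folds.  The PHI for
       x in the surviving block then loses an argument and degenerates
       into a single-argument PHI, i.e. a plain copy, which is exactly
       the leftover that ccp/copy_prop mops up afterwards.  */
    if (x == 0)
      return 1;
    return 2;
  }

That is why pass_ccp with nonzero_p follows the threading passes in
both orderings above.)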
> Richard.
>
> > Jeff