From: Richard Biener
Date: Fri, 25 Jun 2021 09:58:09 +0200
Subject: Re: replacing the backwards threader and more
To: Jeff Law
Cc: Aldy Hernandez, GCC Mailing List, Andrew MacLeod

On Fri, Jun 25, 2021 at 9:54 AM Richard Biener wrote:
>
> On Thu, Jun 24, 2021 at 6:14 PM Jeff Law wrote:
> >
> > On 6/21/2021 8:40 AM, Aldy Hernandez wrote:
> > >
> > > On 6/9/21 2:09 PM, Richard Biener wrote:
> > >> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc wrote:
> > >>>
> > >>> Hi Jeff.  Hi folks.
> > >>>
> > >>> What started as a foray into severing the old (forward) threader's
> > >>> dependency on evrp turned into a rewrite of the backwards threader
> > >>> code.  I'd like to discuss the possibility of replacing the current
> > >>> backwards threader with a new one that gets far more threads and
> > >>> can potentially subsume all threaders in the future.
> > >>>
> > >>> I won't include code here, as it would just detract from the
> > >>> high-level discussion.  But if it helps, I could post what I have,
> > >>> which just needs some cleanups and porting to the latest trunk
> > >>> changes Andrew has made.
> > >>>
> > >>> Currently the backwards threader works by traversing DEF chains
> > >>> through PHIs, looking for paths that start in a constant.  When
> > >>> such a path is found, it is checked for profitability, and if
> > >>> profitable, the constant path is threaded.  The current
> > >>> implementation is rather limited, since backwards paths must end
> > >>> in a constant.  For example, the backwards threader can't get any
> > >>> of the tests in gcc.dg/tree-ssa/ssa-thread-14.c:
> > >>>
> > >>>   if (a && b)
> > >>>     foo ();
> > >>>   if (!b && c)
> > >>>     bar ();
> > >>>
> > >>> etc.
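(For contrast, the shape the current backwards threader does handle is a
test whose DEF chain reaches constants through a PHI.  A minimal sketch
of my own, not a testsuite case:

  extern void foo (void);
  extern void bar (void);

  void
  f (int a)
  {
    int x;
    if (a)
      x = 0;
    else
      x = 1;
    /* At the join, x = PHI <0, 1>.  Walking x's DEF chain backwards
       from the test below finds two paths, each starting in a
       constant, so both branches can be threaded directly to foo ()
       and bar ().  */
    if (x == 0)
      foo ();
    else
      bar ();
  }

Both PHI arguments are constants, so the path requirement is met.)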
> > >>> After my refactoring patches to the threading code, it is now
> > >>> possible to drop in an alternate implementation that shares the
> > >>> profitability code (is this path profitable?), the jump registry,
> > >>> and the actual jump threading code.  I have leveraged this to
> > >>> write a ranger-based threader that gets every single thread the
> > >>> current code gets, plus 90-130% more.
> > >>>
> > >>> Here are the details from the branch, which should be very similar
> > >>> to trunk.  I'm presenting the branch numbers because they contain
> > >>> Andrew's upcoming relational query, which significantly juices up
> > >>> the results.
> > >>>
> > >>> New threader:
> > >>>   ethread:65043 (+3.06%)
> > >>>   dom:32450 (-13.3%)
> > >>>   backwards threader:72482 (+89.6%)
> > >>>   vrp:40532 (-30.7%)
> > >>>   Total threaded: 210507 (+6.70%)
> > >>>
> > >>> This means that the new code gets 89.6% more jump threading
> > >>> opportunities than the code I want to replace.  In doing so, it
> > >>> reduces the number of DOM threading opportunities by 13.3%, and
> > >>> those of the VRP jump threader by 30.7%.  The total improvement
> > >>> across all jump threading opportunities in the compiler is 6.70%.
> > >>>
> > >>> However, these are pessimistic numbers...
> > >>>
> > >>> I have noticed that some of the threading opportunities that DOM
> > >>> and VRP now get are not because they're smarter, but because
> > >>> they're picking up opportunities that the new code exposes.  I
> > >>> experimented with running an iterative threader, and then seeing
> > >>> what VRP and DOM could actually get.  This is too expensive to do
> > >>> in real life, but it at least shows the effect of the new code on
> > >>> DOM/VRP's abilities:
> > >>>
> > >>> Iterative threader:
> > >>>   ethread:65043 (+3.06%)
> > >>>   dom:31170 (-16.7%)
> > >>>   thread:86717 (+127%)
> > >>>   vrp:33851 (-42.2%)
> > >>>   Total threaded: 216781 (+9.90%)
> > >>>
> > >>> This means that the new code not only gets 127% more cases, but it
> > >>> also reduces the DOM and VRP opportunities considerably (16.7% and
> > >>> 42.2% respectively).  The end result is that we have the
> > >>> possibility of getting almost 10% more jump threading
> > >>> opportunities in the entire compilation run.
> > >>
> > >> Yeah, DOM once was iterating ...
> > >>
> > >> You probably have noticed that we have very many (way too many)
> > >> 'thread' passes, often in close succession with each other or DOM
> > >> or VRP.  So in the above numbers I wonder if you can break down the
> > >> numbers individually for the actual passes (in their order)?
> > >
> > > As promised.
> > >
> > > *** LEGACY:
> > > ethread42:61152 30.1369%  (61152 threads, 30.1% of the total)
> > > thread117:29646 14.6101%
> > > vrp118:62088 30.5982%
> > > thread132:2232 1.09997%
> > > dom133:31116 15.3346%
> > > thread197:1950 0.960998%
> > > dom198:10661 5.25395%
> > > thread200:587 0.289285%
> > > vrp201:3482 1.716%
> > > Total: 202914
> > >
> > > The above is from current trunk with my patches applied, defaulting
> > > to legacy mode.  It follows the pass number nomenclature in the
> > > *.statistics files.
> > >
> > > New threader code (this is what I envision current trunk looking
> > > like with my patchset):
> > >
> > > *** RANGER:
> > > ethread42:64389 30.2242%
> > > thread117:49449 23.2114%
> > > vrp118:46118 21.6478%
> > > thread132:8153 3.82702%
> > > dom133:27168 12.7527%
> > > thread197:5542 2.60141%
> > > dom198:8191 3.84485%
> > > thread200:1038 0.487237%
> > > vrp201:2990 1.40351%
> > > Total: 213038
> >
> > So this makes me think we should focus on dropping thread197,
> > thread200, & vrp201, and I'd probably focus on vrp201 first since we
> > know we want to get rid of it anyway, and that may change the data
> > for thread???.  Then I'd be looking at thread200 and thread197, in
> > that order.  I suspect that at least some of the cases in thread200
> > and vrp201 are exposed by dom198.
>
> It might be interesting to replace vrp201 with evrp and run thread200
> after that, or alternatively to move dom198 after vrp201, then replace
> vrp201 with evrp and kill off thread200 completely.  Presumably we will
> keep the forward threading in DOM for now.  Since we have FRE right
> before DOM, it's likely only threading that DOM performs there now.
> (But of course testcases will need adjustment.)

So instead of

      NEXT_PASS (pass_fre, false /* may_iterate */);
      /* After late FRE we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
      /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
      NEXT_PASS (pass_ccp, true /* nonzero_p */);

do

      NEXT_PASS (pass_fre, false /* may_iterate */);
      NEXT_PASS (pass_early_vrp);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_strlen);  /// ???
      /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
      NEXT_PASS (pass_ccp, true /* nonzero_p */);

where I only wonder about the pass_strlen placement and how much it
depends on earlier threading (I'd just run it after FRE?).
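(To make the const/copy comment concrete, here is my own sketch of the
kind of residue meant; it is not taken from the GCC sources:

  extern int g (void);

  int
  f (int a)
  {
    int x;
    if (a)
      x = 0;
    else
      x = g ();
    /* Threading duplicates the test below onto the "a != 0" path,
       where x is known to be 0 and the condition folds.  The PHI for
       x in the surviving block then loses an argument and degenerates
       into a single-argument PHI, i.e. a plain copy, which is exactly
       the leftover that ccp/copy_prop mops up afterwards.  */
    if (x == 0)
      return 1;
    return 2;
  }

That is why pass_ccp with nonzero_p follows the threading passes in
both orderings above.)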
> Richard.
>
> > Jeff