From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com
 [IPv6:2a00:1450:4864:20::432])
 by sourceware.org (Postfix) with ESMTPS id E9238398EC08
 for <gcc@gcc.gnu.org>; Wed,  9 Jun 2021 16:18:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E9238398EC08
Received: by mail-wr1-x432.google.com with SMTP id q5so26171663wrm.1
 for <gcc@gcc.gnu.org>; Wed, 09 Jun 2021 09:18:29 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:user-agent:in-reply-to:references
 :mime-version:content-transfer-encoding:subject:to:cc:from
 :message-id;
 bh=pG6OtV7zzO46ovhow4zvFXbCe3goU2kV8LmtQeyemoo=;
 b=oxvTMeauahnM1hR8WAdofrFyQFATWoSq+As7wEcrL6Z0dwreeiHpJt7ByExLWQaTNS
 9Y2EHbAcuZZfdFFh7W3nqLhWhLFnxMirq+FKgLDE60QoufaZbC2lESNScJUR7dxiUI66
 SCBhoqA0sM+SXV0/Qh0aWu/vXaO43PoMSOHrSKFQRHXxK1vzQTeF2bdO9JNXVL5Zk9EW
 9laG3H3CdtCC7zzL/+DLk4hhYEKFwwYBqtO4hLmSWqzdlrbRK/xo17ErahfxN3IY/LSN
 C6yS+m+guGs7dPIzeBm4FcdCFvdxL4Dh9Phf11ucrTBqFyJ1pZhQ566HHstLrsWW+S3k
 xAAQ==
X-Gm-Message-State: AOAM533FMp31736laQIKyc19FEAryzFD37OYeV1N+KcoI66qHfJrAclv
 kaVPM3OTS167cxY343qLppM=
X-Google-Smtp-Source: ABdhPJwcbAG94VQ1zPIcBr+7OTyf4l4+QMqRNtFyPWlWZ35S1Li6PNT2vaT28yxH8KQh0jWbNTL2Cg==
X-Received: by 2002:adf:e110:: with SMTP id t16mr601368wrz.359.1623255508833; 
 Wed, 09 Jun 2021 09:18:28 -0700 (PDT)
Received: from [192.168.178.32]
 (dynamic-095-115-086-022.95.115.pool.telefonica.de. [95.115.86.22])
 by smtp.gmail.com with ESMTPSA id 31sm456675wrc.96.2021.06.09.09.18.28
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 09 Jun 2021 09:18:28 -0700 (PDT)
Date: Wed, 09 Jun 2021 18:18:26 +0200
User-Agent: K-9 Mail for Android
In-Reply-To: <e24871cd-5455-ca71-02cf-26d54066f7bb@redhat.com>
References: <07775b9d-b8eb-48cb-57ef-9cc278d38967@redhat.com>
 <CAFiYyc2OgPLMyvoGk=XQY0xJvLDG-dPCV+6dQj-WuRVDLocuKA@mail.gmail.com>
 <e24871cd-5455-ca71-02cf-26d54066f7bb@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: replacing the backwards threader and more
To: Aldy Hernandez <aldyh@redhat.com>
CC: Jeff Law <jeffreyalaw@gmail.com>,GCC Mailing List <gcc@gcc.gnu.org>
From: Richard Biener <richard.guenther@gmail.com>
Message-ID: <7A72F0E5-24DF-4DB1-9309-791870E69FC3@gmail.com>
X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Jun 2021 16:18:32 -0000

On June 9, 2021 5:34:03 PM GMT+02:00, Aldy Hernandez <aldyh@redhat=2Ecom> w=
rote:
>
>
>On 6/9/21 2:09 PM, Richard Biener wrote:
>> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc
><gcc@gcc=2Egnu=2Eorg> wrote:
>>>
>>> Hi Jeff=2E  Hi folks=2E
>>>
>>> What started as a foray into severing the old (forward) threader's
>>> dependency on evrp, turned into a rewrite of the backwards threader
>>> code=2E  I'd like to discuss the possibility of replacing the current
>>> backwards threader with a new one that gets far more threads and can
>>> potentially subsume all threaders in the future=2E
>>>
>>> I won't include code here, as it will just detract from the high
>level
>>> discussion=2E  But if it helps, I could post what I have, which just
>needs
>>> some cleanups and porting to the latest trunk changes Andrew has
>made=2E
>>>
>>> Currently the backwards threader works by traversing DEF chains
>through
>>> PHIs leading to possible paths that start in a constant=2E  When such
>a
>>> path is found, it is checked to see if it is profitable, and if so,
>the
>>> constant path is threaded=2E  The current implementation is rather
>limited
>>> since backwards paths must end in a constant=2E  For example, the
>>> backwards threader can't get any of the tests in
>>> gcc=2Edg/tree-ssa/ssa-thread-14=2Ec:
>>>
>>>     if (a && b)
>>>       foo ();
>>>     if (!b && c)
>>>       bar ();
>>>
>>> etc=2E
>>>
>>> After my refactoring patches to the threading code, it is now
>possible
>>> to drop in an alternate implementation that shares the profitability
>>> code (is this path profitable?), the jump registry, and the actual
>jump
>>> threading code=2E  I have leveraged this to write a ranger-based
>threader
>>> that gets every single thread the current code gets, plus 90-130%
>more=2E
>>>
>>> Here are the details from the branch, which should be very similar
>to
>>> trunk=2E  I'm presenting the branch numbers because they contain
>Andrew's
>>> upcoming relational query which significantly juices up the results=2E
>>>
>>> New threader:
>>>            ethread:65043    (+3=2E06%)
>>>            dom:32450      (-13=2E3%)
>>>            backwards threader:72482   (+89=2E6%)
>>>            vrp:40532      (-30=2E7%)
>>>     Total threaded:  210507 (+6=2E70%)
>>>
>>> This means that the new code gets 89=2E6% more jump threading
>>> opportunities than the code I want to replace=2E  In doing so, it
>reduces
>>> the amount of DOM threading opportunities by 13=2E3% and by 30=2E7% fr=
om
>the
>>> VRP jump threader=2E  The total  improvement across the jump threading
>>> opportunities in the compiler is 6=2E70%=2E
>>>
>>> However, these are pessimistic numbers=2E=2E=2E
>>>
>>> I have noticed that some of the threading opportunities that DOM and
>VRP
>>> now get are not because they're smarter, but because they're picking
>up
>>> opportunities that the new code exposes=2E  I experimented with
>running an
>>> iterative threader, and then seeing what VRP and DOM could actually
>get=2E
>>>    This is too expensive to do in real life, but it at least shows
>what
>>> the effect of the new code is on DOM/VRP's abilities:
>>>
>>>     Iterative threader:
>>>       ethread:65043    (+3=2E06%)
>>>       dom:31170    (-16=2E7%)
>>>           thread:86717    (+127%)
>>>           vrp:33851    (-42=2E2%)
>>>     Total threaded:  216781 (+9=2E90%)
>>>
>>> This means that the new code not only gets 127% more cases, but it
>>> reduces the DOM and VRP opportunities considerably (16=2E7% and 42=2E2=
%
>>> respectively)=2E   The end result is that we have the possibility of
>>> getting almost 10% more jump threading opportunities in the entire
>>> compilation run=2E
>>=20
>> Yeah, DOM once was iterating =2E=2E=2E
>>=20
>> You probably have noticed that we have very man (way too many)
>> 'thread' passes, often in close succession with each other or
>> DOM or VRP=2E  So in the above numbers I wonder if you can break
>> down the numbers individually for the actual passes (in their order)?
>
>Sure, I can do that=2E  Let me whip up the old branch and gather some
>info=2E
>
>>=20
>>> (Note that the new code gets even more opportunities, but I'm only
>>> reporting the profitable ones that made it all the way through to
>the
>>> threader backend, and actually eliminated a branch=2E)
>>>
>>> The overall compilation hit from this work is currently 1=2E38% as
>>> measured by callgrind=2E  We should be able to reduce this a bit, plus
>we
>>> could get some of that back if we can replace the DOM and VRP
>threaders
>>> (future work)=2E
>>>
>>> My proposed implementation should be able to get any threading
>>> opportunity, and will get more as range-ops and ranger improve=2E
>>>
>>> I can go into the details if necessary, but the gist of it is that
>we
>>> leverage the import facility in the ranger to only look up paths
>that
>>> have a direct repercussion in the conditional being threaded, thus
>>> reducing the search space=2E  This enhanced path discovery, plus an
>engine
>>> to resolve conditionals based on knowledge from a CFG path, is all
>that
>>> is needed to register new paths=2E  There is no limit to how far back
>we
>>> look, though in practice, we stop looking once a path is too
>expensive
>>> to continue the search in a given direction=2E
>>>
>>> The solver API is simple:
>>>
>>> // This class is a thread path solver=2E  Given a set of BBs
>indicating
>>> // a path through the CFG, range_in_path() will return the range
>>> // of an SSA as if the BBs in the path would have been executed in
>>> // order=2E
>>> //
>>> // Note that the blocks are in reverse order, thus the exit block is
>>> path[0]=2E
>>>
>>> class thread_solver : gori_compute
>>> {
>>>
>>> public:
>>>     thread_solver (gimple_ranger &ranger);
>>>     virtual ~thread_solver ();
>>>     void set_path (const vec<basic_block> *, const bitmap_head
>*imports);
>>>     void range_in_path (irange &, tree name);
>>>     void range_in_path (irange &, gimple *);
>>> =2E=2E=2E
>>> };
>>>
>>> Basically, as we're discovering paths, we ask the solver what the
>value
>>> of the final conditional in a BB is in a given path=2E  If it
>resolves, we
>>> register the path=2E
>>>
>>> A follow-up project would be to analyze what DOM/VRP are actually
>>> getting that we don't, because in theory with an enhanced ranger, we
>>> should be able to get everything they do (minus some float stuff,
>and
>>> some CSE things DOM does)=2E  However, IMO, this is good enough to at
>>> least replace the current backwards threading code=2E
>>>
>>> My suggestion would be to keep both implementations, defaulting to
>the
>>> ranger based, and running the old code immediately after-- trapping
>if
>>> it can find any threading opportunities=2E
>>=20
>> But due to iteration uncovering new opportunities this will
>inevitably
>> break, no?
>
>No, actually because I'm comparing current and new backwards threader=20
>behavior on the same IL, something I can't do with VRP because the IL
>is=20
>slightly different (VRP asserts)=2E
>
>So I can compare apples to apples in the backwards threader code=2E  What
>
>I do is run the new code first, and then run the old code on the same
>IL=20
>before the threader has altered the CFG, while asserting that the old=20
>code cannot register any "new" paths not already present in the
>registry=2E
>
>I have tested this assertion throughout 200+ =2Eii files from a
>bootstrap,=20
>and there has never been a case where the old code can get something we
>
>can't=2E

Ah, that's nice!=20

>>=20
>>> After a few weeks, we could
>>> kill the old code=2E
>>=20
>> Note that for analyzing threadings done apart from looking at overall
>> numbers the statistics infrastructure can be useful, likewise could
>> be the opt-info one where you can diff stats based on file or
>function
>> (statistics) or even location of a participating jump (opt-info)=2E
>>=20
>> If you are re-using tree-ssa-thread{edge,update}=2E{c,h} anyway you
>> probably only have to amend one or two places=2E  I'm personally
>> breaking things down to file/function via statistics to spot gross
>> differences more locally=2E
>
>I am indeed re-using all of tree-ssa-thread{edge,update}=2E  That is=20
>unchanged=2E  I've been using the overall statistics plus an awk script
>to=20
>collate it all=2E  But thanks for the opt-info tip=2E  I didn't know abou=
t
>that=2E
>
>>=20
>> IMHO removing threading from VRP (as a step to make "VRP"
>> into another EVRP run) should be part of the initial transition,
>> there's always a thread pass nearby=2E  Performing threading from
>> EVRP itself might be another option to evaluate=2E  Trimming down
>> the number of (now backwards-)threaders would be another goal=2E
>
>Agreed=2E  I will have to sit down and investigate what VRP is getting=2E=
=20
>Last I peeked it was either stuff range-ops could be taught, or=20
>unconditional jumps to unconditional jumps that could be easily
>handled=2E=20
>However, I think we can prove that the current backwards threader code=20
>cannot get anything in the presence of the new code, and could be
>easily=20
>replaced before concentrating on DOM/VRP=2E
>
>That being said, figuring out exactly the discrepancies with DOM/VRP is
>
>on my short-term radar ;-)=2E
>
>Aldy