From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-487848-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 77348 invoked by alias); 18 Oct 2018 21:05:38 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 77338 invoked by uid 89); 18 Oct 2018 21:05:37 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=profiles
X-HELO: foss.arm.com
Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 18 Oct 2018 21:05:36 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9CDA2A78;	Thu, 18 Oct 2018 14:05:34 -0700 (PDT)
Received: from localhost (unknown [10.32.99.101])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CED0F3F71A;	Thu, 18 Oct 2018 14:05:33 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Mail-Followup-To: Richard Biener <rguenther@suse.de>,gcc-patches@gcc.gnu.org,  stevenb.gcc@gmail.com, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org,  stevenb.gcc@gmail.com
Subject: Re: [PATCH] Add splay-tree "view" for bitmap
References: <alpine.LSU.2.20.1810181503430.4374@zhemvz.fhfr.qr>	<877eif8ih9.fsf@arm.com>	<alpine.LSU.2.20.1810181612360.4374@zhemvz.fhfr.qr>
Date: Thu, 18 Oct 2018 22:27:00 -0000
In-Reply-To: <alpine.LSU.2.20.1810181612360.4374@zhemvz.fhfr.qr> (Richard	Biener's message of "Thu, 18 Oct 2018 16:24:34 +0200 (CEST)")
Message-ID: <87woqf55k3.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-SW-Source: 2018-10/txt/msg01152.txt.bz2

Richard Biener <rguenther@suse.de> writes:
> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > PR63155 made me pick up this old work from Steven, it turns our
>> > linked-list implementation to a two-mode one with one being a
>> > splay tree featuring O(log N) complexity for find/remove.
>> >
>> > Over Stevens original patch I added a bitmap_tree_to_vec helper
>> > that I use from the debug/print methods to avoid changing view
>> > there.  In theory the bitmap iterator could get a "stack"
>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>> >
>> > This can be used to fix the two biggest bottlenecks in the PRs
>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>> > coalesce list building.  perf shows the following data, first
>> > unpatched, second patched - also watch the thrid coulumn (samples)
>> > when comparing percentages.
>> >
>> > -O0
>> > -   18.19%    17.35%           407  cc1      cc1               [.] bit=
map_set_b=E2=96=92
>> >    - bitmap_set_bit                                                   =
         =E2=96=92
>> >       + 8.77% create_coalesce_list_for_region                         =
         =E2=96=92
>> >       + 4.21% calculate_live_ranges                                   =
         =E2=96=92
>> >       + 2.02% build_ssa_conflict_graph                                =
         =E2=96=92
>> >       + 1.66% insert_phi_nodes_for                                    =
         =E2=96=92
>> >       + 0.86% coalesce_ssa_name=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20
>> > patched:
>> > -   12.39%    10.48%           129  cc1      cc1               [.] bit=
map_set_b=E2=96=92
>> >    - bitmap_set_bit                                                   =
         =E2=96=92
>> >       + 5.27% calculate_live_ranges                                   =
         =E2=96=92
>> >       + 2.76% insert_phi_nodes_for                                    =
         =E2=96=92
>> >       + 1.90% create_coalesce_list_for_region                         =
         =E2=96=92
>> >       + 1.63% build_ssa_conflict_graph                                =
         =E2=96=92
>> >       + 0.35% coalesce_ssa_name=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
>> >
>> > -O1
>> > -   17.53%    17.53%           842  cc1      cc1               [.] bit=
map_set_b=E2=96=92
>> >    - bitmap_set_bit                                                   =
         =E2=96=92
>> >       + 12.39% add_ssa_edge                                           =
         =E2=96=92
>> >       + 1.48% create_coalesce_list_for_region                         =
         =E2=96=92
>> >       + 0.82% solve_constraints                                       =
         =E2=96=92
>> >       + 0.71% calculate_live_ranges                                   =
         =E2=96=92
>> >       + 0.64% add_implicit_graph_edge                                 =
         =E2=96=92
>> >       + 0.41% insert_phi_nodes_for                                    =
         =E2=96=92
>> >       + 0.34% build_ssa_conflict_graph=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20
>> > patched:
>> > -    5.79%     5.00%           167  cc1      cc1               [.] bit=
map_set_b=E2=96=92
>> >    - bitmap_set_bit                                                   =
         =E2=96=92
>> >       + 1.41% add_ssa_edge                                            =
         =E2=96=92
>> >       + 0.88% calculate_live_ranges                                   =
         =E2=96=92
>> >       + 0.75% add_implicit_graph_edge                                 =
         =E2=96=92
>> >       + 0.68% solve_constraints                                       =
         =E2=96=92
>> >       + 0.48% insert_phi_nodes_for                                    =
         =E2=96=92
>> >       + 0.45% build_ssa_conflict_graph=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20
>> >
>> > -O3
>> > -   12.37%    12.34%          1145  cc1      cc1               [.] bit=
map_set_b=E2=96=92
>> >    - bitmap_set_bit                                                   =
         =E2=96=92
>> >       + 9.14% add_ssa_edge                                            =
         =E2=96=92
>> >       + 0.80% create_coalesce_list_for_region                         =
         =E2=96=92
>> >       + 0.69% add_implicit_graph_edge                                 =
         =E2=96=92
>> >       + 0.54% solve_constraints                                       =
         =E2=96=92
>> >       + 0.34% calculate_live_ranges                                   =
         =E2=96=92
>> >       + 0.27% insert_phi_nodes_for                                    =
         =E2=96=92
>> >       + 0.21% build_ssa_conflict_graph=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20
>> > -    4.36%     3.86%           227  cc1      cc1               [.] bit=
map_set_b=E2=96=92
>> >    - bitmap_set_bit                                                   =
         =E2=96=92
>> >       + 0.98% add_ssa_edge                                            =
         =E2=96=92
>> >       + 0.86% add_implicit_graph_edge                                 =
         =E2=96=92
>> >       + 0.64% solve_constraints                                       =
         =E2=96=92
>> >       + 0.57% calculate_live_ranges                                   =
         =E2=96=92
>> >       + 0.32% build_ssa_conflict_graph                                =
         =E2=96=92
>> >       + 0.29% mark_all_vars_used_1                                    =
         =E2=96=92
>> >       + 0.20% insert_phi_nodes_for                                    =
         =E2=96=92
>> >       + 0.16% create_coalesce_list_for_region=20=20=20=20=20=20=20=20=
=20=20=20=20=20
>> >
>> >
>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>=20
>> What's the performance like for more reasonable testcases?  E.g. is there
>> a measurable change either way in --enable-checking=3Drelease for some g=
cc
>> .iis at -g or -O2 -g?
>
> I did a quick check on my set of cc1files (still .i, .ii ones tend to
> be unusable quite quickly...).  Most of them compile too quickly
> to make any difference appear other than noise.  Multi-second differences
> like for PR63155 should be the exception but our O(n) bitmap
> implementation really makes some parts of GCC quadratic where it
> doesn't appear so.
>
> Is there a reason you expect it to be ever slower?

During recent stage3s I've tried to look at profiles of cc1plus
to see whether there was something easy we could do to improve
compile times.  And bitmap operations always showed up near the
top of the profile.  There were always some pathological queries
in which the linear search really hurt, but whenever I tried "simple"
ways to avoid the obvious cases, they made those queries quicker
but slowed things down overall.  It seemed that adding any extra logic
to the queries hurted because even a small slowdown in common lookups
overwhelmed a big saving in less common lookups.

But there again I was looking to speed up more "normal" cases, not ones
like the PR.

FWIW I've tried it on a local x86_64 box and it slows down current
optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
for the other files I tried.  But that's hardly a huge amount, and
probably a price worth paying for the speed-up in the PR.

Thanks,
Richard