From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 77348 invoked by alias); 18 Oct 2018 21:05:38 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 77338 invoked by uid 89); 18 Oct 2018 21:05:37 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=profiles X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 18 Oct 2018 21:05:36 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9CDA2A78; Thu, 18 Oct 2018 14:05:34 -0700 (PDT) Received: from localhost (unknown [10.32.99.101]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CED0F3F71A; Thu, 18 Oct 2018 14:05:33 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,gcc-patches@gcc.gnu.org, stevenb.gcc@gmail.com, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, stevenb.gcc@gmail.com Subject: Re: [PATCH] Add splay-tree "view" for bitmap References: <877eif8ih9.fsf@arm.com> Date: Thu, 18 Oct 2018 22:27:00 -0000 In-Reply-To: (Richard Biener's message of "Thu, 18 Oct 2018 16:24:34 +0200 (CEST)") Message-ID: <87woqf55k3.fsf@arm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-SW-Source: 2018-10/txt/msg01152.txt.bz2 Richard Biener writes: > On Thu, 18 Oct 2018, Richard Sandiford wrote: > >> Richard Biener writes: >> > PR63155 made me pick up this old work from Steven, it turns our >> > linked-list implementation to a two-mode one with one being a >> > splay tree featuring O(log N) complexity for find/remove. >> > >> > Over Stevens original patch I added a bitmap_tree_to_vec helper >> > that I use from the debug/print methods to avoid changing view >> > there. In theory the bitmap iterator could get a "stack" >> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP. >> > >> > This can be used to fix the two biggest bottlenecks in the PRs >> > testcase, namely SSA propagator worklist handling and out-of-SSA >> > coalesce list building. perf shows the following data, first >> > unpatched, second patched - also watch the thrid coulumn (samples) >> > when comparing percentages. >> > >> > -O0 >> > - 18.19% 17.35% 407 cc1 cc1 [.] bit= map_set_b=E2=96=92 >> > - bitmap_set_bit = =E2=96=92 >> > + 8.77% create_coalesce_list_for_region = =E2=96=92 >> > + 4.21% calculate_live_ranges = =E2=96=92 >> > + 2.02% build_ssa_conflict_graph = =E2=96=92 >> > + 1.66% insert_phi_nodes_for = =E2=96=92 >> > + 0.86% coalesce_ssa_name=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20 >> > patched: >> > - 12.39% 10.48% 129 cc1 cc1 [.] bit= map_set_b=E2=96=92 >> > - bitmap_set_bit = =E2=96=92 >> > + 5.27% calculate_live_ranges = =E2=96=92 >> > + 2.76% insert_phi_nodes_for = =E2=96=92 >> > + 1.90% create_coalesce_list_for_region = =E2=96=92 >> > + 1.63% build_ssa_conflict_graph = =E2=96=92 >> > + 0.35% coalesce_ssa_name=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20 >> > >> > -O1 >> > - 17.53% 17.53% 842 cc1 cc1 [.] bit= map_set_b=E2=96=92 >> > - bitmap_set_bit = =E2=96=92 >> > + 12.39% add_ssa_edge = =E2=96=92 >> > + 1.48% create_coalesce_list_for_region = =E2=96=92 >> > + 0.82% solve_constraints = =E2=96=92 >> > + 0.71% calculate_live_ranges = =E2=96=92 >> > + 0.64% add_implicit_graph_edge = =E2=96=92 >> > + 0.41% insert_phi_nodes_for = =E2=96=92 >> > + 0.34% build_ssa_conflict_graph=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20 >> > patched: >> > - 5.79% 5.00% 167 cc1 cc1 [.] bit= map_set_b=E2=96=92 >> > - bitmap_set_bit = =E2=96=92 >> > + 1.41% add_ssa_edge = =E2=96=92 >> > + 0.88% calculate_live_ranges = =E2=96=92 >> > + 0.75% add_implicit_graph_edge = =E2=96=92 >> > + 0.68% solve_constraints = =E2=96=92 >> > + 0.48% insert_phi_nodes_for = =E2=96=92 >> > + 0.45% build_ssa_conflict_graph=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20 >> > >> > -O3 >> > - 12.37% 12.34% 1145 cc1 cc1 [.] bit= map_set_b=E2=96=92 >> > - bitmap_set_bit = =E2=96=92 >> > + 9.14% add_ssa_edge = =E2=96=92 >> > + 0.80% create_coalesce_list_for_region = =E2=96=92 >> > + 0.69% add_implicit_graph_edge = =E2=96=92 >> > + 0.54% solve_constraints = =E2=96=92 >> > + 0.34% calculate_live_ranges = =E2=96=92 >> > + 0.27% insert_phi_nodes_for = =E2=96=92 >> > + 0.21% build_ssa_conflict_graph=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20 >> > - 4.36% 3.86% 227 cc1 cc1 [.] bit= map_set_b=E2=96=92 >> > - bitmap_set_bit = =E2=96=92 >> > + 0.98% add_ssa_edge = =E2=96=92 >> > + 0.86% add_implicit_graph_edge = =E2=96=92 >> > + 0.64% solve_constraints = =E2=96=92 >> > + 0.57% calculate_live_ranges = =E2=96=92 >> > + 0.32% build_ssa_conflict_graph = =E2=96=92 >> > + 0.29% mark_all_vars_used_1 = =E2=96=92 >> > + 0.20% insert_phi_nodes_for = =E2=96=92 >> > + 0.16% create_coalesce_list_for_region=20=20=20=20=20=20=20=20= =20=20=20=20=20 >> > >> > >> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. >>=20 >> What's the performance like for more reasonable testcases? E.g. is there >> a measurable change either way in --enable-checking=3Drelease for some g= cc >> .iis at -g or -O2 -g? > > I did a quick check on my set of cc1files (still .i, .ii ones tend to > be unusable quite quickly...). Most of them compile too quickly > to make any difference appear other than noise. Multi-second differences > like for PR63155 should be the exception but our O(n) bitmap > implementation really makes some parts of GCC quadratic where it > doesn't appear so. > > Is there a reason you expect it to be ever slower? During recent stage3s I've tried to look at profiles of cc1plus to see whether there was something easy we could do to improve compile times. And bitmap operations always showed up near the top of the profile. There were always some pathological queries in which the linear search really hurt, but whenever I tried "simple" ways to avoid the obvious cases, they made those queries quicker but slowed things down overall. It seemed that adding any extra logic to the queries hurted because even a small slowdown in common lookups overwhelmed a big saving in less common lookups. But there again I was looking to speed up more "normal" cases, not ones like the PR. FWIW I've tried it on a local x86_64 box and it slows down current optabs.ii at -O2 -g by ~0.35% (reproducable). I see similar slowdowns for the other files I tried. But that's hardly a huge amount, and probably a price worth paying for the speed-up in the PR. Thanks, Richard