Date: Sat, 11 Oct 2014 08:23:00 -0000
From: Jan Hubicka
To: Martin Liška
Cc: gcc-patches@gcc.gnu.org, "hubicka >> Jan Hubicka"
Subject: Re: [PATCH 3/5] IPA ICF pass
Message-ID: <20141011081944.GD5172@kam.mff.cuni.cz>
In-Reply-To: <54387267.3030106@suse.cz>

>
> After few days of measurement and tuning, I was able to get numbers to the following shape:
> Execution times (seconds)
> phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1412 kB ( 0%) ggc
> phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 1028813 kB (24%) ggc
> phase stream in         :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 3246453 kB (76%) ggc
> phase stream out        :   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall       2 kB ( 0%) ggc
> callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall      40 kB ( 0%) ggc
> ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall       0 kB ( 0%) ggc
> ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall      21 kB ( 0%) ggc
> ipa devirtualization    :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall   13704 kB ( 0%) ggc
> ipa cp                  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall  188558 kB ( 4%) ggc
> ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall  494738 kB (12%) ggc
> ipa comdats             :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
> ipa lto gimple in       :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall  537970 kB (13%) ggc
> ipa lto gimple out      :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall       2 kB ( 0%) ggc
> ipa lto decl in         :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 2441687 kB (57%) ggc
> ipa lto decl out        :   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall       0 kB ( 0%) ggc
> ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall   15692 kB ( 0%) ggc
> ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
> ipa lto cgraph I/O      :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall  407182 kB (10%) ggc
> ipa lto decl merge      :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall    8220 kB ( 0%) ggc
> ipa lto cgraph merge    :   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall   14605 kB ( 0%) ggc
> whopr wpa               :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall       1 kB ( 0%) ggc
> whopr wpa I/O           :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall       0 kB ( 0%) ggc
> whopr partitioning      :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall    4943 kB ( 0%) ggc
> ipa reference           :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall       0 kB ( 0%) ggc
> ipa profile             :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
> ipa pure const          :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall       0 kB ( 0%) ggc
> ipa icf                 :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall    1352 kB ( 0%) ggc
> inline parameters       :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
> tree SSA rewrite        :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall   18919 kB ( 0%) ggc
> tree SSA other          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
> tree SSA incremental    :   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall   11325 kB ( 0%) ggc
> tree operand scan       :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall  116283 kB ( 3%) ggc
> dominance frontiers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
> dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
> varconst                :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
> loop fini               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
> unaccounted todo        :   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall       0 kB ( 0%) ggc
> TOTAL                   :  47.49             3.48            77.46            4276682 kB
>
> and I was able to reduce function bodies loaded in WPA to 35% (from previous 55%). The main problem

Does 35% mean that 35% of all function bodies are compared with something else?
That feels pretty high, but the overall numbers are not so terrible.

> with speed was hidden in work list for congruence classes, where hash_set was used. I chose the data
> structure to support delete operation, but it was really slow. Thus, hash_set was replaced with linked list
> and a flag is used to identify if a set is removed or not.

Interesting, I would not expect a bottleneck in congruence solving :)

>
> I have no clue how complicated can it be to implement release_body function to an operation that
> really releases the memory?

I suppose one can keep the caches from the streamer and free the trees that were
read.  Freeing gimple statements and the cfg should be relatively easy.  Let's
however first try to tune the implementation rather than try to get this hack
implemented.  Explicit ggc_free calls have traditionally tended to cause
negative reactions wrt memory fragmentation concerns.
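
For readers following the worklist remark above (hash_set replaced by a linked list plus a "removed" flag), here is a minimal sketch of that lazy-deletion idea in plain C++. It is only an illustration under stated assumptions: congruence_class and lazy_worklist are hypothetical names, std::list stands in for whatever list the patch actually uses, and none of this is taken from the IPA ICF sources.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a congruence class; not GCC's actual type.
struct congruence_class
{
  explicit congruence_class (int i) : id (i), in_worklist (false) {}

  int id;
  bool in_worklist; // true while the class is logically queued
};

// Worklist with lazy deletion: removal only clears a flag, and stale
// list entries are skipped when they reach the front.
class lazy_worklist
{
public:
  // Queue a class unless it is already logically queued.
  void push (congruence_class *cls)
  {
    assert (cls != nullptr);
    if (cls->in_worklist)
      return;
    cls->in_worklist = true;
    m_items.push_back (cls);
  }

  // O(1) logical removal; the list node stays until pop reaches it.
  void remove (congruence_class *cls)
  {
    assert (cls != nullptr);
    cls->in_worklist = false;
  }

  // Return the next live class, or nullptr when none is left.
  congruence_class *pop ()
  {
    while (!m_items.empty ())
      {
        congruence_class *cls = m_items.front ();
        m_items.pop_front ();
        if (cls->in_worklist)
          {
            cls->in_worklist = false;
            return cls;
          }
      }
    return nullptr;
  }

private:
  std::list<congruence_class *> m_items;
};
```

The trade-off in this sketch is that a removed class keeps occupying a list node until pop discards it, in exchange for never searching or rehashing on removal.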
>
> Markus' problem with -fprofile-use has been removed, IPA-ICF is preceding devirtualization pass. I hope it is fine?

Yes, I think devirtualization should actually work better with identical virtual
methods merged.  We just need to be sure it sees through the newly introduced
aliases (there should be no thunks for virtual methods).

Honza