From: Richard Guenther
Date: Sun, 04 Apr 2004 13:57:00 -0000
To: Andrew Pinski
CC: "gcc@gcc.gnu.org"
Subject: Re: 3.4 / 3.5 / tree-ssa comparisons
Message-ID: <4070144F.2070402@tat.physik.uni-tuebingen.de>
In-Reply-To: <1382219C-85B1-11D8-BD72-000393A6D2F2@physics.uc.edu>
References: <406F236A.70901@tat.physik.uni-tuebingen.de> <1382219C-85B1-11D8-BD72-000393A6D2F2@physics.uc.edu>

Andrew Pinski wrote:
>
> On Apr 3, 2004, at 15:49, Richard Guenther wrote:
>
>> The automated tester at
>> http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
>> completed its first 3.5 build.
>> I never checked 3.5, and so I'm
>> surprised by the numbers it got:
>>
>> bootstrap time (52min) is in between 3.4 (50min) and tree-ssa (62min),
>> build times for the tramp3d-v3 test, too(!), I did expect them to
>> improve compared to 3.4, not already regress again..., they are now
>> 2.43min vs. 2.28min (3.4) and 2.75min (tree-ssa). Also performance
>> of the resulting binary is better(!) for 3.5 (6.9s/it) than for
>> tree-ssa (7.68s/it) and of course 3.4 is slowest (8.85s/it). This
>> means we'll regress in both compile and runtime if merging tree-ssa
>> now, but we won't have a runtime regression towards 3.4 then, only a
>> compile time performance regression.
>>
>> The obvious question is, why is 3.5 so much better than 3.4? And of
>> course, why is tree-ssa not better than 3.5 for C++ expression
>> template numeric code?
>
>
> You could check the tree-ssa with my patch at
> ,
> it should give both a runtime improvement and a compile time improvement.

With this patch applied, bootstrap time is 62min. Build time is

 TOTAL                 : 151.44             3.21           154.66

before vs.

 TOTAL                 : 155.70             3.18           158.89

after applying the patch. Runtime is 7.73s/it compared to 7.64s/it
before. So it's not helping, but instead pessimizing slightly!?

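To put those figures in proportion, here is a small Python sketch that computes the relative change for each one. It is only an illustration, not part of the measurements: it assumes the three TOTAL columns are usr, sys and wall seconds, in the same order as the per-pass rows, and the names `pct_change` and `measurements` are made up for this example.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in this mail: tree-ssa without and with the patch.
# Assumption: the TOTAL columns are usr, sys, wall (seconds).
measurements = {
    "compile usr (s)": (151.44, 155.70),
    "compile sys (s)": (3.21, 3.18),
    "compile wall (s)": (154.66, 158.89),
    "runtime (s/it)": (7.64, 7.73),
}

for name, (before, after) in measurements.items():
    change = pct_change(before, after)
    print(f"{name:18s} {before:8.2f} -> {after:8.2f}  {change:+.2f}%")
```

Under those assumptions the patch costs a little under 3% in compile time and a little over 1% in runtime. Differences that small may well be within run-to-run noise, so they would need repeated runs to confirm.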
before:

 tree gimplify         :   2.04 ( 1%) usr   0.02 ( 1%) sys   2.06 ( 1%) wall
 tree eh               :   1.33 ( 1%) usr   0.01 ( 0%) sys   1.34 ( 1%) wall
 tree CFG construction :   0.77 ( 0%) usr   0.02 ( 1%) sys   0.80 ( 1%) wall
 tree CFG cleanup      :   0.96 ( 1%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall
 tree PTA              :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall
 tree alias analysis   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall
 tree PHI insertion    :   1.70 ( 1%) usr   0.03 ( 1%) sys   1.72 ( 1%) wall
 tree SSA rewrite      :   1.53 ( 1%) usr   0.00 ( 0%) sys   1.52 ( 1%) wall
 tree SSA other        :   2.31 ( 1%) usr   0.16 ( 5%) sys   2.54 ( 2%) wall
 tree operand scan     :   2.08 ( 1%) usr   0.25 ( 8%) sys   2.27 ( 1%) wall
 dominator optimization:   6.37 ( 4%) usr   0.11 ( 3%) sys   6.49 ( 4%) wall
 tree SRA              :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
 tree CCP              :   0.65 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall
 tree split crit edges :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 tree PRE              :   2.21 ( 1%) usr   0.01 ( 0%) sys   2.21 ( 1%) wall
 tree linearize phis   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 tree forward propagate:   0.38 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall
 tree conservative DCE :   1.03 ( 1%) usr   0.00 ( 0%) sys   1.04 ( 1%) wall
 tree aggressive DCE   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall
 tree DSE              :   0.91 ( 1%) usr   0.01 ( 0%) sys   0.91 ( 1%) wall
 tree copy headers     :   0.88 ( 1%) usr   0.01 ( 0%) sys   0.88 ( 1%) wall
 tree SSA to normal    :   1.13 ( 1%) usr   0.01 ( 0%) sys   1.16 ( 1%) wall
 tree rename SSA copies:   0.35 ( 0%) usr   0.01 ( 0%) sys   0.34 ( 0%) wall
 dominance frontiers   :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall
 control dependences   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 expand                :   9.44 ( 6%) usr   0.05 ( 2%) sys   9.47 ( 6%) wall

after:

 tree gimplify         :   2.03 ( 1%) usr   0.02 ( 1%) sys   2.03 ( 1%) wall
 tree eh               :   1.31 ( 1%) usr   0.01 ( 0%) sys   1.31 ( 1%) wall
 tree CFG construction :   0.74 ( 0%) usr   0.02 ( 1%) sys   0.76 ( 0%) wall
 tree CFG cleanup      :   0.96 ( 1%) usr   0.00 ( 0%) sys   0.96 ( 1%) wall
 tree PTA              :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
 tree alias analysis   :   0.39 ( 0%) usr   0.01 ( 0%) sys   0.39 ( 0%) wall
 tree PHI insertion    :   1.64 ( 1%) usr   0.05 ( 2%) sys   1.71 ( 1%) wall
 tree SSA rewrite      :   1.47 ( 1%) usr   0.02 ( 0%) sys   1.49 ( 1%) wall
 tree SSA other        :   2.36 ( 2%) usr   0.15 ( 5%) sys   2.48 ( 2%) wall
 tree operand scan     :   2.23 ( 1%) usr   0.25 ( 8%) sys   2.48 ( 2%) wall
 dominator optimization:   6.44 ( 4%) usr   0.10 ( 3%) sys   6.54 ( 4%) wall
 tree SRA              :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
 tree CCP              :   0.59 ( 0%) usr   0.01 ( 0%) sys   0.60 ( 0%) wall
 tree split crit edges :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 tree PRE              :   1.96 ( 1%) usr   0.01 ( 0%) sys   1.96 ( 1%) wall
 tree linearize phis   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 tree remove casts     :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
 tree forward propagate:   0.36 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall
 tree conservative DCE :   1.05 ( 1%) usr   0.01 ( 0%) sys   1.06 ( 1%) wall
 tree aggressive DCE   :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall
 tree DSE              :   0.86 ( 1%) usr   0.01 ( 0%) sys   0.87 ( 1%) wall
 tree copy headers     :   0.82 ( 1%) usr   0.01 ( 0%) sys   0.83 ( 1%) wall
 tree SSA to normal    :   1.04 ( 1%) usr   0.02 ( 1%) sys   1.06 ( 1%) wall
 tree rename SSA copies:   0.34 ( 0%) usr   0.01 ( 0%) sys   0.35 ( 0%) wall
 dominance frontiers   :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall
 control dependences   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 expand                :   9.26 ( 6%) usr   0.06 ( 2%) sys   9.32 ( 6%) wall

So it's not a win here.

With the suggested -fno-gcse and --param max-cse-path-length=0 I get a
compile time of

 TOTAL                 : 143.30             2.99           146.30

and a runtime of 7.87s/it. With just -fno-gcse I get

 TOTAL                 : 144.75             3.08           147.83

and 7.89s/it; with just --param max-cse-path-length=0 it's

 TOTAL                 : 150.02             3.09           153.12

and 7.77s/it.

But maybe I'm chasing the wrong effects without enabling leafify, as
there are no nice loops to optimize then...

Richard.
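
The per-pass tables above are easier to compare mechanically than by eye. What follows is a minimal Python sketch, assuming each row has the "name : usr (pct) usr sys (pct) sys wall (pct) wall" shape shown in this mail. The names `ROW`, `parse_report` and `diff_reports` are hypothetical helpers, not part of GCC, and only a few rows from the tables are included as sample input.

```python
import re

# One row of the per-pass table, for example:
#   " tree PRE : 2.21 ( 1%) usr 0.01 ( 0%) sys 2.21 ( 1%) wall"
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s*$"
)


def parse_report(text):
    """Return {pass name: usr seconds} for every line that matches ROW."""
    times = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            times[match.group("name")] = float(match.group("usr"))
    return times


def diff_reports(before, after):
    """Return (name, before, after, delta) rows, largest |delta| first.

    A pass that is missing on one side is counted as 0.0 there.
    """
    rows = []
    for name in set(before) | set(after):
        b = before.get(name, 0.0)
        a = after.get(name, 0.0)
        rows.append((name, b, a, a - b))
    return sorted(rows, key=lambda row: (-abs(row[3]), row[0]))


# A few rows copied from the tables in this mail.
BEFORE = """\
 tree PRE              :   2.21 ( 1%) usr   0.01 ( 0%) sys   2.21 ( 1%) wall
 tree operand scan     :   2.08 ( 1%) usr   0.25 ( 8%) sys   2.27 ( 1%) wall
 expand                :   9.44 ( 6%) usr   0.05 ( 2%) sys   9.47 ( 6%) wall
"""
AFTER = """\
 tree PRE              :   1.96 ( 1%) usr   0.01 ( 0%) sys   1.96 ( 1%) wall
 tree remove casts     :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
 tree operand scan     :   2.23 ( 1%) usr   0.25 ( 8%) sys   2.48 ( 2%) wall
 expand                :   9.26 ( 6%) usr   0.06 ( 2%) sys   9.32 ( 6%) wall
"""

for name, b, a, delta in diff_reports(parse_report(BEFORE), parse_report(AFTER)):
    print(f"{name:22s} {b:6.2f} {a:6.2f} {delta:+6.2f}")
```

On these sample rows the largest change is the new "tree remove casts" pass itself, followed by the drop in "tree PRE". Feeding in the complete before and after tables would rank all passes the same way.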