From: Steven Bosscher
To: Paolo Bonzini, gcc@gcc.gnu.org
Subject: Re: Compile-time and execution-time impact of CSE - some numbers
Date: Tue, 31 Aug 2004 14:39:00 -0000
Message-Id: <200408311550.27667.stevenb@suse.de>
In-Reply-To: <1093958955.41347d2b7d59f@webmail.polimi.it>
Organization: SUSE Labs

On Tuesday 31 August 2004 15:29, Paolo Bonzini wrote:
> Hello, these are the results of a simple attempt at trimming the time
> spent in CSE passes.  Not very encouraging really, but maybe it can
> help more experienced people than me.
>
> The first thing I tried is to remove CSE1 and move EBB CSE to -O3,
> using the attached patch.  This also meant that we do not run local CSE
> at -O1 anymore.
> Here are the results of this and other experiments;
> all times were taken on a Pentium 4 machine running at 1.7 GHz.
>
> As for bootstrapping, I have only timed a C-only --disable-checking
> bootstrap.  Bootstrapping times are very similar but they are not very
> representative of the effect of the patch, due to the large time spent
> compiling stage2; but compiling stage3 takes 10:22 minutes instead of
> 11:00, which is about 6% faster.
>
> I then timed combine.i files.  I ran the compiler five times, took out
> the runs with the best and worst overall time, and averaged the other
> three (the machine was very lightly loaded and has plenty of memory,
> so system time did not matter).  Times are in the following table.  The
> headings are different at -O1 than for other optimization levels,
> because CSE and/or GCSE are not run there:
>
>                   -O1       |         -O2         |         -O3
>                 tot    CSE  |   tot  GCSE    CSE  |   tot  GCSE    CSE
> patched        6.74    ---  | 10.07  0.43   0.35  | 15.19  1.16   0.90
> HEAD           6.99   0.16  | 10.86  0.48   1.04  | 16.00  1.10   1.59
> improvement    4.1%         |  7.3%               |  5.0%
>
> For -O2 I got run-time numbers too, which I took from a CPU-intensive
> sed benchmark (I used sed 4.1.1, compiled with IMA including the regex
> matcher), doing measurements in the same way as above for both
> compilation and the sed benchmark dc.sed.
>
> The results are in the table that follows and are for several compilers:
>
> 1) "patched" is as above
>
> 2) "HEAD, no EBB" is mainline with -fno-cse-skip-blocks
> -fno-cse-follow-jumps: the results are even worse.
>
> 3) "patched+EBB" uses the attached patch but without the hunks that move
> -fcse-skip-blocks and -fcse-follow-jumps to -O3, since it looks like CSE
> on EBBs is (still :-( ...) doing good, but CSE1 is not.
>
> 4) "HEAD, no CSE2" is a final try...
> let's disable CSE2 instead, and run a
> full-power CSE1 (no GCSE column in the table since the two GCSE's look
> at exactly the same things): this means moving -frerun-cse-after-loop to
> -O3 (or using HEAD's compiler with -O2 -fno-rerun-cse-after-loop).
>
>                   combine.i          |              sed
>                 tot   GCSE    CSE  | compile  GCSE    CSE   dc.sed
> -----------------------------------+------------------------------
> HEAD          10.86   0.48   1.04  |   10.50  0.33   1.08    11.77
> -----------------------------------+------------------------------
> patched       10.07   0.43   0.35  |    9.69  0.35   0.38    11.96
> improvement    7.3%                |    7.7%                 -1.6%
> -----------------------------------+------------------------------
> HEAD, no EBB  10.28   0.46   0.67  |    9.99  0.31   0.67    12.03
> improvement    5.3%                |    4.8%                 -2.1%
> -----------------------------------+------------------------------
> patched+EBB   10.31   0.46   0.66  |   10.00  0.34   0.65    11.89
> improvement    5.1%                |    4.8%                 -1.0%
> -----------------------------------+------------------------------
> HEAD, no CSE2 10.47   0.48   0.62  |   10.05  0.33   0.76    11.85
> improvement    3.6%                |    4.3%                 -0.7%
>
> 4.1% on -O1 looks good to me, and I think we can safely lose 1-2% of
> execution time at -O1.  But for -O2 only the last two are worth running
> SPEC on.  If anybody wants to try, for the latter there's not even a
> patch to apply.  But it looks like at -O2 the RTL passes are not going
> away soon. :-(

You will find that you can spend your time better on fixing the bugs
marked as "tree-optimization" and "missed-optimization".  That is still
where most of the things CSE1 and GCSE catch come from.

Also, the only *real* good way of speeding up CSE is by making it work
on extended basic blocks (i.e. kill -fcse-skip-blocks) and then teaching
it to not rescan already visited blocks (by using a scoped hash table).
Unfortunately, simply disabling -fcse-skip-blocks doesn't give much
speedup either.  But that would be the first step.  The second step would
be to make cse.c use the CFG (i.e. FOR_BB_INSNS/BB_HEAD/BB_END/etc.)
instead of relying on block notes.  Next you'd clean up the path
following code to track back only to the last visited block before
following a jump.

Gr.
Steven
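
For illustration only, the scoped hash table idea mentioned above could
look roughly like the Python sketch below.  It is a toy under loud
assumptions and is not what cse.c does: the names ScopedTable and
cse_walk are hypothetical, an extended basic block is modelled as a tree
of blocks, insns are (destination, expression) pairs, and invalidation of
expressions when a register is overwritten is ignored entirely.  The
point is only that each block is scanned once, and that entries recorded
in one branch disappear when the walk backs out of it.

```python
class ScopedTable:
    """Hash table with nested scopes (hypothetical helper, not GCC code)."""

    def __init__(self):
        self._table = {}     # expression -> stack of values, innermost last
        self._scopes = [[]]  # for each open scope, the keys inserted in it

    def push_scope(self):
        self._scopes.append([])

    def pop_scope(self):
        # Undo exactly the insertions made since the matching push_scope().
        for key in self._scopes.pop():
            stack = self._table[key]
            stack.pop()
            if not stack:
                del self._table[key]

    def insert(self, key, value):
        self._table.setdefault(key, []).append(value)
        self._scopes[-1].append(key)

    def lookup(self, key):
        stack = self._table.get(key)
        return stack[-1] if stack else None


def cse_walk(block, children, insns, table, replaced):
    """Visit every block of the tree once, one scope per block."""
    table.push_scope()
    for dest, expr in insns[block]:
        prior = table.lookup(expr)
        if prior is not None:
            # Expression already available on this path: reuse it.
            replaced.append((block, dest, prior))
        else:
            table.insert(expr, dest)
    for child in children.get(block, ()):
        cse_walk(child, children, insns, table, replaced)
    table.pop_scope()  # forget what this block (and its subtree) recorded


# Made-up example: block A branches to B and C.
insns = {
    "A": [("r1", ("add", "a", "b"))],
    "B": [("r2", ("add", "a", "b")), ("r3", ("mul", "a", "c"))],
    "C": [("r4", ("mul", "a", "c")), ("r5", ("add", "a", "b"))],
}
children = {"A": ["B", "C"]}

table = ScopedTable()
replaced = []
cse_walk("A", children, insns, table, replaced)
```

In this example the add from A is reused in both B and C, while the mul
recorded in B is not visible in C, because B's scope was popped before C
was entered.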