From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-101733-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 31396 invoked by alias); 31 Aug 2004 13:50:30 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 31385 invoked from network); 31 Aug 2004 13:50:27 -0000
Received: from unknown (HELO Cantor.suse.de) (195.135.220.2)
  by sourceware.org with SMTP; 31 Aug 2004 13:50:27 -0000
Received: from extimap.suse.de (extimap.suse.de [195.135.220.6])
	(using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits))
	(No client certificate requested)
	by Cantor.suse.de (Postfix) with ESMTP id 7403EB497D8;
	Tue, 31 Aug 2004 15:50:26 +0200 (CEST)
Received: from stevenb.home.suse.de (70-90.ipact.nl [82.210.90.70])
	(using TLSv1 with cipher RC4-MD5 (128/128 bits))
	(Client did not present a certificate)
	by extimap.suse.de (Postfix) with ESMTP
	id A0BAE6C8A3; Tue, 31 Aug 2004 15:50:25 +0200 (CEST)
From: Steven Bosscher <stevenb@suse.de>
To: Paolo Bonzini <paolo.bonzini@polimi.it>, gcc@gcc.gnu.org
Subject: Re: Compile-time and execution-time impact of CSE - some numbers
Date: Tue, 31 Aug 2004 14:39:00 -0000
User-Agent: KMail/1.5.4
References: <1093958955.41347d2b7d59f@webmail.polimi.it>
In-Reply-To: <1093958955.41347d2b7d59f@webmail.polimi.it>
Organization: SUSE Labs
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200408311550.27667.stevenb@suse.de>
X-SW-Source: 2004-08/txt/msg01623.txt.bz2

On Tuesday 31 August 2004 15:29, Paolo Bonzini wrote:
> Hello, these are the results of a simple attempt at trimming the time
> spent in CSE passes.  Not very encouraging really, but maybe it can
> help more experienced people than me.
>
> The first thing I tried is to remove CSE1 and move EBB CSE to -O3,
> using the attached patch.  This also meant that we do not run local CSE
> at -O1 anymore.  Here are the results of this and other experiments;
> all times were taken on a Pentium 4 machine running at 1.7 GHz.
>
> As for bootstrapping, I have only timed a C-only --disable-checking
> bootstrap.  Bootstrapping times are very similar but they are not very
> representative of the effect of the patch, due to the large time spent
> compiling stage2; but compiling stage3 takes 10:22 minutes instead of
> 11:00, which is about 6% faster.
>
> I then timed combine.i files.  I ran the compiler five times, took out
> the runs with the best and worst overall time, and averaged the other
> three (the machine was very lightly loaded and has plenty of memory,
> so system time did not matter).  Times are in the following table. The
> headings are different at -O1 than for other optimization levels,
> because CSE and/or GCSE are not run there:
>
>              -O1       | -O2             | -O3
>              tot   CSE | tot   GCSE  CSE | tot   GCSE  CSE
> patched      6.74  --- | 10.07 0.43 0.35 | 15.19 1.16 0.90
> HEAD         6.99 0.16 | 10.86 0.48 1.04 | 16.00 1.10 1.59
> improvement  4.1%      |  7.3%           |  5.0%
>
> For -O2 I got run-time numbers too, which I took from a CPU-intensive
> sed benchmark (I used sed 4.1.1, compiled with IMA including the regex
> matcher), doing measurements in the same way as above for both
> compilation and the sed benchmark dc.sed.
>
> The results are in the table that follows and are for several compilers:
>
> 1) "patched" is as above
>
> 2) "HEAD, no EBB" is mainline with -fno-cse-skip-blocks
> -fno-cse-follow-jumps: the results are even worse.
>
> 3) "patched+EBB" uses the attached patch but without the hunks that move
> -fcse-skip-blocks and -fcse-follow-jumps to -O3, since it looks like CSE
> on EBBs is (still :-( ...) doing good, but CSE1 is not.
>
> 4) "HEAD, no CSE2" is a final try... let's disable CSE2 instead, and run a
> full-power CSE1 (no GCSE column in the table since the two GCSE's look
> at exactly the same things): this means moving -frerun-cse-after-loop to
> -O3 (or using HEAD's compiler with -O2 -fno-rerun-cse-after-loop).
>
>                combine.i             sed
>                tot    GCSE    CSE  | compile  GCSE    CSE     dc.sed
> -----------------------------------+---------------------------------
> HEAD           10.86  0.48   1.04  | 10.50    0.33   1.08     11.77
> -----------------------------------+---------------------------------
> patched        10.07  0.43   0.35  |  9.69    0.35   0.38     11.96
> improvement     7.3%               |  7.7%                    -1.6%
> -----------------------------------+---------------------------------
> HEAD, no EBB   10.28  0.46   0.67  |  9.99    0.31   0.67     12.03
> improvement     5.3%               |  4.8%                    -2.1%
> -----------------------------------+---------------------------------
> patched+EBB    10.31  0.46   0.66  | 10.00    0.34   0.65     11.89
> improvement     5.1%               |  4.8%                    -1.0%
> -----------------------------------+---------------------------------
> HEAD, no CSE2  10.47  0.48   0.62  | 10.05    0.33   0.76     11.85
> improvement     3.6%               |  4.3%                    -0.7%
>
> 4.1% on -O1 looks good to me, and I think we can safely lose 1-2% of
> execution time at -O1.  But for -O2 only the last two are worth running
> SPEC on.  If anybody wants to try, for the latter there's not even a
> patch to apply.  But it looks like at -O2 the RTL passes are not going
> away soon. :-(

You will find that your time is better spent on fixing the bugs marked
as "tree-optimization" and "missed-optimization".  That is still where
most of the things CSE1 and GCSE catch come from.

Also, the only *really* good way of speeding up CSE is to make it work
on extended basic blocks (i.e. kill -fcse-skip-blocks) and then teach it
not to rescan already visited blocks (by using a scoped hash table).
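
To make the scoped hash table idea concrete, here is a minimal sketch in
plain C.  It is not code from cse.c: the names (scoped_table, st_init,
st_push_scope, st_pop_scope, st_insert, st_lookup), the string keys and
the fixed bucket count are all made up for illustration.  The point is
only that entering a block opens a scope, leaving it discards just the
entries recorded in that block, and whatever was recorded for the
already visited predecessor blocks stays valid without a rescan.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64   /* arbitrary size, chosen for the sketch */

/* One recorded expression.  Every entry sits on two lists: its hash
   bucket chain and the undo list of the scope that created it.  */
struct entry {
    char *key;                     /* hypothetical: expression as text */
    int value;                     /* hypothetical: e.g. a register number */
    unsigned bucket;
    struct entry *next_in_bucket;
    struct entry *next_in_scope;
};

struct scope {
    struct entry *entries;         /* newest entry first */
    struct scope *parent;
};

struct scoped_table {
    struct entry *buckets[NBUCKETS];
    struct scope *top;             /* innermost open scope */
};

static unsigned hash_key(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void st_init(struct scoped_table *t)
{
    for (int i = 0; i < NBUCKETS; i++)
        t->buckets[i] = NULL;
    t->top = NULL;
}

/* Open a scope, e.g. when entering a basic block.  */
void st_push_scope(struct scoped_table *t)
{
    struct scope *s = malloc(sizeof *s);
    if (!s)
        abort();
    s->entries = NULL;
    s->parent = t->top;
    t->top = s;
}

/* Record KEY in the innermost scope.  New entries go to the head of
   the bucket chain, so they shadow older entries with the same key.  */
void st_insert(struct scoped_table *t, const char *key, int value)
{
    assert(t->top != NULL);
    struct entry *e = malloc(sizeof *e);
    if (!e)
        abort();
    e->key = malloc(strlen(key) + 1);
    if (!e->key)
        abort();
    strcpy(e->key, key);
    e->value = value;
    e->bucket = hash_key(key);
    e->next_in_bucket = t->buckets[e->bucket];
    t->buckets[e->bucket] = e;
    e->next_in_scope = t->top->entries;
    t->top->entries = e;
}

/* Return the innermost value recorded for KEY, or -1 if there is none.  */
int st_lookup(const struct scoped_table *t, const char *key)
{
    for (struct entry *e = t->buckets[hash_key(key)]; e; e = e->next_in_bucket)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return -1;
}

/* Close the innermost scope.  Entries are undone newest first, so each
   one is still at the head of its bucket chain when it is removed.  */
void st_pop_scope(struct scoped_table *t)
{
    struct scope *s = t->top;
    assert(s != NULL);
    while (s->entries) {
        struct entry *e = s->entries;
        assert(t->buckets[e->bucket] == e);
        t->buckets[e->bucket] = e->next_in_bucket;
        s->entries = e->next_in_scope;
        free(e->key);
        free(e);
    }
    t->top = s->parent;
    free(s);
}
```

A walk over an extended basic block would then push a scope on entering
each block, record what it finds there, continue into the successors
that belong to the same EBB, and pop the scope on the way back, so each
block is scanned once.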

Unfortunately, simply disabling -fcse-skip-blocks doesn't give much
speedup either.  But that would be the first step.  The second step
would be to make cse.c use the CFG (i.e. FOR_BB_INSNS/BB_HEAD/BB_END/etc.)
instead of relying on block notes.  Next you'd clean up the path
following code to track back only to the last visited block before
following a jump.

Gr.
Steven