Hello, these are the results of a simple attempt at trimming the time
spent in CSE passes.  They are not very encouraging, really, but maybe
they can help people more experienced than me.

The first thing I tried was to remove CSE1 and move EBB CSE to -O3,
using the attached patch.  This also means that local CSE is no longer
run at -O1.  Here are the results of this and other experiments; all
times were taken on a Pentium 4 machine running at 1.7 GHz.

As for bootstrapping, I have only timed a C-only --disable-checking
bootstrap.  Overall bootstrap times are very similar, but they are not
very representative of the effect of the patch because of the large
time spent compiling stage2.  Compiling stage3, however, takes 10:22
minutes instead of 11:00, which is about 6% faster.

I then timed compiling combine.i.  I ran the compiler five times,
discarded the runs with the best and worst overall time, and averaged
the other three (the machine was very lightly loaded and has plenty of
memory, so system time did not matter).  Times, in seconds, are in the
following table.  The -O1 columns differ from those of the other
optimization levels because GCSE is not run at -O1 (and, with the
patch, neither is CSE):

             -O1       | -O2             | -O3
             tot   CSE | tot   GCSE  CSE | tot   GCSE  CSE
patched      6.74  --- | 10.07 0.43 0.35 | 15.19 1.16 0.90
HEAD         6.99 0.16 | 10.86 0.48 1.04 | 16.00 1.10 1.59
improvement  4.1%      |  7.3%           |  5.0%
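
As a rough sketch of the measurement procedure described above (drop the
best and worst of five runs, average the remaining three) and of how an
"improvement" row can be derived, something like the following Python
would do.  The five raw run times are made up for illustration only; the
two calls to improvement() use -O2 figures reported in this message.

```python
def trimmed_mean(times):
    """Drop the fastest and slowest run, then average what is left."""
    if len(times) < 3:
        raise ValueError("need at least three runs")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


def improvement(head, patched):
    """Percentage of HEAD's time saved; negative means a slowdown."""
    return 100.0 * (head - patched) / head


# Hypothetical raw timings of five runs, in seconds (not measured data).
runs = [10.12, 10.05, 10.31, 10.04, 10.09]
print(f"averaged time: {trimmed_mean(runs):.2f}")

# Reported -O2 figures: combine.i total, and the dc.sed run time.
print(f"combine.i: {improvement(10.86, 10.07):.1f}%")
print(f"dc.sed:    {improvement(11.77, 11.96):.1f}%")
```

A negative result, as for dc.sed, means the patched compiler produced
slower code than HEAD.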

For -O2 I also measured run time, using a CPU-intensive sed benchmark
(sed 4.1.1, compiled with IMA, including the regex matcher).  Both the
compilation of sed and the dc.sed benchmark itself were measured in the
same way as above.

The results are in the table that follows and are for several compilers:

1) "patched" is as above

2) "HEAD, no EBB" is mainline with -fno-cse-skip-blocks -fno-cse-follow-jumps:
the results are even worse.

3) "patched+EBB" uses the attached patch but without the hunks that move
-fcse-skip-blocks and -fcse-follow-jumps to -O3, since it looks like CSE
on EBBs is (still :-( ...) doing good, but CSE1 is not.

4) "HEAD, no CSE2" is a final try... let's disable CSE2 instead, and run a
full-power CSE1 (no GCSE column in the table since the two GCSE's look
at exactly the same things): this means moving -frerun-cse-after-loop to
-O3 (or using HEAD's compiler with -O2 -fno-rerun-cse-after-loop).

               combine.i             sed
               tot    GCSE    CSE  | compile  GCSE    CSE     dc.sed
-----------------------------------+---------------------------------
HEAD           10.86  0.48   1.04  | 10.50    0.33   1.08     11.77
-----------------------------------+---------------------------------
patched        10.07  0.43   0.35  |  9.69    0.35   0.38     11.96
improvement     7.3%               |  7.7%                    -1.6%
-----------------------------------+---------------------------------
HEAD, no EBB   10.28  0.46   0.67  |  9.99    0.31   0.67     12.03
improvement     5.3%               |  4.8%                    -2.1%
-----------------------------------+---------------------------------
patched+EBB    10.31  0.46   0.66  | 10.00    0.34   0.65     11.89
improvement     5.1%               |  4.8%                    -1.0%
-----------------------------------+---------------------------------
HEAD, no CSE2  10.47  0.48   0.62  | 10.05    0.33   0.76     11.85
improvement     3.6%               |  4.3%                    -0.7%
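
The two HEAD-based rows need no patch, only extra flags on the mainline
compiler.  The sketch below merely builds the corresponding command
lines.  The flag names are the ones quoted in this message; the compiler
path ./cc1, the input file name and the use of -ftime-report to obtain
per-pass times are assumptions made for the example, not a description
of how the numbers above were collected.

```python
import shlex

# Options common to every run; -ftime-report is an assumed way of
# getting per-pass timings out of the compiler.
BASE = ["-O2", "-ftime-report"]

# Extra flags per configuration, for a mainline (HEAD) compiler only.
CONFIGS = {
    "HEAD": [],
    "HEAD, no EBB": ["-fno-cse-skip-blocks", "-fno-cse-follow-jumps"],
    "HEAD, no CSE2": ["-fno-rerun-cse-after-loop"],
}


def command(name, compiler="./cc1", source="combine.i"):
    """Return the argument list for one configuration (paths are hypothetical)."""
    return [compiler, *BASE, *CONFIGS[name], source, "-o", "/dev/null"]


for name in CONFIGS:
    print(f"{name:14s} {shlex.join(command(name))}")
```

The "patched" and "patched+EBB" rows are left out on purpose, since they
need a compiler built with the attached patch.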

The 4.1% compile-time improvement at -O1 looks good to me, and I think
we can safely lose 1-2% of execution time at -O1.  For -O2, however,
only the last two configurations (patched+EBB and HEAD, no CSE2) are
worth running SPEC on.  If anybody wants to try, the latter does not
even need a patch: just use HEAD with -O2 -fno-rerun-cse-after-loop.
Still, it looks like at -O2 the RTL passes are not going away soon. :-(

Regards,

Paolo

