Hello,

these are the results of a simple attempt at trimming the time spent in the CSE passes. They are not very encouraging, really, but maybe they can help people more experienced than me.

The first thing I tried was to remove CSE1 and move EBB CSE to -O3, using the attached patch. This also means that we no longer run local CSE at -O1. Here are the results of this and other experiments; all times were taken on a 1.7 GHz Pentium 4.

As for bootstrapping, I only timed a C-only --disable-checking bootstrap. Overall bootstrap times are very similar, but they are not very representative of the effect of the patch, because a large part of the time is spent compiling stage2. Compiling stage3, however, takes 10:22 minutes instead of 11:00, which is about 6% faster.

I then timed the compilation of combine.i. I ran the compiler five times, discarded the runs with the best and worst overall time, and averaged the other three (the machine was very lightly loaded and has plenty of memory, so system time did not matter). Times, in seconds, are in the following table; "tot" is the total compilation time. The -O1 columns differ from those at the other optimization levels because GCSE is not run at -O1 and, with the patch, neither is CSE:

               -O1         |        -O2          |        -O3
             tot    CSE    |  tot    GCSE  CSE   |  tot    GCSE  CSE
patched      6.74   ---    | 10.07   0.43  0.35  | 15.19   1.16  0.90
HEAD         6.99   0.16   | 10.86   0.48  1.04  | 16.00   1.10  1.59
improvement  4.1%          |  7.3%               |  5.0%

For -O2 I also got run-time numbers, from a CPU-intensive sed benchmark (sed 4.1.1, compiled with IMA, including the regex matcher, running dc.sed). I measured both the compilation of sed and the dc.sed run in the same way as above. The results are in the table that follows, for several compilers:

1) "patched" is as above.
2) "HEAD, no EBB" is mainline with -fno-cse-skip-blocks -fno-cse-follow-jumps; its run-time results are even worse than the patch's.
3) "patched+EBB" uses the attached patch but without the hunks that move -fcse-skip-blocks and -fcse-follow-jumps to -O3, since it looks like CSE on EBBs is (still, sadly) doing some good, while CSE1 is not.
4) "HEAD, no CSE2" is a final try: disable CSE2 instead, and run a full-power CSE1. This means moving -frerun-cse-after-loop to -O3 (or using HEAD's compiler with -O2 -fno-rerun-cse-after-loop). Its GCSE times match HEAD's, since in both compilers GCSE looks at exactly the same code.

                     combine.i        |              sed
                   tot  GCSE   CSE    |  compile  GCSE   CSE    dc.sed
--------------------------------------+-------------------------------
HEAD             10.86  0.48   1.04   |  10.50    0.33   1.08   11.77
--------------------------------------+-------------------------------
patched          10.07  0.43   0.35   |   9.69    0.35   0.38   11.96
improvement       7.3%                |   7.7%                  -1.6%
--------------------------------------+-------------------------------
HEAD, no EBB     10.28  0.46   0.67   |   9.99    0.31   0.67   12.03
improvement       5.3%                |   4.8%                  -2.1%
--------------------------------------+-------------------------------
patched+EBB      10.31  0.46   0.66   |  10.00    0.34   0.65   11.89
improvement       5.1%                |   4.8%                  -1.0%
--------------------------------------+-------------------------------
HEAD, no CSE2    10.47  0.48   0.62   |  10.05    0.33   0.76   11.85
improvement       3.6%                |   4.3%                  -0.7%

The 4.1% at -O1 looks good to me, and I think we can safely lose 1-2% of execution time at -O1. For -O2, however, only the last two configurations are worth running SPEC on. If anybody wants to try, the last one does not even need a patch. But it looks like at -O2 these RTL passes are not going away soon. :-(

Regards,

Paolo

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/
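For anyone who wants to reproduce this kind of measurement, here is a minimal C sketch of the bookkeeping described above: five runs per configuration, the runs with the best and worst total time discarded, the remaining three averaged. It assumes (my reading, not stated in the mail) that the per-pass GCSE and CSE columns are averaged over the same three runs that were kept for the total. The "improvement" figures in the sed table match (HEAD - new) / HEAD to within rounding, so a negative value means slower. The names struct run, trimmed_average and improvement_pct are hypothetical and not part of GCC; collecting the timings themselves is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* One timed compiler run (hypothetical layout): total time plus the
   per-pass times of interest, all in seconds.  */
struct run {
    double total;
    double gcse;
    double cse;
};

/* qsort comparator: order runs by increasing total time.  */
static int cmp_total(const void *a, const void *b)
{
    const struct run *x = a;
    const struct run *y = b;
    return (x->total > y->total) - (x->total < y->total);
}

/* Drop the runs with the best and worst total time and average the rest
   (with five runs, the middle three).  The per-pass columns are averaged
   over the same runs that were kept for the total.  Sorts RUNS in place.  */
static struct run trimmed_average(struct run *runs, size_t n)
{
    struct run avg = { 0.0, 0.0, 0.0 };
    size_t i;

    assert(n >= 3);   /* something must be left after trimming both ends */
    qsort(runs, n, sizeof runs[0], cmp_total);
    for (i = 1; i + 1 < n; i++) {
        avg.total += runs[i].total;
        avg.gcse += runs[i].gcse;
        avg.cse += runs[i].cse;
    }
    avg.total /= (double) (n - 2);
    avg.gcse /= (double) (n - 2);
    avg.cse /= (double) (n - 2);
    return avg;
}

/* Saving of NEW_TIME relative to HEAD, in percent;
   a negative result means the new compiler (or binary) is slower.  */
static double improvement_pct(double head, double new_time)
{
    return 100.0 * (head - new_time) / head;
}
```

Call these from your own main. For example, improvement_pct(10.86, 10.07) is about 7.27, the 7.3% in the -O2 combine.i column, improvement_pct(11.77, 11.96) is about -1.61, the dc.sed slowdown of the patched compiler, and improvement_pct(660.0, 622.0) (the stage3 times in seconds) is about 5.76, the "about 6%" quoted for the bootstrap.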