* [lto] preliminary SPECint benchmark numbers
@ 2007-12-24 22:42 Nathan Froyd
From: Nathan Froyd @ 2007-12-24 22:42 UTC
To: gcc

In one of my recent messages about a patch to the LTO branch, I
mentioned that we could compile and successfully run all of the C
SPECint benchmarks except 176.gcc. Chris Lattner asked if I had done
any benchmarking now that real programs could be run; I said that I
hadn't but would try to do some soon. This is the result of that.

I don't have numbers on what compile times look like, but I don't think
they're good. 176.gcc takes several minutes to compile (basically -flto
*.o, not counting the time to compile individual .o files); the other
benchmarks are all a minute or more apiece.

Executive summary: LTO is currently *not* a win.

In the table below, runtimes are in seconds. I ran the tests on an
8-core 1.6GHz machine with 8 GB RAM. I believe the machine was
relatively idle; I ran the tests over a weekend evening. The last merge
from mainline to the LTO branch was mainline r130155, so that's about
what the -O2 numbers correspond to--I don't think we've changed too much
core code on the branch. The % change figures are just in-my-head
estimates, using -O2 as a baseline.

              -O2    -flto   % change
164.gzip      174    176     + 1
175.vpr       139    143     + 3
181.mcf       162    166     + 3
186.crafty    65.2   66.6    + < 1
197.parser    240    261     + 9
253.perlbmk   119    133     + 13
254.gap       84.4   87      + 4
256.bzip2     131    145     + 11
300.twolf     202    193     - 4 (!)
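
As a rough cross-check of the estimated percentages, the changes can be recomputed from the runtimes in the table. This is only a sketch: the names RUNTIMES, percent_change and report are made up for illustration, and the figures are copied from the table above.

```python
# Runtimes in seconds, copied from the table above: (benchmark, -O2, -flto).
RUNTIMES = [
    ("164.gzip", 174.0, 176.0),
    ("175.vpr", 139.0, 143.0),
    ("181.mcf", 162.0, 166.0),
    ("186.crafty", 65.2, 66.6),
    ("197.parser", 240.0, 261.0),
    ("253.perlbmk", 119.0, 133.0),
    ("254.gap", 84.4, 87.0),
    ("256.bzip2", 131.0, 145.0),
    ("300.twolf", 202.0, 193.0),
]


def percent_change(base, new):
    """Change of `new` relative to `base`, in percent (positive = slower)."""
    return (new - base) / base * 100.0


def report(rows):
    """Map each benchmark name to its runtime change in percent."""
    return {name: percent_change(base, lto) for name, base, lto in rows}


if __name__ == "__main__":
    for name, pct in report(RUNTIMES).items():
        print(f"{name:<12} {pct:+.1f}%")
```

The exact values differ a little from the mental estimates: for example, 186.crafty comes out at about +2.1% and 253.perlbmk at about +11.8%, while 300.twolf is about 4.5% faster.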

176.gcc doesn't run correctly with LTO yet; 255.vortex didn't run
correctly with "mainline", but it did with -flto, which is curious. We
don't do C++ yet, so 252.eon is not included.

In general, things get worse with LTO, sometimes much worse. I can
think of at least three possible reasons off the top of my head:

- Alias information. We don't have any type-based alias information in
  -flto, which hurts.

- We don't merge types between compilation units, which could account
  for poor optimization behavior.

- I believe we lose some information in the LTO write/read process; edge
  probabilities, estimated # instructions in functions, etc. get lost.
  This hurts inlining decisions, block layout, alignment of jump
  targets, etc. So there's information we need to write out or
  recompute.

-Nathan
* Re: [lto] preliminary SPECint benchmark numbers
From: Vladimir N. Makarov @ 2007-12-26 1:16 UTC
To: Nathan Froyd; +Cc: gcc

Here is my benchmarking of the current LTO branch on a 2.66GHz Core2
under RHEL 5 in 64- and 32-bit modes. Vortex violates the type
aliasing rules, therefore it should be compiled with
-fno-strict-aliasing. Perlbmk crashed in tree.c::build2_stat in
32-bit mode when LTO was used. LTO currently generates wrong code for
176.gcc. I've also checked the SPECfp2000 benchmarks written in C.

In brief,

o The code size (text segment) with LTO is much smaller (2.7% and
  2.4% for SPECint and 0.16% and 0.6% for SPECfp in 64- and 32-bit
  mode, respectively). That is very promising.
o Compilation is 2 times slower with LTO.
o The generated code is 3.6% and 2.2% slower for SPECint2000 and
  SPECfp2000 in 64-bit mode. It is also 6.7% slower for SPECint2000 in
  32-bit mode. But the SPECfp2000 code generated with LTO in 32-bit
  mode is 20% faster! That is because art is almost 2.5 times faster
  with LTO.

More details can be found below.

--------------------------64-bit mode----------------------------
base: -O2 -mtune=generic
peak: -O2 -mtune=generic -flto

                   base    peak
164.gzip           1363*   1340*
175.vpr            1600*   1571*
176.gcc            X       X
181.mcf            1658*   1531*
186.crafty         2576*   2569*
197.parser         1269*   1158*
252.eon            X       X
253.perlbmk        2546*   2373*
254.gap            1987*   1965*
255.vortex         2259*   2208*
256.bzip2          1874*   1721*
300.twolf          2548*   2627*
SPECint2000 mean   1910    1841    -3.6%
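
The mean rows in these tables are consistent with a geometric mean of the per-benchmark ratios, which is how SPEC aggregates its scores. A minimal sketch that recomputes the 64-bit SPECint2000 row from the figures above; the names BASE, PEAK, geometric_mean and relative_change are hypothetical helpers, not part of any SPEC tool.

```python
import math

# SPEC ratios from the 64-bit SPECint2000 table above (higher is better).
# 176.gcc and 252.eon are left out because they have no result.
BASE = [1363, 1600, 1658, 2576, 1269, 2546, 1987, 2259, 1874, 2548]
PEAK = [1340, 1571, 1531, 2569, 1158, 2373, 1965, 2208, 1721, 2627]


def geometric_mean(values):
    """Geometric mean, computed via logarithms to avoid a huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_change(base, peak):
    """Change of `peak` relative to `base`, in percent."""
    return (peak - base) / base * 100.0


if __name__ == "__main__":
    base_mean = geometric_mean(BASE)
    peak_mean = geometric_mean(PEAK)
    print(f"base {base_mean:.0f}  peak {peak_mean:.0f}  "
          f"{relative_change(base_mean, peak_mean):+.1f}%")
```

Applied to the four 64-bit SPECfp2000 rows further down, the same function gives roughly 2649 and 2490, which is a drop of about 6% rather than the 2.2% quoted there, so one of the listed SPECfp figures may be off.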
Compilation time of SPECInt2000 (except for eon and gcc):
base: 65.02user 6.25system 1:15.41elapsed 94%CPU
peak: 130.62user 9.68system 2:45.20elapsed 84%CPU
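
The "2 times slower" figure can be read off these two lines directly. A small sketch that parses the summary format shown above and computes the ratios; TIME_RE, parse_time_line and slowdown are illustrative names, and the regular expression only assumes the user/system/elapsed layout seen in this message.

```python
import re

# Matches summary lines such as "65.02user 6.25system 1:15.41elapsed 94%CPU".
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed"
)


def parse_time_line(line):
    """Return user, system and elapsed times in seconds from one summary line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"not a time summary line: {line!r}")
    elapsed = 0.0
    # Elapsed time is written as [h:]m:s; fold the fields into seconds.
    for part in match.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(match.group("user")),
        "system": float(match.group("system")),
        "elapsed": elapsed,
    }


def slowdown(base, peak, key):
    """Ratio of the peak (LTO) time to the base time for one field."""
    return peak[key] / base[key]


BASE_TIME = parse_time_line("base: 65.02user 6.25system 1:15.41elapsed 94%CPU")
PEAK_TIME = parse_time_line("peak: 130.62user 9.68system 2:45.20elapsed 84%CPU")

if __name__ == "__main__":
    for key in ("user", "system", "elapsed"):
        print(f"{key:<8} x{slowdown(BASE_TIME, PEAK_TIME, key):.2f}")
```

For the SPECint2000 build this gives a ratio of about 2.0 in user time and about 2.2 in elapsed time.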

                   base    peak
168.wupwise        X       X
171.swim           X       X
172.mgrid          X       X
173.applu          X       X
177.mesa           2426*   2314*
178.galgel         X       X
179.art            6276*   5519*
183.equake         1826*   1808*
187.facerec        X       X
188.ammp           1770*   1666*
189.lucas          X       X
191.fma3d          X       X
200.sixtrack       X       X
301.apsi           X       X
SPECfp_base2000    2649    2491    -2.2%
Compilation time of SPECFp2000 (only mesa, art, equake, ammp):
base: 17.32user 1.74system 0:20.42elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
peak: 35.52user 2.88system 0:42.86elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
text segment:
----------------CINT2000-----------------
-6.144% 38962 36568 164.gzip
-3.500% 147426 142266 175.vpr
-4.313% 12613 12069 181.mcf
-2.544% 172319 167935 186.crafty
-5.566% 108797 102741 197.parser
-5.436% 575443 544160 253.perlbmk
-5.214% 494375 468599 254.gap
-5.617% 556589 525325 255.vortex
-3.209% 32532 31488 256.bzip2
1.132% 198639 200887 300.twolf
Average = -2.69418%
----------------CFP2000-----------------
-5.093% 522117 495526 177.mesa
2.542% 16362 16778 179.art
2.745% 19778 20321 183.equake
-2.919% 142532 138372 188.ammp
Average = -0.160212%
--------------------------32-bit mode----------------------------
base: -m32 -O2 -mtune=generic
peak: -m32 -O2 -mtune=generic -flto

                   base    peak
164.gzip           1261*   1125*
175.vpr            1603*   1483*
176.gcc            X       X
181.mcf            3057*   2801*
186.crafty         1764*   1691*
197.parser         1397*   1224*
252.eon            X       X
253.perlbmk        X       X
254.gap            1981*   1778*
255.vortex         2013*   1914*
256.bzip2          1666*   1580*
300.twolf          2376*   2484*
SPECint2000 mean   1839    1716    -6.7%
Compilation time of SPECInt2000 (except for eon, gcc, and perlbmk):
base: 49.36user 5.13system 0:58.57elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
peak: 99.32user 7.90system 1:56.63elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k

                   base    peak
168.wupwise        X       X
171.swim           X       X
172.mgrid          X       X
173.applu          X       X
177.mesa           1362*   1325*
178.galgel         X       X
179.art            2786*   6197*
183.equake         1784*   1772*
187.facerec        X       X
188.ammp           1144*   1102*
189.lucas          X       X
191.fma3d          X       X
200.sixtrack       X       X
301.apsi           X       X
SPECfp2000 mean    1668    2001    +20%
Compilation time of SPECFp2000 (only mesa, art, equake, ammp):
base: 17.88user 1.85system 0:21.17elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
peak: 36.76user 2.83system 0:43.81elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
text segment:
----------------CINT2000-----------------
-5.936% 35005 32927 164.gzip
-5.125% 137683 130627 175.vpr
-3.739% 10270 9886 181.mcf
-1.379% 195472 192776 186.crafty
-5.192% 94770 89850 197.parser
-5.436% 575443 544160 253.perlbmk
-4.400% 449316 429544 254.gap
-2.219% 564982 552446 255.vortex
-2.884% 30515 29635 256.bzip2
0.167% 193748 194072 300.twolf
Average = -2.40954%
----------------CFP2000-----------------
-5.796% 499738 470775 177.mesa
0.458% 13971 14035 179.art
0.303% 17467 17520 183.equake
-5.176% 111429 105661 188.ammp
Average = -0.600618%
* Re: [lto] preliminary SPECint benchmark numbers
From: Chris Lattner @ 2007-12-26 2:11 UTC
To: Vladimir N. Makarov; +Cc: Nathan Froyd, gcc
On Dec 25, 2007, at 5:02 PM, Vladimir N. Makarov wrote:
> Here is my benchmarking of the current LTO branch on a 2.66GHz Core2
> under RHEL 5 in 64- and 32-bit modes. Vortex violates the type
> aliasing rules, therefore it should be compiled with
> -fno-strict-aliasing. Perlbmk crashed in tree.c::build2_stat in
> 32-bit mode when LTO was used. LTO currently generates wrong code for
> 176.gcc. I've also checked the SPECfp2000 benchmarks written in C.
>
> In brief,
>
> o The code size (text segment) with LTO is much smaller (2.7% and
>   2.4% for SPECint and 0.16% and 0.6% for SPECfp in 64- and 32-bit
>   mode, respectively). That is very promising.
> o Compilation is 2 times slower with LTO.
> o The generated code is 3.6% and 2.2% slower for SPECint2000 and
>   SPECfp2000 in 64-bit mode. It is also 6.7% slower for SPECint2000 in
>   32-bit mode. But the SPECfp2000 code generated with LTO in 32-bit
>   mode is 20% faster! That is because art is almost 2.5 times faster
>   with LTO.

Wow, nice numbers! Is it possible to compare this to -combine, or does
-combine no longer work? In theory, lto and IMA should yield the same
codegen; lto should just be usable with normal makefiles.

-Chris