* [lto] preliminary SPECint benchmark numbers
From: Nathan Froyd @ 2007-12-24 22:42 UTC
  To: gcc

In one of my recent messages about a patch to the LTO branch, I
mentioned that we could compile and successfully run all of the C
SPECint benchmarks except 176.gcc.  Chris Lattner asked if I had done
any benchmarking now that real programs could be run; I said that I
hadn't but would try to do some soon.  This is the result of that.

I don't have numbers on what compile times look like, but I don't think
they're good.  176.gcc takes several minutes to compile (basically -flto
*.o, not counting the time to compile individual .o files); the other
benchmarks are all a minute or more apiece.

Executive summary: LTO is currently *not* a win.

In the table below, runtimes are in seconds.  I ran the tests on an
8-core 1.6GHz machine with 8 GB RAM.  I believe the machine was
relatively idle; I ran the tests over a weekend evening.  The last merge
from mainline to the LTO branch was mainline r130155, so that's about
what the -O2 numbers correspond to--I don't think we've changed too much
core code on the branch.  The % change figures are just in-my-head
estimates, using -O2 as a baseline.

		-O2	-flto	% change
164.gzip	174	176	+ 1
175.vpr		139	143	+ 3
181.mcf		162	166	+ 3
186.crafty	65.2	66.6	+ < 1
197.parser	240	261	+ 9
253.perlbmk	119	133	+ 13
254.gap		84.4	87	+ 4
256.bzip2	131	145	+ 11
300.twolf	202	193	- 4 (!)
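
Since the % change column is described as an in-my-head estimate, the exact figures can be recomputed from the two runtime columns. The following is a minimal sketch; the names `RUNTIMES` and `percent_change` are illustrative and not part of any tool used in the thread.

```python
# Runtimes in seconds, copied from the table above: (benchmark, -O2, -flto).
RUNTIMES = [
    ("164.gzip", 174.0, 176.0),
    ("175.vpr", 139.0, 143.0),
    ("181.mcf", 162.0, 166.0),
    ("186.crafty", 65.2, 66.6),
    ("197.parser", 240.0, 261.0),
    ("253.perlbmk", 119.0, 133.0),
    ("254.gap", 84.4, 87.0),
    ("256.bzip2", 131.0, 145.0),
    ("300.twolf", 202.0, 193.0),
]


def percent_change(base, new):
    """Relative change of `new` against `base` in percent (positive = slower)."""
    return (new - base) / base * 100.0


for name, o2, lto in RUNTIMES:
    print(f"{name:<12} {o2:>6} {lto:>6} {percent_change(o2, lto):+6.1f}%")
```

Computed this way, most rows land close to the estimates; 186.crafty comes out at roughly +2.1% rather than under 1%, and 253.perlbmk at roughly +11.8%.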

176.gcc doesn't run correctly with LTO yet; 255.vortex didn't run
correctly with "mainline", but it did with -flto, which is curious.  We
don't do C++ yet, so 252.eon is not included.

In general, things get worse with LTO, sometimes much worse.  I can
think of at least three possible reasons off the top of my head:

- Alias information.  We don't have any type-based alias information in
  -flto, which hurts.

- We don't merge types between compilation units, which could account
  for poor optimization behavior.

- I believe we lose some information in the LTO write/read process; edge
  probabilities, estimated # instructions in functions, etc. get lost.
  This hurts inlining decisions, block layout, alignment of jump
  targets, etc.  So there's information we need to write out or
  recompute.

-Nathan


* Re: [lto] preliminary SPECint benchmark numbers
From: Vladimir N. Makarov @ 2007-12-26  1:16 UTC
  To: Nathan Froyd; +Cc: gcc

  Here is my benchmarking of the current LTO branch on a 2.66GHz Core2
under RHEL 5 in 64- and 32-bit mode.  Vortex violates the type
aliasing rules, so it should be compiled with
-fno-strict-aliasing.  Perlbmk crashed in tree.c::build2_stat in
32-bit mode when LTO was used.  LTO currently generates wrong code for
176.gcc.  I've also checked the SPECfp2000 benchmarks written in C.

In brief,

  o The code size (text segment) with LTO is much smaller (2.7% and
    2.4% for SPECint2000 and 0.16% and 0.6% for SPECfp2000 in 64- and
    32-bit mode, respectively).  That is very promising.
  o Compilation is 2 times slower with LTO.
  o The generated code is 3.6% and 2.2% slower for SPECint2000 and
    SPECfp2000 in 64-bit mode.  It is also 6.7% slower for SPECint2000 in
    32-bit mode.  But the SPECfp2000 code generated with LTO in 32-bit
    mode is 20% faster!  That is because art is almost 2.5 times faster
    with LTO.

More details can be found below.

--------------------------64-bit mode----------------------------
base: -O2 -mtune=generic
peak: -O2 -mtune=generic -flto

                      base           peak
164.gzip              1363*          1340*
175.vpr               1600*          1571*
176.gcc                   X              X
181.mcf               1658*          1531*
186.crafty            2576*          2569*
197.parser            1269*          1158*
252.eon                   X              X
253.perlbmk           2546*          2373*
254.gap               1987*          1965*
255.vortex            2259*          2208*
256.bzip2             1874*          1721*
300.twolf             2548*          2627*
SPECint2000 mean      1910           1841    -3.6%
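
The mean row above appears to be a geometric mean of the per-benchmark ratios, which is how SPEC CPU2000 composite scores are defined. A minimal sketch that recomputes it from the ten ratios listed in the table (the names `BASE`, `PEAK` and `geometric_mean` are illustrative):

```python
import math

# SPEC ratios from the 64-bit SPECint2000 table (gcc and eon excluded).
BASE = [1363, 1600, 1658, 2576, 1269, 2546, 1987, 2259, 1874, 2548]
PEAK = [1340, 1571, 1531, 2569, 1158, 2373, 1965, 2208, 1721, 2627]


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid a huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


base_mean = geometric_mean(BASE)
peak_mean = geometric_mean(PEAK)
change = (peak_mean - base_mean) / base_mean * 100.0

print(f"base {base_mean:.0f}  peak {peak_mean:.0f}  change {change:+.1f}%")
```

Because SPEC ratios are higher-is-better, a negative change here means the LTO build is slower.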

Compilation time of SPECint2000 (except for eon and gcc):
base: 65.02user 6.25system 1:15.41elapsed 94%CPU
peak: 130.62user 9.68system 2:45.20elapsed 84%CPU
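
The "2 times slower" figure can be checked directly from the two timing lines above. This is a small sketch that assumes the lines follow the usual GNU time layout (`NNuser NNsystem [h:]m:ss.ssElapsed`); `parse_time` is a hypothetical helper written for this example.

```python
import re


def parse_time(line):
    """Return (user, system, elapsed) in seconds from a GNU time style line."""
    m = re.search(r"([\d.]+)user\s+([\d.]+)system\s+([\d:.]+)elapsed", line)
    if m is None:
        raise ValueError(f"unrecognised timing line: {line!r}")
    user, system = float(m.group(1)), float(m.group(2))
    elapsed = 0.0
    for part in m.group(3).split(":"):  # handles m:ss.ss and h:mm:ss
        elapsed = elapsed * 60 + float(part)
    return user, system, elapsed


base = parse_time("65.02user 6.25system 1:15.41elapsed 94%CPU")
peak = parse_time("130.62user 9.68system 2:45.20elapsed 84%CPU")

user_ratio = peak[0] / base[0]
elapsed_ratio = peak[2] / base[2]
print(f"user time x{user_ratio:.2f}, elapsed time x{elapsed_ratio:.2f}")
```

Both the user-time and the wall-clock ratios come out at a little over 2.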

                    base        peak
168.wupwise            X           X
171.swim               X           X
172.mgrid              X           X
173.applu              X           X
177.mesa           2426*       2314*
178.galgel             X           X
179.art            6276*       5519*
183.equake         1826*       1808*
187.facerec            X           X
188.ammp           1770*       1666*
189.lucas              X           X
191.fma3d              X           X
200.sixtrack           X           X
301.apsi               X           X
SPECfp2000 mean     2649        2491    -2.2%

Compilation time of SPECfp2000 (only mesa, art, equake, ammp):
base: 17.32user 1.74system 0:20.42elapsed 93%CPU
peak: 35.52user 2.88system 0:42.86elapsed 89%CPU

text segment:
----------------CINT2000-----------------
-6.144%          38962          36568 164.gzip
-3.500%         147426         142266 175.vpr
-4.313%          12613          12069 181.mcf
-2.544%         172319         167935 186.crafty
-5.566%         108797         102741 197.parser
-5.436%         575443         544160 253.perlbmk
-5.214%         494375         468599 254.gap
-5.617%         556589         525325 255.vortex
-3.209%          32532          31488 256.bzip2
 1.132%         198639         200887 300.twolf
Average = -2.69418%
----------------CFP2000-----------------
-5.093%         522117         495526 177.mesa
 2.542%          16362          16778 179.art
 2.745%          19778          20321 183.equake
-2.919%         142532         138372 188.ammp
Average = -0.160212%
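
As a cross-check, assuming the "Average" lines are meant to be the unweighted mean of the per-benchmark percentages, the values can be recomputed from the listed sizes. The sketch below uses illustrative names (`CINT`, `CFP`, `mean_change`); the sizes are copied from the 64-bit tables above.

```python
# Text segment sizes in bytes: benchmark -> (base, peak with -flto).
CINT = {
    "164.gzip": (38962, 36568),
    "175.vpr": (147426, 142266),
    "181.mcf": (12613, 12069),
    "186.crafty": (172319, 167935),
    "197.parser": (108797, 102741),
    "253.perlbmk": (575443, 544160),
    "254.gap": (494375, 468599),
    "255.vortex": (556589, 525325),
    "256.bzip2": (32532, 31488),
    "300.twolf": (198639, 200887),
}
CFP = {
    "177.mesa": (522117, 495526),
    "179.art": (16362, 16778),
    "183.equake": (19778, 20321),
    "188.ammp": (142532, 138372),
}


def size_change(base, peak):
    """Percent change in text size (negative = smaller with LTO)."""
    return (peak - base) / base * 100.0


def mean_change(sizes):
    """Unweighted arithmetic mean of the per-benchmark percent changes."""
    changes = [size_change(b, p) for b, p in sizes.values()]
    return sum(changes) / len(changes)


print(f"CINT2000 mean: {mean_change(CINT):.2f}%")
print(f"CFP2000 mean:  {mean_change(CFP):.2f}%")
```

Under that assumption the means come out at about -4.04% for CINT2000 and -0.68% for CFP2000. The printed averages would be consistent with the same sums divided by 15 and 17 rather than by the 10 and 4 rows listed, so the size reduction may be somewhat larger than the Average lines suggest.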

--------------------------32-bit mode----------------------------
base: -m32 -O2 -mtune=generic
peak: -m32 -O2 -mtune=generic -flto

                 base        peak
164.gzip         1261*        1125*
175.vpr          1603*        1483*
176.gcc              X            X
181.mcf          3057*        2801*
186.crafty       1764*        1691*
197.parser       1397*        1224*
252.eon              X            X
253.perlbmk          X            X
254.gap          1981*        1778*
255.vortex       2013*        1914*
256.bzip2        1666*        1580*
300.twolf        2376*        2484*
SPECint2000 mean 1839         1716  -6.7%

Compilation time of SPECint2000 (except for eon, gcc, and perlbmk):
base: 49.36user 5.13system 0:58.57elapsed 93%CPU
peak: 99.32user 7.90system 1:56.63elapsed 91%CPU

                      base        peak
168.wupwise              X           X
171.swim                 X           X
172.mgrid                X           X
173.applu                X           X
177.mesa             1362*       1325*
178.galgel               X           X
179.art              2786*       6197*
183.equake           1784*       1772*
187.facerec              X           X
188.ammp             1144*       1102*
189.lucas                X           X
191.fma3d                X           X
200.sixtrack             X           X
301.apsi                 X           X
SPECfp2000 mean       1668        2001  +20%

Compilation time of SPECfp2000 (only mesa, art, equake, ammp):
base: 17.88user 1.85system 0:21.17elapsed 93%CPU
peak: 36.76user 2.83system 0:43.81elapsed 90%CPU

text segment:
----------------CINT2000-----------------
-5.936%          35005          32927 164.gzip
-5.125%         137683         130627 175.vpr
-3.739%          10270           9886 181.mcf
-1.379%         195472         192776 186.crafty
-5.192%          94770          89850 197.parser
-5.436%         575443         544160 253.perlbmk
-4.400%         449316         429544 254.gap
-2.219%         564982         552446 255.vortex
-2.884%          30515          29635 256.bzip2
 0.167%         193748         194072 300.twolf
Average = -2.40954%
----------------CFP2000-----------------
-5.796%         499738         470775 177.mesa
 0.458%          13971          14035 179.art
 0.303%          17467          17520 183.equake
-5.176%         111429         105661 188.ammp
Average = -0.600618%



* Re: [lto] preliminary SPECint benchmark numbers
From: Chris Lattner @ 2007-12-26  2:11 UTC
  To: Vladimir N. Makarov; +Cc: Nathan Froyd, gcc


On Dec 25, 2007, at 5:02 PM, Vladimir N. Makarov wrote:

> Here is my benchmarking of the current LTO branch on a 2.66GHz Core2
> under RHEL 5 in 64- and 32-bit mode.  Vortex violates the type
> aliasing rules, so it should be compiled with
> -fno-strict-aliasing.  Perlbmk crashed in tree.c::build2_stat in
> 32-bit mode when LTO was used.  LTO currently generates wrong code for
> 176.gcc.  I've also checked the SPECfp2000 benchmarks written in C.
>
> In brief,
>
> o The code size (text segment) with LTO is much smaller (2.7% and
>   2.4% for SPECint2000 and 0.16% and 0.6% for SPECfp2000 in 64- and
>   32-bit mode, respectively).  That is very promising.
> o Compilation is 2 times slower with LTO.
> o The generated code is 3.6% and 2.2% slower for SPECint2000 and
>   SPECfp2000 in 64-bit mode.  It is also 6.7% slower for SPECint2000 in
>   32-bit mode.  But the SPECfp2000 code generated with LTO in 32-bit
>   mode is 20% faster!  That is because art is almost 2.5 times faster
>   with LTO.

Wow, nice numbers!  Is it possible to compare this to -combine, or does
-combine no longer work?  In theory, LTO and IMA should yield the same
codegen; LTO should just be usable with normal makefiles.

-Chris
