public inbox for gcc-bugs@sourceware.org
* [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth?
@ 2012-10-18  6:29 vincenzo.innocente at cern dot ch
  2012-10-18  7:47 ` [Bug lto/54966] " dominiq at lps dot ens.fr
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-10-18  6:29 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

             Bug #: 54966
           Summary: Does LTO requires a larger inline-unit-growth?
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


Experimenting with our large application, I am under the impression that, in
order to get the same level of inlining with LTO, I need to increase
--param inline-unit-growth to at least 50 (my understanding is that the default
is 30).
Is this something expected or already observed?

Do you have any plans to retune the heuristics for LTO?

It is quite critical for some meta-template "expression expansion" code.



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
@ 2012-10-18  7:47 ` dominiq at lps dot ens.fr
  2012-10-18  9:44 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-10-18  7:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-10-18 07:47:15 UTC ---
This seems related to pr48636. Could you try the patch in comment #20:
http://gcc.gnu.org/bugzilla/attachment.cgi?id=28456 ?



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
  2012-10-18  7:47 ` [Bug lto/54966] " dominiq at lps dot ens.fr
@ 2012-10-18  9:44 ` rguenth at gcc dot gnu.org
  2012-10-19  8:37 ` vincenzo.innocente at cern dot ch
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-10-18  9:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |lto
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2012-10-18 09:43:45 UTC ---
I'm not sure how we count the initial unit size. Without LTO, unmerged
COMDATs are probably counted in each unit, so overall they add up,
while the initial LTO unit size may be considerably smaller than the sum
of the non-LTO unit sizes.

Honza?



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
  2012-10-18  7:47 ` [Bug lto/54966] " dominiq at lps dot ens.fr
  2012-10-18  9:44 ` rguenth at gcc dot gnu.org
@ 2012-10-19  8:37 ` vincenzo.innocente at cern dot ch
  2012-10-23 13:59 ` hubicka at ucw dot cz
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-10-19  8:37 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #3 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-10-19 08:36:20 UTC ---
The patch fails to apply to 4.7:

patch -p0 < ../../inline.patch 
patching file ipa-inline.c
Hunk #1 FAILED at 473.
Hunk #2 FAILED at 491.
Hunk #3 FAILED at 545.
3 out of 3 hunks FAILED -- saving rejects to file ipa-inline.c.rej

We are not ready to upgrade to 4.8.



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (2 preceding siblings ...)
  2012-10-19  8:37 ` vincenzo.innocente at cern dot ch
@ 2012-10-23 13:59 ` hubicka at ucw dot cz
  2012-10-23 14:02 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hubicka at ucw dot cz @ 2012-10-23 13:59 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #4 from Jan Hubicka <hubicka at ucw dot cz> 2012-10-23 13:59:38 UTC ---
> I'm not sure how we count the initial unit size. Without LTO, unmerged
> COMDATs are probably counted in each unit, so overall they add up,
> while the initial LTO unit size may be considerably smaller than the sum
> of the non-LTO unit sizes.

I do not really see a problem here.  We simply compute the size of the unit by
summing all the functions in the callgraph prior to inlining (after merging).
So in the case of LTO we count COMDATs once, and if they are unused by non-LTO
code we promote them to static and get better inlining due to removing offline
copies (which we account for as inline decisions are made).  In the case of
non-LTO we count a COMDAT in every unit that uses it, and we have a heuristic
predicting that the COMDAT will most likely be eliminated in the other units,
too, if it is eliminated in the current unit.  So we inline them almost as
aggressively as statics, but not quite.
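
As a rough illustration of the accounting described above, here is a toy model in C++. It is not GCC's implementation: the `Function` type, the function names, and the sizes are all invented for the example, and the percentage budget only mimics the idea behind --param inline-unit-growth.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of a function in the callgraph.
struct Function {
  std::string name;
  int size;     // estimated size, in arbitrary units
  bool comdat;  // true if the linker may merge duplicate copies
};
using Unit = std::vector<Function>;

// Non-LTO view: every unit is sized on its own, so a COMDAT that is used
// in several units is counted once per unit.
int unit_size(const Unit& unit) {
  int total = 0;
  for (const Function& f : unit) total += f.size;
  return total;
}

// LTO view: units are merged first, so each COMDAT is counted only once.
int merged_size(const std::vector<Unit>& units) {
  std::set<std::string> seen_comdats;
  int total = 0;
  for (const Unit& unit : units) {
    for (const Function& f : unit) {
      if (f.comdat && !seen_comdats.insert(f.name).second) continue;
      total += f.size;
    }
  }
  return total;
}

// Growth budget as a percentage of the initial size (toy version).
int growth_budget(int initial_size, int percent) {
  assert(initial_size >= 0 && percent >= 0);
  return initial_size * percent / 100;
}
```

With two units that both use one COMDAT of size 40, the per-unit sizes add up to more than the merged size, so the same percentage yields a smaller absolute budget for the merged unit.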

What kind of problem are you looking into?
Honza



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (3 preceding siblings ...)
  2012-10-23 13:59 ` hubicka at ucw dot cz
@ 2012-10-23 14:02 ` rguenther at suse dot de
  2012-10-23 14:13 ` hubicka at ucw dot cz
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenther at suse dot de @ 2012-10-23 14:02 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> 2012-10-23 14:02:05 UTC ---
On Tue, 23 Oct 2012, hubicka at ucw dot cz wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966
> 
> --- Comment #4 from Jan Hubicka <hubicka at ucw dot cz> 2012-10-23 13:59:38 UTC ---
> > I'm not sure how we count the initial unit size. Without LTO, unmerged
> > COMDATs are probably counted in each unit, so overall they add up,
> > while the initial LTO unit size may be considerably smaller than the sum
> > of the non-LTO unit sizes.
> 
> I do not really see a problem here.  We simply compute the size of the unit by
> summing all the functions in the callgraph prior to inlining (after merging).
> So in the case of LTO we count COMDATs once, and if they are unused by non-LTO
> code we promote them to static and get better inlining due to removing offline
> copies (which we account for as inline decisions are made).  In the case of
> non-LTO we count a COMDAT in every unit that uses it, and we have a heuristic
> predicting that the COMDAT will most likely be eliminated in the other units,
> too, if it is eliminated in the current unit.  So we inline them almost as
> aggressively as statics, but not quite.
> 
> What kind of problem are you looking into?

I was just guessing why our overall unit-growth heuristics would
lead to different overall inlining with LTO vs. single TUs.

Richard.



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (4 preceding siblings ...)
  2012-10-23 14:02 ` rguenther at suse dot de
@ 2012-10-23 14:13 ` hubicka at ucw dot cz
  2012-11-08 16:44 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hubicka at ucw dot cz @ 2012-10-23 14:13 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #6 from Jan Hubicka <hubicka at ucw dot cz> 2012-10-23 14:12:43 UTC ---
The patch suggested by Dominique is not going to help here.

> I was just guessing why our overall unit-growth heuristics would
> lead to different overall inlining with LTO vs. single TUs.

Well, the situation is usually as follows.  You have a relatively large X that
needs inlining into Y in a unit that contains few extra inlining candidates
(usually smaller than Y).

When compiling without LTO the inliner does not hit the unit-growth limit at all
because there are relatively few candidates and Y is one of them.

With LTO there are very many cross-module inlining candidates.  Many of them
will end up in the priority queue before Y and consequently Y may fall out of
the threshold.
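
The effect can be sketched with a toy greedy inliner in C++. This is a simplified model under stated assumptions, not the code in ipa-inline.c: the `Candidate` type, the badness values, and the sizes are hypothetical, and candidates that would exceed the limit are simply skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical inline candidate: a call edge with a cost ("badness",
// lower is better) and the code growth that inlining it would cause.
struct Candidate {
  std::string name;
  int badness;
  int growth;
};

// Process candidates in order of increasing badness and inline each one
// that still fits under initial_size * (1 + growth_percent / 100).
std::vector<std::string> greedy_inline(std::vector<Candidate> candidates,
                                       int initial_size, int growth_percent) {
  assert(initial_size >= 0 && growth_percent >= 0);
  auto worse = [](const Candidate& a, const Candidate& b) {
    return a.badness > b.badness;  // keeps the smallest badness on top
  };
  std::priority_queue<Candidate, std::vector<Candidate>, decltype(worse)>
      queue(worse, std::move(candidates));

  const int limit = initial_size + initial_size * growth_percent / 100;
  int size = initial_size;
  std::vector<std::string> inlined;
  while (!queue.empty()) {
    Candidate c = queue.top();
    queue.pop();
    if (size + c.growth > limit) continue;  // would exceed the unit limit
    size += c.growth;
    inlined.push_back(c.name);
  }
  return inlined;
}

// Helper for inspecting the result.
bool was_inlined(const std::vector<std::string>& inlined,
                 const std::string& name) {
  return std::find(inlined.begin(), inlined.end(), name) != inlined.end();
}
```

In a small unit with only a few candidates, a large "X->Y" edge fits under the limit. In a merged unit where many cheaper cross-module candidates are processed first, the same edge can be rejected at 30% and accepted again at 50%, which matches the behaviour reported at the start of this thread.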

This is just the common case we have discussed many times, which arises when
you base the inliner's profitability mostly on global properties of the program.

One way I handled this is to introduce inliner hints to prioritize inlining
of large functions that seem profitable for other reasons.  (So it may be worth
trying 4.8 to see if it fares better and, if not, giving me an example of what
the function to be inlined looks like.)

Another thing I wondered about is the possibility of increasing the badness of
cross-module inlining, counting on the fact that programs are usually organized
so that hot inline candidates are in the same unit.
Of course this is a bit backwards with respect to the overall goal of having
LTO simplify .h files.

Honza



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (5 preceding siblings ...)
  2012-10-23 14:13 ` hubicka at ucw dot cz
@ 2012-11-08 16:44 ` hubicka at gcc dot gnu.org
  2012-11-09  6:39 ` vincenzo.innocente at cern dot ch
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-08 16:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-11-08
     Ever Confirmed|0                           |1

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-08 16:44:36 UTC ---
The inline metrics for 4.8 were changed significantly. I would be curious
whether your application now behaves better (or worse).



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (6 preceding siblings ...)
  2012-11-08 16:44 ` hubicka at gcc dot gnu.org
@ 2012-11-09  6:39 ` vincenzo.innocente at cern dot ch
  2012-11-09  6:52 ` vincenzo.innocente at cern dot ch
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-11-09  6:39 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #8 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-11-09 06:39:33 UTC ---
Created attachment 28646
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28646
test case  (preprocessed with gcc 4.7.2)



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (7 preceding siblings ...)
  2012-11-09  6:39 ` vincenzo.innocente at cern dot ch
@ 2012-11-09  6:52 ` vincenzo.innocente at cern dot ch
  2012-11-09 11:33 ` vincenzo.innocente at cern dot ch
  2012-11-12 13:20 ` vincenzo.innocente at cern dot ch
  10 siblings, 0 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-11-09  6:52 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #9 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-11-09 06:52:22 UTC ---
Better and worse!
Better than 4.7.2, but LTO is worse in 4.8.
Attached is a test case, just one file:
bzip2 -d smatrix.ii.bz2 

The main component is this:
three different ways of computing the same matrix multiplication, with more and
more explicit temporaries.
    tvc = 
      //33       3N          NN  NM  MM
      -ve * (Transpose(a)) * tw * b * s;
  }


  void compute2() {
    //                        NM    NM  MM
    AlgebraicMatrixNM twbs =  tw *  b * s;
    tvc = 
      //33       3N         
    -ve * (Transpose(a)) * twbs;
  }

  void compute3() {
    //                        NM    NM  MM
    AlgebraicMatrixNM twbs =  tw *  b * s;
    //                       33       3N         
    AlgebraicMatrix3N tmpM1 = -ve * (Transpose(a));
    tvc = tmpM1 * twbs;
  }

Timing is in cycles using __rdtsc() on Intel x86_64;
stability is maybe 10%…
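
For readers who want to reproduce this kind of measurement, a minimal sketch of a cycle-counting harness follows. It is not the harness from the attached smatrix.ii: `read_ticks` and `average_ticks` are hypothetical names, and the non-x86 fallback to std::chrono (nanoseconds rather than cycles) is an addition so the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // provides __rdtsc() with GCC and Clang
#endif

// Read a monotonically increasing tick counter.
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
  return __rdtsc();  // time-stamp counter
#else
  // Assumption for non-x86 targets: use nanoseconds instead of cycles.
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(
          std::chrono::steady_clock::now().time_since_epoch())
          .count());
#endif
}

// Run `work` once as a warm-up, then `repetitions` times under the counter,
// and return the average number of ticks per call.
template <typename F>
double average_ticks(F&& work, int repetitions) {
  assert(repetitions > 0);
  work();
  const std::uint64_t start = read_ticks();
  for (int i = 0; i < repetitions; ++i) work();
  const std::uint64_t stop = read_ticks();
  return static_cast<double>(stop - start) / repetitions;
}
```

Averaging over many repetitions and pinning the process to one core (as done with taskset later in this thread) helps with run-to-run variation, but differences of a few percent between runs should still be treated as noise.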

Target: x86_64-unknown-linux-gnu
gcc version 4.7.2 (GCC) 

[vocms123] ~/public/ctest/bugs48 $ c++ -O2 smatrix.ii; ./a.out 
size 5.  v1: time in cycles 30159.6
size 5.  v2: time in cycles 6031.1
size 5.  v3: time in cycles 3388.17
size 6.  v1: time in cycles 60336.6
size 6.  v2: time in cycles 10366.8
size 6.  v3: time in cycles 5955.81
[vocms123] ~/public/ctest/bugs48 $ c++ -O2 smatrix.ii -flto; ./a.out
size 5.  v1: time in cycles 12818.8
size 5.  v2: time in cycles 10453.9
size 5.  v3: time in cycles 5954.23
size 6.  v1: time in cycles 46293.1
size 6.  v2: time in cycles 12309.2
size 6.  v3: time in cycles 11621.8
[vocms123] ~/public/ctest/bugs48 $ c++ -O3 smatrix.ii; ./a.out
size 5.  v1: time in cycles 39630.3
size 5.  v2: time in cycles 5869.96
size 5.  v3: time in cycles 1966.87
size 6.  v1: time in cycles 69531.4
size 6.  v2: time in cycles 9020.06
size 6.  v3: time in cycles 4732.99
[vocms123] ~/public/ctest/bugs48 $ c++ -O3 smatrix.ii -flto; ./a.out
size 5.  v1: time in cycles 12425.1
size 5.  v2: time in cycles 9650.03
size 5.  v3: time in cycles 5340.79
size 6.  v1: time in cycles 45998
size 6.  v2: time in cycles 11128.1
size 6.  v3: time in cycles 10383


gcc version 4.8.0 20121108 (experimental) [trunk revision 193333] (GCC) 

[vocms123] ~/public/ctest/bugs48 $ c++ -O2 smatrix.ii ; ./a.out
size 5.  v1: time in cycles 14040.5
size 5.  v2: time in cycles 3264.85
size 5.  v3: time in cycles 3457.25
size 6.  v1: time in cycles 37368.4
size 6.  v2: time in cycles 5813.81
size 6.  v3: time in cycles 6224.37
[vocms123] ~/public/ctest/bugs48 $ c++ -O2 smatrix.ii -flto ; ./a.out
size 5.  v1: time in cycles 20705.2
size 5.  v2: time in cycles 6333.17
size 5.  v3: time in cycles 6654.85
size 6.  v1: time in cycles 49788.8
size 6.  v2: time in cycles 6828.72
size 6.  v3: time in cycles 7188.28
[vocms123] ~/public/ctest/bugs48 $ c++ -O3 smatrix.ii ; ./a.out
size 5.  v1: time in cycles 17350.4
size 5.  v2: time in cycles 2355.68
size 5.  v3: time in cycles 1891.62
size 6.  v1: time in cycles 38002
size 6.  v2: time in cycles 4046.09
size 6.  v3: time in cycles 3954.97
[vocms123] ~/public/ctest/bugs48 $ c++ -O3 smatrix.ii -flto; ./a.out
size 5.  v1: time in cycles 20380.9
size 5.  v2: time in cycles 4504.76
size 5.  v3: time in cycles 4576.5
size 6.  v1: time in cycles 42327.9
size 6.  v2: time in cycles 3405.72
size 6.  v3: time in cycles 3314.2

stability test


[vocms123] ~/public/ctest/bugs48 $ c++ -O3 smatrix.ii -flto ; ./a.out
size 5.  v1: time in cycles 20447.2
size 5.  v2: time in cycles 4509.64
size 5.  v3: time in cycles 4580.46
size 6.  v1: time in cycles 42361.2
size 6.  v2: time in cycles 3407.53
size 6.  v3: time in cycles 3316.36
[vocms123] ~/public/ctest/bugs48 $ ./a.out 
size 5.  v1: time in cycles 23968.7
size 5.  v2: time in cycles 5277.67
size 5.  v3: time in cycles 5353.72
size 6.  v1: time in cycles 49573
size 6.  v2: time in cycles 4023.34
size 6.  v3: time in cycles 3862.6



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (8 preceding siblings ...)
  2012-11-09  6:52 ` vincenzo.innocente at cern dot ch
@ 2012-11-09 11:33 ` vincenzo.innocente at cern dot ch
  2012-11-12 13:20 ` vincenzo.innocente at cern dot ch
  10 siblings, 0 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-11-09 11:33 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #10 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-11-09 11:33:37 UTC ---
I've repeated the tests on a different machine and the results are the
same.
gcc version 4.8.0 20121108 (experimental) [trunk revision 193333] (GCC) 
At -O3, LTO degrades performance in two cases and improves it in another…
At -O2, LTO just degrades performance in two cases (different from those at
-O3) and does not improve the others.
With -Ofast the differences are even more dramatic.

[innocent@vinavx0 bugs48]$ c++ -Ofast smatrix.ii -march=native -flto ; taskset -c 2 ./a.out
size 5.  v1: time in cycles 14255.6
size 5.  v2: time in cycles 3699.28
size 5.  v3: time in cycles 3715.55
size 6.  v1: time in cycles 9179.85
size 6.  v2: time in cycles 1906.91
size 6.  v3: time in cycles 1812.73

[innocent@vinavx0 bugs48]$ c++ -Ofast smatrix.ii -march=native ; taskset -c 2 ./a.out
size 5.  v1: time in cycles 13933.9
size 5.  v2: time in cycles 2125.56
size 5.  v3: time in cycles 1028.43
size 6.  v1: time in cycles 28168
size 6.  v2: time in cycles 3528.72
size 6.  v3: time in cycles 2533.5


c++ -O3 smatrix.ii -march=native;  taskset -c 2 ./a.out
size 5.  v1: time in cycles 13896.1
size 5.  v2: time in cycles 2107.25
size 5.  v3: time in cycles 1647.42
size 6.  v1: time in cycles 31095.6
size 6.  v2: time in cycles 3862.43
size 6.  v3: time in cycles 3510.14



c++ -O3 smatrix.ii -march=native -flto; ./a.out
size 5.  v1: time in cycles 16183.5
size 5.  v2: time in cycles 3696.15
size 5.  v3: time in cycles 3698.27
size 6.  v1: time in cycles 36323.5
size 6.  v2: time in cycles 2799.47
size 6.  v3: time in cycles 2705.73

[innocent@vinavx0 bugs48]$ taskset -c 2 ./a.out
size 5.  v1: time in cycles 16150.1
size 5.  v2: time in cycles 3718.54
size 5.  v3: time in cycles 3784.38
size 6.  v1: time in cycles 36326.3
size 6.  v2: time in cycles 2785.33
size 6.  v3: time in cycles 2714.69



c++ -O2 smatrix.ii -march=native -flto ;  taskset -c 2 ./a.out
size 5.  v1: time in cycles 13809.2
size 5.  v2: time in cycles 3999.39
size 5.  v3: time in cycles 4186.2
size 6.  v1: time in cycles 35057.3
size 6.  v2: time in cycles 4657.59
size 6.  v3: time in cycles 4766.62

c++ -O2 smatrix.ii -march=native;  taskset -c 2 ./a.out
size 5.  v1: time in cycles 11300.6
size 5.  v2: time in cycles 2877.27
size 5.  v3: time in cycles 2947.01
size 6.  v1: time in cycles 30520
size 6.  v2: time in cycles 4623.54
size 6.  v3: time in cycles 5287.95



* [Bug lto/54966] Does LTO requires a larger inline-unit-growth?
  2012-10-18  6:29 [Bug lto/54966] New: Does LTO requires a larger inline-unit-growth? vincenzo.innocente at cern dot ch
                   ` (9 preceding siblings ...)
  2012-11-09 11:33 ` vincenzo.innocente at cern dot ch
@ 2012-11-12 13:20 ` vincenzo.innocente at cern dot ch
  10 siblings, 0 replies; 12+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-11-12 13:20 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54966

--- Comment #11 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-11-12 13:19:42 UTC ---
Much better with
gcc version 4.8.0 20121112 (experimental) [trunk revision 193427] (GCC) 
except for "size 6.  v1" with LTO.

[innocent@vinavx0 bugs48]$ c++ -Ofast smatrix.ii -march=native ; taskset -c 2 ./a.out
size 5.  v1: time in cycles 6925.32
size 5.  v2: time in cycles 2123.49
size 5.  v3: time in cycles 1067.43
size 6.  v1: time in cycles 31216.7
size 6.  v2: time in cycles 3521.98
size 6.  v3: time in cycles 2523.74
[innocent@vinavx0 bugs48]$ c++ -Ofast smatrix.ii -march=native -flto; taskset -c 2 ./a.out
size 5.  v1: time in cycles 6367.09
size 5.  v2: time in cycles 1181.97
size 5.  v3: time in cycles 1194.82
size 6.  v1: time in cycles 34811.5
size 6.  v2: time in cycles 1909.71
size 6.  v3: time in cycles 1803.48

Of course, inlining in the "v1" case as well would be even better! (The code
is equivalent to v2 and v3.)

I have some other, more complex functions where inlining is "different" from
4.7.2 but not necessarily better.
I will try to cut down a test case.


