[Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
@ 2005-10-29 22:36 ` steven at gcc dot gnu dot org
  2005-10-31  0:37 ` mmitchel at gcc dot gnu dot org
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-10-29 22:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from steven at gcc dot gnu dot org  2005-10-29 22:36 -------
Waiting for someone to look into this...


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
  2005-10-29 22:36 ` [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much steven at gcc dot gnu dot org
@ 2005-10-31  0:37 ` mmitchel at gcc dot gnu dot org
  2005-10-31 18:36 ` hubicka at gcc dot gnu dot org
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-10-31  0:37 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #27 from mmitchel at gcc dot gnu dot org  2005-10-31 00:37 -------
It seems unlikely to me that this is going to be release-critical, so I've
downgraded it to P4.

Our inlining heuristics are notoriously easy to perturb.  Probably, to do
substantially better, we'll need a more sophisticated cost model; or, as Jan
suggests, actually try inlining things and see how much that helps.  I'm
absolutely certain that for every release we'll have at least one program that
was inlined worse than in the previous release; the question is how we're doing
overall, and whether we're being particularly stupid.  So, I think the question
is whether we're actually being particularly stupid on this test case.

-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P4

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863


* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
  2005-10-29 22:36 ` [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much steven at gcc dot gnu dot org
  2005-10-31  0:37 ` mmitchel at gcc dot gnu dot org
@ 2005-10-31 18:36 ` hubicka at gcc dot gnu dot org
  2005-10-31 19:15 ` hubicka at gcc dot gnu dot org
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2005-10-31 18:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from hubicka at gcc dot gnu dot org  2005-10-31 18:36 -------
I get 0m8.052s on 3.4 and 0m8.127s on mainline on Athlon.  This hardly counts
as a regression.
This is actually the effect of some cost tweaks we made relative to the
gimplifier a while ago.
The reduced testcase now fits within the limits too.
Unlike Mark, I would not claim that our inliner heuristics are actually more
notorious than any others ;))


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2005-10-31 18:36 ` hubicka at gcc dot gnu dot org
@ 2005-10-31 19:15 ` hubicka at gcc dot gnu dot org
  2005-11-24 17:34 ` [Bug tree-optimization/17863] [4.0/4.1/4.2 " phython at gcc dot gnu dot org
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2005-10-31 19:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #29 from hubicka at gcc dot gnu dot org  2005-10-31 19:15 -------
Actually I have to reopen this.  When playing around on a Pentium M or Opteron,
I still get a roughly 20% regression (6s to 8s); 4.1 and 4.0 scores are about
the same on both machines.  For some reason this doesn't reproduce on my Athlon.
This however doesn't seem to be related to inlining limits, as ramping them up
so that everything gets inlined still doesn't make the testcase run faster, so
someone would probably have to oprofile it and figure out what is still
regressing.
I think it is non-critical though.


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2 Regression] threefold performance loss, not inlining as much
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2005-10-31 19:15 ` hubicka at gcc dot gnu dot org
@ 2005-11-24 17:34 ` phython at gcc dot gnu dot org
  2006-02-09 23:14 ` [Bug tree-optimization/17863] [4.0/4.1/4.2 Regression] performance loss (not inlining as much??) steven at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: phython at gcc dot gnu dot org @ 2005-11-24 17:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from phython at gcc dot gnu dot org  2005-11-24 17:34 -------
On powerpc-linux, using the following command line:
g++ -O3 -o t41 -mcpu=7450 -mtune=7450 pr17863.cc -static

I get the following timings:

GCC   real         user
3.4   0m11.761s    0m11.148s
4.0   0m10.196s    0m9.495s
4.1   0m17.824s    0m16.832s
4.1   0m11.547s    0m10.502s   (with attribute flatten)
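
The last row of the timings refers to GCC's `flatten` function attribute, which asks the compiler to inline, where possible, every call made inside the marked function. The comment does not say where the attribute was placed in pr17863.cc, so the following is only a minimal sketch of the mechanism; `dot3`, `kernel` and `check_kernel` are hypothetical names, not taken from the testcase.

```cpp
#include <cassert>

// Hypothetical small helper, standing in for the template functions that
// the 4.1 inliner no longer inlined by itself.
static double dot3(const double* a, const double* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// GCC-specific: `flatten` requests that every call made inside this
// function be inlined into it, if possible.
__attribute__((flatten))
double kernel(const double* a, const double* b, int rows) {
    double sum = 0.0;
    for (int r = 0; r < rows; ++r)
        sum += dot3(a + 3 * r, b + 3 * r);
    return sum;
}

// Usage check: two rows of three elements each.
inline void check_kernel() {
    const double a[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    const double b[] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
    assert(kernel(a, b, 2) == 21.0);  // (1+2+3) + (4+5+6)
}
```

The attribute is a blunt tool: it overrides the heuristics for one caller rather than fixing them, which is presumably why it only appears here as a data point.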


-- 

phython at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2004-12-24 20:36:17         |2005-11-24 17:34:11
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2005-11-24 17:34 ` [Bug tree-optimization/17863] [4.0/4.1/4.2 " phython at gcc dot gnu dot org
@ 2006-02-09 23:14 ` steven at gcc dot gnu dot org
  2006-03-11  3:20 ` mmitchel at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: steven at gcc dot gnu dot org @ 2006-02-09 23:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from steven at gcc dot gnu dot org  2006-02-09 23:14 -------
Someone will have to investigate this one better.

According to comment #29, inlining is no longer a problem so the bug subject
line was no longer correct.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|stevenb at suse dot de      |steven at gcc dot gnu dot
                   |                            |org
   Last reconfirmed|2005-11-24 17:34:11         |2006-02-09 23:14:56
               date|                            |
            Summary|[4.0/4.1/4.2 Regression]    |[4.0/4.1/4.2 Regression]
                   |threefold performance loss, |performance loss (not
                   |not inlining as much        |inlining as much??)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2006-02-09 23:14 ` [Bug tree-optimization/17863] [4.0/4.1/4.2 Regression] performance loss (not inlining as much??) steven at gcc dot gnu dot org
@ 2006-03-11  3:20 ` mmitchel at gcc dot gnu dot org
  2007-01-18  3:06 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 " gdr at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-03-11  3:20 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.3                       |4.0.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2006-03-11  3:20 ` mmitchel at gcc dot gnu dot org
@ 2007-01-18  3:06 ` gdr at gcc dot gnu dot org
  2007-01-21 21:49 ` pinskia at gcc dot gnu dot org
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: gdr at gcc dot gnu dot org @ 2007-01-18  3:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #32 from gdr at gcc dot gnu dot org  2007-01-18 03:06 -------
No fix will happen for GCC-4.0.x


-- 

gdr at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.4                       |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2007-01-18  3:06 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 " gdr at gcc dot gnu dot org
@ 2007-01-21 21:49 ` pinskia at gcc dot gnu dot org
  2007-02-14  9:23 ` mmitchel at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-01-21 21:49 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.1.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2007-01-21 21:49 ` pinskia at gcc dot gnu dot org
@ 2007-02-14  9:23 ` mmitchel at gcc dot gnu dot org
  2007-12-18 20:02 ` steven at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-02-14  9:23 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.2                       |4.1.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2007-02-14  9:23 ` mmitchel at gcc dot gnu dot org
@ 2007-12-18 20:02 ` steven at gcc dot gnu dot org
  2008-01-30 17:35 ` hubicka at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: steven at gcc dot gnu dot org @ 2007-12-18 20:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #33 from steven at gcc dot gnu dot org  2007-12-18 20:02 -------
Honza, since you re-opened this, perhaps you can give new timings?


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |hubicka at gcc dot gnu dot
                   |dot org                     |org
             Status|REOPENED                    |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2007-12-18 20:02 ` steven at gcc dot gnu dot org
@ 2008-01-30 17:35 ` hubicka at gcc dot gnu dot org
  2008-01-30 18:27 ` hubicka at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-01-30 17:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #34 from hubicka at gcc dot gnu dot org  2008-01-30 17:13 -------
hubicka@occam:/aux/hubicka/trunk-write/buidl2$ time ./test-3.4

real    0m5.692s
user    0m5.588s
sys     0m0.012s
hubicka@occam:/aux/hubicka/trunk-write/buidl2$ time ./test-3.4

real    0m5.536s
user    0m5.492s
sys     0m0.016s
hubicka@occam:/aux/hubicka/trunk-write/buidl2$ time ./a.out

real    0m7.438s
user    0m7.388s
sys     0m0.024s
hubicka@occam:/aux/hubicka/trunk-write/buidl2$ time ./a.out

real    0m7.392s
user    0m7.336s
sys     0m0.008s

This is with current mainline and -O2 -march=athlon-xp.  I had to remove two
"static" keywords in explicit template instantiation.
So we still regress here.
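
On the remark about removing "static" from explicit template instantiations: standard C++ does not allow a storage-class specifier in an explicit instantiation. The exact lines from the testcase are not quoted in this thread, so the sketch below uses a hypothetical template `scale` to show the rejected and the accepted form.

```cpp
#include <cassert>

// Hypothetical function template standing in for the ones in the testcase.
template <int n>
double scale(double x) {
    return x * n;
}

// A storage-class specifier is not permitted in an explicit instantiation,
// so a line such as
//     template static double scale<2>(double);
// is ill-formed.  Without `static` it is a valid explicit instantiation:
template double scale<2>(double);

// Usage check for the instantiated function.
inline void check_scale() {
    assert(scale<2>(1.5) == 3.0);
}
```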

Honza


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2006-02-09 23:14:56         |2008-01-30 17:13:54
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2008-01-30 17:35 ` hubicka at gcc dot gnu dot org
@ 2008-01-30 18:27 ` hubicka at gcc dot gnu dot org
  2008-01-30 18:34 ` hubicka at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-01-30 18:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #35 from hubicka at gcc dot gnu dot org  2008-01-30 17:58 -------
Now for a more thorough analysis.  The testcase is quite challenging for the
inlining heuristics, and by introducing early inlining and reducing the call
cost we now inline less than we did at the time I claimed that we inline
everything.  However, making everything inlined again still does not solve the
problem.

For inline decisions, the problematic bit seems to be accu1 and friends.  They
are templates that use simpler templates of the same form.  For the base case
(n = 0 in the dump):
double accu1(const double*, const double*) [with int n = 0] (p1, p2)
{
  double D.4655;
  double D.4654;
  double D.4653;

<bb 2>:
  D.4654_2 = *p1_1(D);
  D.4655_4 = *p2_3(D);
  D.4653_5 = D.4654_2 * D.4655_4;
  return D.4653_5;

}

With larger n we simply copy the body a few times (n = 1 shown):
double accu1(const double*, const double*) [with int n = 1] (p1, p2)
{
  double D.17506;
  double D.17507;
  double D.17505;
  double D.17505;
  double d;
  double D.6664;
  double D.6663;
  double D.6662;

<bb 2>:
  D.6662_2 = *p1_1(D);
  D.6663_4 = *p2_3(D);
  d_5 = D.6662_2 * D.6663_4;
  p2_6 = p2_3(D) + 8;
  p1_7 = p1_1(D) + 8;
  D.17506_11 = *p1_7;
  D.17507_12 = *p2_6;
  D.17505_13 = D.17506_11 * D.17507_12;
  D.6664_9 = d_5 + D.17505_13;
  return D.6664_9;

}
The early inliner handles this well until the function grows too large, which
happens at n=4; for n=5 we end up not inlining:
double accu1(const double*, const double*) [with int n = 5] (p1, p2)
{
  double d;
  double D.6697;
  double D.6696;
  double D.6695;
  double D.6694;

<bb 2>:
  D.6694_2 = *p1_1(D);
  D.6695_4 = *p2_3(D);
  d_5 = D.6694_2 * D.6695_4;
  p2_6 = p2_3(D) + 8;
  p1_7 = p1_1(D) + 8;
  D.6697_8 = accu1 (p1_7, p2_6);
  D.6696_9 = D.6697_8 + d_5;
  return D.6696_9;

}
This is as expected: for n=4 the code is definitely longer than the call
sequence, having 4 FP multiplies, 4 adds and 8 loads, so I don't think a simple
heuristic can reasonably expect it to simplify.

We inline these functions later, in late inlining, as expected, but since there
are just too many calls to them, we eventually hit the large-function and
large-unit limits.
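
The accu1 pattern visible in the dumps above can be sketched in source form as follows. This is a reconstruction from the dumps only (the testcase source is not quoted in this thread), so the exact signature and the helper `check_accu1` are assumptions.

```cpp
#include <cassert>

// Primary template: one multiply here, then recurse on the remaining
// elements, as in the "[with int n = 5]" dump.
template <int n>
inline double accu1(const double* p1, const double* p2) {
    double d = *p1 * *p2;
    return accu1<n - 1>(p1 + 1, p2 + 1) + d;
}

// Base case, matching the "[with int n = 0]" dump: a single product.
template <>
inline double accu1<0>(const double* p1, const double* p2) {
    return *p1 * *p2;
}

// Usage check: accu1<2> covers three element pairs.
inline void check_accu1() {
    const double a[] = {1.0, 2.0, 3.0};
    const double b[] = {4.0, 5.0, 6.0};
    assert(accu1<2>(a, b) == 32.0);  // 1*4 + 2*5 + 3*6
}
```

Each level adds one multiply, one add and two loads to the fully inlined body, which is why the size estimate crosses the early inliner's threshold after only a few levels.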

Now, to get everything inlined one needs --param inline-call-cost=9999 --param
max-inline-insns-single=999999 (the second is needed for DCubic::DCubic, which
is just big IMO).

Now with this:
hubicka@occam:/aux/hubicka/trunk-write/buidl2$ time
/aux/hubicka/gcc-install/bin/g++  -O3 ttest.cc  -fpermissive --static
-march=athlon-xp  -Winline --param inline-call-cost=9999 --param
max-inline-insns-single=999999
ttest.cc: In function 'void testv4c()':
ttest.cc:21: warning: inlining failed in call to 'tcdata::tcdata()': --param
inline-unit-growth limit reached
ttest.cc:468: warning: called from here

real    1m0.934s
user    0m59.736s
sys     0m1.204s
hubicka@occam:/aux/hubicka/trunk-write/buidl2$ time ./a.out

real    0m7.055s
user    0m7.052s
sys     0m0.000s

We still have a long way to go to reach GCC 3.4 performance (5s, see my
previous post).  I suspect that aliasing simply gives up.  Setting
inline-call-cost to 1 (the other extreme) leads to 6.9s.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2008-01-30 18:27 ` hubicka at gcc dot gnu dot org
@ 2008-01-30 18:34 ` hubicka at gcc dot gnu dot org
  2008-01-30 20:41 ` stevenb dot gcc at gmail dot com
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-01-30 18:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #36 from hubicka at gcc dot gnu dot org  2008-01-30 18:14 -------
Looking at the .optimized dump, one obvious problem is that we keep a lot of
pointer arithmetic that should be forward propagated:
<L147>:;
  D.184420 = *pz;
  p1 = pz + 8;
  D.184422 = *p1;
  p1 = p1 + 8;
  D.184424 = *p1;
  p1 = p1 + 8;
  D.184426 = *p1;
  p1 = p1 + 8;
  D.184428 = *p1;
  p1 = p1 + 8;
  D.184430 = *p1;
  p1 = p1 + 8;
  D.184432 = *p1;
  D.184434 = *(p1 + 8);

Those all seem to be just array manipulations.
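
To make the difference concrete, here is a small sketch of the two address forms. The function names are hypothetical; `p1 + 1` on a `const double*` corresponds to the `+ 8` (bytes) in the dump.

```cpp
#include <cassert>

// Chained form, as in the .optimized dump: each address is computed from
// the previously updated pointer.
double sum_chained(const double* pz) {
    const double* p1 = pz;
    double s = *p1;
    p1 = p1 + 1;
    s += *p1;
    p1 = p1 + 1;
    s += *p1;
    p1 = p1 + 1;
    s += *p1;
    return s;
}

// Forward-propagated form: every address is the base plus a constant.
double sum_from_base(const double* pz) {
    double s = pz[0];
    s += pz[1];
    s += pz[2];
    s += pz[3];
    return s;
}

// Usage check: both forms read the same four elements in the same order.
inline void check_sums() {
    const double v[] = {1.0, 2.0, 3.0, 4.0};
    assert(sum_chained(v) == 10.0);
    assert(sum_from_base(v) == 10.0);
}
```

In the second form it is immediately visible that all accesses are constant offsets from one base pointer, which is the property later passes need in order to reason about the loads.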

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2008-01-30 18:34 ` hubicka at gcc dot gnu dot org
@ 2008-01-30 20:41 ` stevenb dot gcc at gmail dot com
  2008-01-31  1:13 ` hubicka at ucw dot cz
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: stevenb dot gcc at gmail dot com @ 2008-01-30 20:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #37 from stevenb dot gcc at gmail dot com  2008-01-30 20:13 -------
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as
much??)

> Those all seem to be just array manipulations.

AFAICT, they are exactly in the form that some targets like it (e.g.
auto-inc/dec and SMALL_REGISTER_CLASS targets).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2008-01-30 20:41 ` stevenb dot gcc at gmail dot com
@ 2008-01-31  1:13 ` hubicka at ucw dot cz
  2008-01-31 10:00 ` rguenther at suse dot de
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at ucw dot cz @ 2008-01-31  1:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #38 from hubicka at ucw dot cz  2008-01-30 23:19 -------
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as
much??)

> AFAICT, they are exactly in the form that some targets like it (e.g.
> auto-inc/dec and SMALL_REGISTER_CLASS targets).

Yep, but all the pointer arithmetic keeps us from realizing that we are doing
quite simple array manipulations and from propagating loads/stores through.
CSE undoes this later in the game, so we end up with normal offset
addressing.  Doing it earlier should make load/store elimination happier.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2008-01-31  1:13 ` hubicka at ucw dot cz
@ 2008-01-31 10:00 ` rguenther at suse dot de
  2008-02-01 16:48 ` hubicka at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: rguenther at suse dot de @ 2008-01-31 10:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #39 from rguenther at suse dot de  2008-01-31 09:39 -------
Subject: Re:  [4.0/4.1/4.2/4.3 Regression]
 performance loss (not inlining as much??)

On Wed, 30 Jan 2008, stevenb dot gcc at gmail dot com wrote:

> 
> 
> ------- Comment #37 from stevenb dot gcc at gmail dot com  2008-01-30 20:13 -------
> Subject: Re:  [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as
> much??)
> 
> > Those all seem to be just array manipulations.
> 
> AFAICT, they are exactly in the form that some targets like it (e.g.
> auto-inc/dec and SMALL_REGISTER_CLASS targets).

They also don't operate on arrays, so we cannot use ARRAY_REF in the
IL.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2008-01-31 10:00 ` rguenther at suse dot de
@ 2008-02-01 16:48 ` hubicka at gcc dot gnu dot org
  2008-02-01 17:46 ` amacleod at redhat dot com
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-01 16:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #40 from hubicka at gcc dot gnu dot org  2008-02-01 16:47 -------
Well, I still meant that simplifying the cascaded additions to an accumulator
into direct additions from the base makes the code easier to simplify.  I
experimentally implemented the trick in fwprop and will attach it later, but
the patch by itself doesn't help.

What happens is now obvious.  For a sequence like:

  D.185211 = *py.4861;
  d = D.180590 * D.185211;
  p1 = py.4861 + 8;
  D.185213 = *p1;
  d = D.180606 * D.185213;
  p1 = py.4861 + 16;
  D.185215 = *p1;
  d = D.180642 * D.185215;
  p1 = py.4861 + 24;
  D.185217 = *p1;
  d = D.180728 * D.185217;
  p1 = py.4861 + 32;
  D.185219 = *p1;
  d = D.180888 * D.185219;
  p1 = py.4861 + 40;
  D.185221 = *p1;
  d = D.181157 * D.185221;
  p1 = py.4861 + 48;
  D.185223 = *p1;
  D.185098 = D.181571 * D.185223;
  D.185094 = d + D.185098;
  D.185090 = d + D.185094;
  D.185086 = d + D.185090;
  D.185082 = d + D.185086;
  D.185078 = d + D.185082;
  D.185074 = d + D.185078;
  D.185210 = *pz;
  d = D.185074 * D.185210;
  py = pz + 8; 
  d = D.180606 * D.185211;
  d = D.180642 * D.185213;
  d = D.180728 * D.185215;
  d = D.180888 * D.185217;
  d = D.181157 * D.185219;
  d = D.181571 * D.185221;
  D.185130 = D.182177 * D.185223;
  D.185126 = d + D.185130;
  D.185122 = d + D.185126;
  D.185118 = d + D.185122;
  D.185114 = d + D.185118;
  D.185110 = d + D.185114;
  D.185106 = d + D.185110;
  D.185225 = *py;
  d = D.185106 * D.185225;
  py = pz + 16;
  d = D.180642 * D.185211;
  d = D.180728 * D.185213;
  d = D.180888 * D.185215;
  d = D.181157 * D.185217;
  d = D.181571 * D.185219;
  d = D.182177 * D.185221;
  D.185162 = D.183023 * D.185223;
  D.185158 = d + D.185162;
  D.185154 = d + D.185158;
  D.185150 = d + D.185154;
  D.185146 = d + D.185150;
  D.185142 = d + D.185146;
  D.185138 = d + D.185142;
  D.185240 = *py;
  d = D.185138 * D.185240;
  py = pz + 24;
  d = D.180728 * D.185211;
  d = D.180888 * D.185213;
  d = D.181157 * D.185215;
  d = D.181571 * D.185217;
  d = D.182177 * D.185219;
  d = D.183023 * D.185221;
  D.185194 = D.184168 * D.185223;
  D.185190 = d + D.185194;
  D.185186 = d + D.185190;
  D.185182 = d + D.185186;
  D.185178 = d + D.185182;
  D.185174 = d + D.185178;
  D.185170 = d + D.185174;
  D.185255 = *py;
  d = D.185170 * D.185255;
  D.185134 = d + d;
  D.185102 = d + D.185134;
  D.185195 = d + D.185102;
  *ap1.4607 = D.185195;
  if (z1.4734 == 0)
    goto <bb 339> (<L351>);
  else
    goto <bb 144>;

that accumulates values from an array into a few variables, TER merges all the
arithmetic into a single giant expression, leaving the loads in front of it.

<L262>:;
  D.197135 = *pz;
  D.197137 = *(pz + 8);
  D.197139 = *(pz + 16);
  D.197141 = *(pz + 24);
  D.197143 = *(pz + 32);
  D.197145 = *(pz + 40);
  D.197147 = *(pz + 48);
  D.197149 = *(pz + 56);
  D.197151 = *(pz + 64);
  D.197153 = *(pz + 72);
  D.197155 = *(pz + 80);
  D.197157 = *(pz + 88);
  D.197159 = *(pz + 96);
  *ap1.4658 = (D.180590 * D.197135 + (D.180606 * D.197137 + (D.180642 *
D.197139 + (D.180728 * D.197141 + (D.180888 * D.197143 + (D.181157 * D.197145 +
(D.181571 * D.197147 + (D.182177 * D.197149 + (D.183023 * D.197151 + (D.184168
* D.197153 + (D.185672 * D.197155 + (D.187606 * D.197157 + D.190042 *
D.197159)))))))))))) * *py.4912 + ((D.180606 * D.197135 + (D.180642 * D.197137
+ (D.180728 * D.197139 + (D.180888 * D.197141 + (D.181157 * D.197143 +
(D.181571 * D.197145 + (D.182177 * D.197147 + (D.183023 * D.197149 + (D.184168
* D.197151 + (D.185672 * D.197153 + (D.187606 * D.197155 + (D.190042 * D.197157
+ D.193063 * D.197159)))))))))))) * *(py.4912 + 8) + (D.180642 * D.197135 +
(D.180728 * D.197137 + (D.180888 * D.197139 + (D.181157 * D.197141 + (D.181571
* D.197143 + (D.182177 * D.197145 + (D.183023 * D.197147 + (D.184168 * D.197149
+ (D.185672 * D.197151 + (D.187606 * D.197153 + (D.190042 * D.197155 +
(D.193063 * D.197157 + D.196753 * D.197159)))))))))))) * *(py.4912 + 16));
  if (z1.4780 == 0)
    goto <bb 339> (<L351>);
  else
    goto <bb 251>;


With the patch for fwprop and -fno-tree-ter I get 5.1s, which is the same as
with pre-4.0 GCC.  Why is TER not placing loads into expressions in the first
place?  This seems to me like quite a common pattern to kill register pressure.
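
At source level, the computation in the dumps above appears to have roughly the following shape. This is a hypothetical reconstruction: the names `windowed_sum`, `c`, `y`, `z`, `K` and `J` are assumptions, and the real testcase uses fully unrolled templates rather than loops.

```cpp
#include <cassert>

// K windows of J products each.  Every y[j] is loaded once but multiplied
// in all K windows, so each load has K uses.
double windowed_sum(const double* c, const double* y, const double* z,
                    int K, int J) {
    double total = 0.0;
    for (int k = 0; k < K; ++k) {
        double window = 0.0;
        for (int j = 0; j < J; ++j)
            window += c[k + j] * y[j];  // y[j] is reused for every k
        total += window * z[k];
    }
    return total;
}

// Usage check with two windows of three products.
inline void check_windowed_sum() {
    const double c[] = {1.0, 2.0, 3.0, 4.0};
    const double y[] = {1.0, 1.0, 1.0};
    const double z[] = {1.0, 2.0};
    // window 0 = 1+2+3 = 6, window 1 = 2+3+4 = 9, total = 1*6 + 2*9
    assert(windowed_sum(c, y, z, 2, 3) == 24.0);
}
```

Once unrolled, all loads of `y` are live across the whole merged expression, which is where the register pressure comes from.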

I have to leave now, but I will play with it further, check whether the fwprop
patch is needed, and polish it.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2008-02-01 16:48 ` hubicka at gcc dot gnu dot org
@ 2008-02-01 17:46 ` amacleod at redhat dot com
  2008-02-01 22:34 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register presure and inlining limits problems) hubicka at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: amacleod at redhat dot com @ 2008-02-01 17:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #41 from amacleod at redhat dot com  2008-02-01 17:46 -------
TER will not replace any load into an expression if there is more than one use
of the load. Your sample shows multiple uses of each load. If it did this
substitution, it could be introducing worse code, it doesn't know.   (TER is
also strictly a single block replacement as well).

A stand-alone register pressure analysis could determine that those loads
should all be substituted because pressure is too high on an 8-register
machine.  If there were 128 registers available, however, then it might not
want to substitute the loads.  TER just doesn't have that kind of information.
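
The single-use rule described above can be illustrated with a toy model. This is not GCC's implementation: `Stmt`, `single_use_defs` and `check_single_use` are hypothetical names, and the real pass presumably applies further conditions that are not modelled here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One statement of a basic block: `def = f(uses...)`.
struct Stmt {
    std::string def;
    std::vector<std::string> uses;
};

// A definition is a substitution candidate only if it has exactly one use
// inside the same block.
std::vector<std::string> single_use_defs(const std::vector<Stmt>& block) {
    std::map<std::string, int> use_count;
    for (const Stmt& s : block)
        for (const std::string& u : s.uses)
            ++use_count[u];

    std::vector<std::string> candidates;
    for (const Stmt& s : block) {
        std::map<std::string, int>::const_iterator it = use_count.find(s.def);
        if (it != use_count.end() && it->second == 1)
            candidates.push_back(s.def);
    }
    return candidates;
}

// Usage check: t1 is a load with two uses, so it is left in front of the
// expression; everything with exactly one use is a candidate.
inline void check_single_use() {
    const std::vector<Stmt> block = {
        {"t1", {"py"}},
        {"t2", {"pz"}},
        {"d1", {"c1", "t1"}},
        {"d2", {"c2", "t1"}},
        {"r", {"d1", "d2"}},
        {"s", {"r", "t2"}},
    };
    const std::vector<std::string> expected = {"t2", "d1", "d2", "r"};
    assert(single_use_defs(block) == expected);
}
```

In the dumps from comment #40 nearly every load looks like `t1` here: it feeds several products, so it stays outside the merged expression and remains live across it.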


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2008-02-01 17:46 ` amacleod at redhat dot com
@ 2008-02-01 22:34 ` hubicka at gcc dot gnu dot org
  2008-02-01 22:45 ` hubicka at ucw dot cz
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-01 22:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #42 from hubicka at gcc dot gnu dot org  2008-02-01 22:33 -------
So, to summarize.

With TER disabled, I get 6.2s, so we are still worse than 3.4, which takes 5.6s.

With the call-cost inline parameter increased and TER disabled, I get 5.3s.

With the forwprop fix, TER disabled and the inline parameter increased, I get
5.2s.

With forwprop alone we get 7.1s.

With forwprop and TER disabled it is 5.8s.

Other combinations make no difference.  TER increasing register pressure is
the major offender here, masking the other improvements.

I don't see how to tackle the TER and inlining limits issue with the current
organization of the compiler.  It is probably a GCC 4.4 thing if we get lucky.

Honza


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|hubicka at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW
            Summary|[4.0/4.1/4.2/4.3 Regression]|[4.0/4.1/4.2/4.3 Regression]
                   |performance loss (not       |performance loss (TER
                   |inlining as much??)         |register presure and
                   |                            |inlining limits problems)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2008-02-01 22:34 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register presure and inlining limits problems) hubicka at gcc dot gnu dot org
@ 2008-02-01 22:45 ` hubicka at ucw dot cz
  2008-02-01 22:55 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (register pressure due to RA sucking " pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at ucw dot cz @ 2008-02-01 22:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #43 from hubicka at ucw dot cz  2008-02-01 22:45 -------
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as
much??)

> TER will not replace any load into an expression if there is more than one use
> of the load. Your sample shows multiple uses of each load. If it did this
> substitution, it could be introducing worse code, it doesn't know.   (TER is
> also strictly a single block replacement as well).

I noticed that now too. After reordering by TER, the code simply needs even
more registers live, because the reordering changes how the temporaries
overlap. There is probably no simple heuristic to control this...

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (register pressure due to RA sucking and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2008-02-01 22:45 ` hubicka at ucw dot cz
@ 2008-02-01 22:55 ` pinskia at gcc dot gnu dot org
  2008-07-04 16:34 ` [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] performance loss performance loss (TER register presure " jsm28 at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-02-01 22:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #44 from pinskia at gcc dot gnu dot org  2008-02-01 22:55 -------
(In reply to comment #42)
> I don't see how to track the TER and inlining limits issue with current
> organization of compiler. It is probably GCC-4.4 thing if we get lucky.

I would rather not have TER changed; the register pressure issue is really
there because our RA is not able to deal with it correctly. Also, people can
write their code that way themselves. In some cases TER can be thought of as a
scheduler, and pushing loads further up helps targets like PowerPC and others.
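
As an illustration of the two source shapes in question, the hypothetical
functions below (not taken from the test case) compute the same value. One
performs all loads up front, the other writes them inside the expression.
Which machine code results depends on the compiler version, target, and flags;
the sketch only shows what "writing the code that way" looks like.

```cpp
// Loads hoisted to the top: all six values are live before any arithmetic
// happens, which can hide load latency but needs more registers at once.
int dot3_hoisted(const int* a, const int* b) {
    int a0 = a[0], a1 = a[1], a2 = a[2];
    int b0 = b[0], b1 = b[1], b2 = b[2];
    return a0 * b0 + a1 * b1 + a2 * b2;
}

// Loads written inside the expression: each pair can be consumed right away.
int dot3_inline(const int* a, const int* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```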


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] performance loss performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2008-02-01 22:55 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (register pressure due to RA sucking " pinskia at gcc dot gnu dot org
@ 2008-07-04 16:34 ` jsm28 at gcc dot gnu dot org
  2009-02-08 11:50 ` [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] " hubicka at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-07-04 16:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #45 from jsm28 at gcc dot gnu dot org  2008-07-04 16:33 -------
Closing 4.1 branch.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.1/4.2/4.3/4.4 Regression]|[4.2/4.3/4.4 Regression]
                   |performance loss performance|performance loss performance
                   |loss (TER register presure  |loss (TER register presure
                   |and inlining limits         |and inlining limits
                   |problems)                   |problems)
   Target Milestone|4.1.3                       |4.2.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2008-07-04 16:34 ` [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] performance loss performance loss (TER register presure " jsm28 at gcc dot gnu dot org
@ 2009-02-08 11:50 ` hubicka at gcc dot gnu dot org
  2009-03-31 16:20 ` [Bug tree-optimization/17863] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2009-02-08 11:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #46 from hubicka at gcc dot gnu dot org  2009-02-08 11:50 -------
With new-RA we seem to do better on this testcase now:

hubicka@occam:~$ time ./a.out-3.4

real    0m5.448s
user    0m5.440s
sys     0m0.012s
hubicka@occam:~$ time ./a.out

real    0m5.834s
user    0m5.836s
sys     0m0.000s
hubicka@occam:~$ 

There is still a small regression.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.3/4.4/4.5 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2009-02-08 11:50 ` [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] " hubicka at gcc dot gnu dot org
@ 2009-03-31 16:20 ` jsm28 at gcc dot gnu dot org
  2009-08-04 12:27 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 28+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2009-03-31 16:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #47 from jsm28 at gcc dot gnu dot org  2009-03-31 16:20 -------
Closing 4.2 branch.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.2/4.3/4.4/4.5 Regression]|[4.3/4.4/4.5 Regression]
                   |performance loss (TER       |performance loss (TER
                   |register presure and        |register presure and
                   |inlining limits problems)   |inlining limits problems)
   Target Milestone|4.2.5                       |4.3.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.3/4.4/4.5 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2009-03-31 16:20 ` [Bug tree-optimization/17863] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
@ 2009-08-04 12:27 ` rguenth at gcc dot gnu dot org
  2009-12-28 15:40 ` steven at gcc dot gnu dot org
  2010-03-21 12:13 ` steven at gcc dot gnu dot org
  27 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #48 from rguenth at gcc dot gnu dot org  2009-08-04 12:26 -------
GCC 4.3.4 is being released, adjusting target milestone.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.4                       |4.3.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.3/4.4/4.5 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2009-08-04 12:27 ` rguenth at gcc dot gnu dot org
@ 2009-12-28 15:40 ` steven at gcc dot gnu dot org
  2010-03-21 12:13 ` steven at gcc dot gnu dot org
  27 siblings, 0 replies; 28+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-12-28 15:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #49 from steven at gcc dot gnu dot org  2009-12-28 15:40 -------
To make the test case work, I had to solve two errors by removing "static"
keywords:

ttest.cc:105: error: explicit template specialization cannot have a storage
class
ttest.cc:117: error: explicit template specialization cannot have a storage
class
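
For readers who have not seen this diagnostic: the lines of ttest.cc are not
quoted in this thread, so the following is only a hypothetical reduction of
the pattern. `scale` is an invented name. A storage class is accepted on the
primary template but not on an explicit specialization, which takes its
linkage from the primary template.

```cpp
// Primary template: a storage class such as `static` is allowed here.
template <typename T>
static T scale(T x) { return x; }

// Rejected by current compilers with the error quoted above:
//   template <> static int scale<int>(int x) { return 2 * x; }

// Accepted: same specialization with the `static` keyword removed.
template <>
int scale<int>(int x) { return 2 * x; }
```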

With that fixed, I timed the compiled binaries for x86_64 and for i386.


Compiled for x86_64 (with "g++-4.5.0 -O3 ttest.cc -static -fpermissive -o
ttest45" etc.):

stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest36 ; done

real    0m4.238s
user    0m4.210s
sys     0m0.030s

real    0m4.209s
user    0m4.190s
sys     0m0.000s

real    0m4.193s
user    0m4.170s
sys     0m0.010s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest41 ; done

real    0m3.733s
user    0m3.720s
sys     0m0.010s

real    0m3.632s
user    0m3.620s
sys     0m0.000s

real    0m3.662s
user    0m3.630s
sys     0m0.010s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest42 ; done

real    0m3.292s
user    0m3.260s
sys     0m0.020s

real    0m3.338s
user    0m3.300s
sys     0m0.010s

real    0m3.264s
user    0m3.260s
sys     0m0.010s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest43 ; done

real    0m3.515s
user    0m3.500s
sys     0m0.020s

real    0m3.463s
user    0m3.420s
sys     0m0.000s

real    0m3.518s
user    0m3.490s
sys     0m0.000s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest44 ; done

real    0m3.467s
user    0m3.420s
sys     0m0.010s

real    0m3.378s
user    0m3.380s
sys     0m0.000s

real    0m3.434s
user    0m3.400s
sys     0m0.000s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest45 ; done

real    0m0.284s
user    0m0.280s
sys     0m0.000s

real    0m0.202s
user    0m0.180s
sys     0m0.000s

real    0m0.183s
user    0m0.180s
sys     0m0.000s




Compiled for i386 (with "g++-4.5.0 -O3 -m32 -march=pentium4 ttest.cc -static
-fpermissive -o ttest45" etc.):

stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest36 ; done

real    0m4.092s
user    0m4.080s
sys     0m0.010s

real    0m3.954s
user    0m3.940s
sys     0m0.020s

real    0m3.988s
user    0m3.970s
sys     0m0.010s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest42 ; done

real    0m5.818s
user    0m5.810s
sys     0m0.010s

real    0m5.828s
user    0m5.770s
sys     0m0.030s

real    0m5.813s
user    0m5.790s
sys     0m0.000s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest43 ; done

real    0m5.379s
user    0m5.360s
sys     0m0.010s

real    0m5.419s
user    0m5.370s
sys     0m0.030s

real    0m5.382s
user    0m5.360s
sys     0m0.010s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest44 ; done

real    0m4.430s
user    0m4.410s
sys     0m0.020s

real    0m4.433s
user    0m4.390s
sys     0m0.010s

real    0m4.389s
user    0m4.380s
sys     0m0.000s
stevenb@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest45 ; done

real    0m0.230s
user    0m0.220s
sys     0m0.010s

real    0m0.236s
user    0m0.220s
sys     0m0.000s

real    0m0.216s
user    0m0.210s
sys     0m0.000s


So GCC 4.4 with -m32 still has a ~10% performance regression compared to GCC
3.4, but GCC 4.5 appears to optimize the test case away (though I am not sure
that the result is correct -- how to check for correctness?).
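
One common way to approach the correctness question is to make the benchmark
print a checksum of what it computed and compare that value across the
binaries built by different compilers. The sketch below assumes a stand-in
kernel; `work` and `run_and_report` are hypothetical and unrelated to the real
ttest.cc. Note that printing the result does not stop the compiler from
folding the computation at compile time when the input is a constant; taking
the iteration count from the command line is one way to avoid that.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the benchmark kernel; the real test case differs.
std::uint64_t work(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        sum += (i * i) % 7;
    return sum;
}

// Runs the kernel and prints its checksum so the result is observable and
// can be compared between binaries built with different compilers.
std::uint64_t run_and_report(std::uint64_t n) {
    std::uint64_t checksum = work(n);
    std::printf("checksum: %llu\n", static_cast<unsigned long long>(checksum));
    return checksum;
}
```

If two builds print the same checksum for the same input, the faster one is at
least producing the same answer, whether it computed it at run time or not.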

For -m64 (x86-64), all GCC 4 versions are better than GCC 3.4, and GCC 4.2
gives the best performance.

Reconfirmed for 32-bit x86, then.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2008-01-30 17:13:54         |2009-12-28 15:40:33
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.3/4.4/4.5 Regression] performance loss (TER register presure and inlining limits problems)
       [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2009-12-28 15:40 ` steven at gcc dot gnu dot org
@ 2010-03-21 12:13 ` steven at gcc dot gnu dot org
  27 siblings, 0 replies; 28+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-03-21 12:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #50 from steven at gcc dot gnu dot org  2010-03-21 12:13 -------
Performance loss within acceptable limits (by the "you give some, you take
some" principle). GCC 4.5 optimizes the test case away completely. I see no
reason to do anything more here. Fixed for GCC 4.5 and GCC 4.4. Won't fix for
GCC 4.3.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



end of thread, other threads:[~2010-03-21 12:13 UTC | newest]

Thread overview: 28+ messages
     [not found] <bug-17863-2109@http.gcc.gnu.org/bugzilla/>
2005-10-29 22:36 ` [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much steven at gcc dot gnu dot org
2005-10-31  0:37 ` mmitchel at gcc dot gnu dot org
2005-10-31 18:36 ` hubicka at gcc dot gnu dot org
2005-10-31 19:15 ` hubicka at gcc dot gnu dot org
2005-11-24 17:34 ` [Bug tree-optimization/17863] [4.0/4.1/4.2 " phython at gcc dot gnu dot org
2006-02-09 23:14 ` [Bug tree-optimization/17863] [4.0/4.1/4.2 Regression] performance loss (not inlining as much??) steven at gcc dot gnu dot org
2006-03-11  3:20 ` mmitchel at gcc dot gnu dot org
2007-01-18  3:06 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 " gdr at gcc dot gnu dot org
2007-01-21 21:49 ` pinskia at gcc dot gnu dot org
2007-02-14  9:23 ` mmitchel at gcc dot gnu dot org
2007-12-18 20:02 ` steven at gcc dot gnu dot org
2008-01-30 17:35 ` hubicka at gcc dot gnu dot org
2008-01-30 18:27 ` hubicka at gcc dot gnu dot org
2008-01-30 18:34 ` hubicka at gcc dot gnu dot org
2008-01-30 20:41 ` stevenb dot gcc at gmail dot com
2008-01-31  1:13 ` hubicka at ucw dot cz
2008-01-31 10:00 ` rguenther at suse dot de
2008-02-01 16:48 ` hubicka at gcc dot gnu dot org
2008-02-01 17:46 ` amacleod at redhat dot com
2008-02-01 22:34 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register presure and inlining limits problems) hubicka at gcc dot gnu dot org
2008-02-01 22:45 ` hubicka at ucw dot cz
2008-02-01 22:55 ` [Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (register pressure due to RA sucking " pinskia at gcc dot gnu dot org
2008-07-04 16:34 ` [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] performance loss performance loss (TER register presure " jsm28 at gcc dot gnu dot org
2009-02-08 11:50 ` [Bug tree-optimization/17863] [4.2/4.3/4.4 Regression] " hubicka at gcc dot gnu dot org
2009-03-31 16:20 ` [Bug tree-optimization/17863] [4.3/4.4/4.5 " jsm28 at gcc dot gnu dot org
2009-08-04 12:27 ` rguenth at gcc dot gnu dot org
2009-12-28 15:40 ` steven at gcc dot gnu dot org
2010-03-21 12:13 ` steven at gcc dot gnu dot org
