public inbox for gcc-bugs@sourceware.org
* [Bug c++/43884]  New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0
@ 2010-04-25  7:18 yuri at tsoft dot com
  2010-04-25 11:42 ` [Bug c++/43884] " steven at gcc dot gnu dot org
                   ` (19 more replies)
  0 siblings, 20 replies; 26+ messages in thread
From: yuri at tsoft dot com @ 2010-04-25  7:18 UTC (permalink / raw)
  To: gcc-bugs

I ran this simple example with the argument 45 through various versions of gcc
(option -O3):

#include <stdlib.h>
#include <stdio.h>

int fib(int AnArg) {
 if (AnArg <= 2) return (1);
 return (fib(AnArg-1)+fib(AnArg-2));
}

int main(int argc, char* argv[]) {
 int n = atoi(argv[1]);
 printf("fib(%i)=%i\n", n, fib(n));
}

Here are the average runtimes I got:
version    time
4.3.1      3.930s
4.3.2      3.500s
4.3.3      3.470s
4.4.1      3.930s
4.4.3      3.940s
4.5.0      3.860s

I ran ~10 samples per version, so the values are approximate, but 4.5.0 is
clearly slower than 4.3.3 (3.860s vs. 3.470s, roughly 11%).

Is there a performance suite for gcc that is run for each release? Are the
results available online?

This case is simple and basic, so I would expect gcc to produce near-optimal
code for it.


-- 
           Summary: Performance degradation of the simple example
                    (fibonacci) 4.3.3->4.5.0
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: yuri at tsoft dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884



* [Bug c++/43884] Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
@ 2010-04-25 11:42 ` steven at gcc dot gnu dot org
  2010-04-25 12:13 ` [Bug c++/43884] [4.4/4.5 Regression] Performance degradation for simple fibonacci numbers calculation steven at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-04-25 11:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from steven at gcc dot gnu dot org  2010-04-25 11:42 -------
You can compare the code if you compile with -S (output .s assembler file). Or
you can compile with -S and attach the output of both compilers here, so
someone else can have a look.



* [Bug c++/43884] [4.4/4.5 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
  2010-04-25 11:42 ` [Bug c++/43884] " steven at gcc dot gnu dot org
@ 2010-04-25 12:13 ` steven at gcc dot gnu dot org
  2010-04-25 20:03 ` [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment rguenth at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-04-25 12:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from steven at gcc dot gnu dot org  2010-04-25 12:13 -------
Confirmed on x86_64-linux by comparing gcc 4.3.3 vs. gcc 4.6.0 (r158482). The
average of 10 runs on each is 5.1s with gcc 4.3.3 vs. 5.7s for gcc 4.4.2, gcc
4.5.0 and gcc 4.6.0.

One interesting difference is that GCC 4.5 does not inline fib. But that
shouldn't have a big impact on performance. The generated code for fib is
completely different (maybe an IRA thing??), so it's hard to tell where the
slow-down comes from.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-04-25 12:13:24
               date|                            |
            Summary|Performance degradation of  |[4.4/4.5 Regression]
                   |the simple example          |Performance degradation for
                   |(fibonacci) 4.3.3->4.5.0    |simple fibonacci numbers
                   |                            |calculation



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
  2010-04-25 11:42 ` [Bug c++/43884] " steven at gcc dot gnu dot org
  2010-04-25 12:13 ` [Bug c++/43884] [4.4/4.5 Regression] Performance degradation for simple fibonacci numbers calculation steven at gcc dot gnu dot org
@ 2010-04-25 20:03 ` rguenth at gcc dot gnu dot org
  2010-04-25 20:07 ` rguenth at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-04-25 20:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2010-04-25 20:03 -------
Well, the innermost loop with current trunk is

.L3:
        leal    -1(%ebx), %eax
        subl    $2, %ebx
        movl    %eax, (%esp)
        call    fib
        addl    %eax, %esi
        cmpl    $2, %ebx
        jg      .L3

which is pretty much optimal.  The Intel compiler doesn't detect the
tail-recursion (huh) but has multiple entry points into the function
and uses register passing conventions for the recursions.

With -fwhole-program (or with a static fib) GCC does the same, and we
then end up with a program faster than what ICC produces (16s).
A 4.3 compiled version is indeed a bit faster (as fast as 4.4 on i?86, 15.4s).
A 4.1 compiled version is even faster (14.1s), the 3.4 baseline is 21.5s.

That's on i?86-linux, all -O2.

4.1 assembly, fib is not inlined:

fib:
        pushl   %esi
        pushl   %ebx
        movl    %eax, %ebx
        cmpl    $2, %ebx
        movl    $1, %eax
        jle     .L5
        xorl    %esi, %esi
        .p2align 4,,7
.L6:
        leal    -1(%ebx), %eax
        subl    $2, %ebx
        call    fib
        addl    %eax, %esi
        cmpl    $2, %ebx
        jg      .L6
        leal    1(%esi), %eax
.L5:
        popl    %ebx
        popl    %esi
        ret

trunk assembler:

fib:
        pushl   %esi
        pushl   %ebx
        movl    %eax, %ebx
        subl    $4, %esp
        cmpl    $2, %ebx
        movl    $1, %eax
        jle     .L2
        xorl    %esi, %esi
        .p2align 4,,7
        .p2align 3
.L3:
        leal    -1(%ebx), %eax
        subl    $2, %ebx
        call    fib
        addl    %eax, %esi
        cmpl    $2, %ebx
        jg      .L3
        leal    1(%esi), %eax
.L2:
        addl    $4, %esp
        popl    %ebx
        popl    %esi
        ret

where the only difference is different loop alignment and keeping the
stack 16-bytes aligned.  Indeed we get the same speed as 4.1 when
building with -mpreferred-stack-boundary=2.  Why do we bother to
keep the stack aligned for leaf functions?
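
For readers following the assembly, the loop in both listings corresponds
roughly to the C below. This is a hand-written sketch of the shape the
compiled code has once the second recursive call is turned into a loop, not
compiler output; fib_loop and check_fib_forms are names made up for this
illustration.

```c
#include <assert.h>

/* The function from the original report. */
int fib(int n) {
    if (n <= 2) return 1;
    return fib(n - 1) + fib(n - 2);
}

/* Hand-written equivalent of the compiled loop: the fib(n-2) call is
   replaced by iterating with n -= 2, while fib(n-1) stays a real call. */
int fib_loop(int n) {
    int acc = 0;                 /* xorl %esi, %esi */
    while (n > 2) {              /* cmpl $2, %ebx ; jg .L3 */
        acc += fib_loop(n - 1);  /* leal -1(%ebx), %eax ; call fib ; addl */
        n -= 2;                  /* subl $2, %ebx */
    }
    return acc + 1;              /* leal 1(%esi), %eax */
}

/* Illustrative helper: both forms must agree for every n up to limit. */
void check_fib_forms(int limit) {
    for (int n = 1; n <= limit; n++)
        assert(fib(n) == fib_loop(n));
}
```

One recursive call per loop iteration remains, which is why the per-call
prologue and epilogue cost (including the extra stack adjustment in the
trunk listing) is paid so often in this benchmark.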


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at gcc dot gnu dot org,
                   |                            |hubicka at gcc dot gnu dot
                   |                            |org
          Component|c++                         |target
 GCC target triplet|                            |i?86-*-*
           Keywords|                            |missed-optimization
      Known to work|                            |4.1.3
            Summary|[4.4/4.5 Regression]        |[4.4/4.5/4.6 Regression]
                   |Performance degradation for |Performance degradation for
                   |simple fibonacci numbers    |simple fibonacci numbers
                   |calculation                 |calculation due to extra
                   |                            |stack alignment
   Target Milestone|---                         |4.4.4



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (2 preceding siblings ...)
  2010-04-25 20:03 ` [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment rguenth at gcc dot gnu dot org
@ 2010-04-25 20:07 ` rguenth at gcc dot gnu dot org
  2010-04-25 22:02 ` hjl dot tools at gmail dot com
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-04-25 20:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2010-04-25 20:06 -------
Btw, with the "optimal" options -O2 -fwhole-program -fomit-frame-pointer
-mpreferred-stack-boundary=2 GCC 4.3 and 4.4 are slower than 4.1 and 4.5
(14.3s vs. 13.8s).  The extra stack alignment drops us to 16.4s(!).



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (3 preceding siblings ...)
  2010-04-25 20:07 ` rguenth at gcc dot gnu dot org
@ 2010-04-25 22:02 ` hjl dot tools at gmail dot com
  2010-04-25 23:42 ` hubicka at ucw dot cz
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-04-25 22:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from hjl dot tools at gmail dot com  2010-04-25 22:01 -------
(In reply to comment #4)
> Btw, with the "optimal" options -O2 -fwhole-program -fomit-frame-pointer
> -mpreferred-stack-boundary=2 GCC 4.3 and 4.4 are slower than 4.1 and 4.5
> (14.3s vs. 13.8s).  The extra stack alignment drops us to 16.4s(!).
>

The slowdown also happens on x86-64. The stack alignment code checks for
leaf functions, but I am not sure if it detects tail-recursion.
Is such information available to ix86_finalize_stack_realign_flags? 


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|hjl at gcc dot gnu dot org  |hjl dot tools at gmail dot
                   |                            |com



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (4 preceding siblings ...)
  2010-04-25 22:02 ` hjl dot tools at gmail dot com
@ 2010-04-25 23:42 ` hubicka at ucw dot cz
  2010-04-25 23:43 ` hubicka at ucw dot cz
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hubicka at ucw dot cz @ 2010-04-25 23:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from hubicka at ucw dot cz  2010-04-25 23:42 -------
Subject: Re:  [4.4/4.5/4.6 Regression] Performance
        degradation for simple fibonacci numbers calculation due to extra
        stack alignment

> where the only difference is different loop alignment and keeping the
> stack 16-bytes aligned.  Indeed we get the same speed as 4.1 when
> building with -mpreferred-stack-boundary=2.  Why do we bother to
> keep the stack aligned for leaf functions?
We should not.  Probably fallout of the stack alignment patches?  I will check
it out later.

Honza



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (5 preceding siblings ...)
  2010-04-25 23:42 ` hubicka at ucw dot cz
@ 2010-04-25 23:43 ` hubicka at ucw dot cz
  2010-04-26 10:37 ` rguenth at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hubicka at ucw dot cz @ 2010-04-25 23:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from hubicka at ucw dot cz  2010-04-25 23:43 -------
Subject: Re:  [4.4/4.5/4.6 Regression] Performance
        degradation for simple fibonacci numbers calculation due to extra
        stack alignment

> The slowdown also happens on x86-64. The stack alignment code checks for
> leaf functions, but I am not sure if it detects tail-recursion.
> Is such information available to ix86_finalize_stack_realign_flags? 
Tail recursion is recognized at gimple level, so rtl code should not be at all
bothered here.

Honza



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (6 preceding siblings ...)
  2010-04-25 23:43 ` hubicka at ucw dot cz
@ 2010-04-26 10:37 ` rguenth at gcc dot gnu dot org
  2010-04-26 12:40 ` jakub at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-04-26 10:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2010-04-26 10:36 -------
(In reply to comment #7)
> Subject: Re:  [4.4/4.5/4.6 Regression] Performance
>         degradation for simple fibonacci numbers calculation due to extra
>         stack alignment
> 
> > The slowdown also happens on x86-64. The stack alignment code checks for
> > leaf functions, but I am not sure if it detects tail-recursion.
> > Is such information available to ix86_finalize_stack_realign_flags? 
> Tail recursion is recognized at gimple level, so rtl code should not be at all
> bothered here.

There is a recursive self-call left (but that's the only call, so it's still
a leaf function).

> Honza
> 



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (7 preceding siblings ...)
  2010-04-26 10:37 ` rguenth at gcc dot gnu dot org
@ 2010-04-26 12:40 ` jakub at gcc dot gnu dot org
  2010-04-26 13:45 ` hjl dot tools at gmail dot com
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-26 12:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jakub at gcc dot gnu dot org  2010-04-26 12:40 -------
In the leaf_function_p sense it is non-leaf.  For the stack alignment it of
course would be possible to change the stack alignment requirements of the
function if it calls itself, doesn't call other functions (nor tail call them)
and it is changed not to assume the standard alignment in the whole function.



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (8 preceding siblings ...)
  2010-04-26 12:40 ` jakub at gcc dot gnu dot org
@ 2010-04-26 13:45 ` hjl dot tools at gmail dot com
  2010-04-26 13:58 ` jakub at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-04-26 13:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from hjl dot tools at gmail dot com  2010-04-26 13:44 -------
(In reply to comment #9)
> In the leaf_function_p sense it is non-leaf.  For the stack alignment it of
> course would be possible to change the stack alignment requirements of the
> function if it calls itself, doesn't call other functions (nor tail call them)
> and it is changed not to assume the standard alignment in the whole function.
> 

That is true. For a tail call, we only need to align the outgoing stack to
the minimum of the maximum local stack alignment and the incoming stack
alignment.



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (9 preceding siblings ...)
  2010-04-26 13:45 ` hjl dot tools at gmail dot com
@ 2010-04-26 13:58 ` jakub at gcc dot gnu dot org
  2010-04-26 14:28 ` hubicka at ucw dot cz
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-26 13:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jakub at gcc dot gnu dot org  2010-04-26 13:57 -------
Tail call needs to consider incoming alignment requirements of the target
function (which is often in other CU).  In this case it is not a tail call, but
non-tail recursion (tail-recursion would be handled by wrapping the function's
body into a loop).
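
To make the distinction concrete, here is a small hand-written C sketch of
what "wrapping the function's body into a loop" means for a genuinely
tail-recursive function. The names sum_rec, sum_loop and check_sum_forms
are hypothetical and chosen only for this illustration; the loop form is
written by hand, it is not GCC output.

```c
#include <assert.h>

/* Tail recursion: the recursive call is the last action, so nothing
   remains to be computed in this frame after the call returns. */
long sum_rec(int n, long acc) {
    if (n <= 0) return acc;
    return sum_rec(n - 1, acc + n);
}

/* The same function with its body wrapped in a loop: the arguments are
   updated in place and control jumps back to the top instead of calling. */
long sum_loop(int n, long acc) {
    while (n > 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}

/* Illustrative helper: both forms must agree for every n up to limit. */
void check_sum_forms(int limit) {
    for (int n = 0; n <= limit; n++)
        assert(sum_rec(n, 0) == sum_loop(n, 0));
}
```

In the fib from the report, by contrast, the addition runs after the
recursive calls return, so as written in the source neither call is in
tail position.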



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (10 preceding siblings ...)
  2010-04-26 13:58 ` jakub at gcc dot gnu dot org
@ 2010-04-26 14:28 ` hubicka at ucw dot cz
  2010-04-26 14:48 ` hjl dot tools at gmail dot com
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hubicka at ucw dot cz @ 2010-04-26 14:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from hubicka at ucw dot cz  2010-04-26 14:27 -------
Subject: Re:  [4.4/4.5/4.6 Regression] Performance
        degradation for simple fibonacci numbers calculation due to extra
        stack alignment

> That is true. For a tail call, we only need to align the outgoing stack to
> the minimum of the maximum local stack alignment and the incoming stack
> alignment.

Well, the tail call gets the same stack alignment as the function itself,
so I guess when expanding a tail call, we need to bump up the incoming
stack alignment to the one needed by the call.

We should special-case self recursion and do nothing in case of tail
calls and in case of normal calls.  In normal self-recursive calls we need
to remember that the function is self-recursive and, when finalizing,
make sure that the outgoing stack alignment is at least as good as the
incoming one.  This cannot be decided at expansion time since we do not
yet know what alignment the function has.

Old preferred alignment code had this logic, I guess somehow this got
broken during the merge of stack alignment branch?

Honza



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (11 preceding siblings ...)
  2010-04-26 14:28 ` hubicka at ucw dot cz
@ 2010-04-26 14:48 ` hjl dot tools at gmail dot com
  2010-04-26 18:55 ` [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation hjl dot tools at gmail dot com
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-04-26 14:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from hjl dot tools at gmail dot com  2010-04-26 14:47 -------
(In reply to comment #12)
> Subject: Re:  [4.4/4.5/4.6 Regression] Performance
>         degradation for simple fibonacci numbers calculation due to extra
>         stack alignment
> 
> > That is true. For a tail call, we only need to align the outgoing stack to
> > the minimum of the maximum local stack alignment and the incoming stack
> > alignment.
> 
> Well, the tail call gets the same stack alignment as the function itself,
> so I guess when expanding a tail call, we need to bump up the incoming
> stack alignment to the one needed by the call.
> 
> We should special-case self recursion and do nothing in case of tail
> calls and in case of normal calls.  In normal self-recursive calls we need
> to remember that the function is self-recursive and, when finalizing,
> make sure that the outgoing stack alignment is at least as good as the
> incoming one.

The outgoing stack alignment should be the minimum of incoming and
local.  If the incoming stack is 16-byte aligned and a local variable only
needs 4-byte alignment, there is no difference in stack realignment
whether the incoming stack is 4-byte, 8-byte or 16-byte aligned.

> This cannot be decided at expansion time since we do not yet know what
> alignment the function has.
> 
> Old preferred alignment code had this logic, I guess somehow this got
> broken during the merge of stack alignment branch?
> 

I will investigate.



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (12 preceding siblings ...)
  2010-04-26 14:48 ` hjl dot tools at gmail dot com
@ 2010-04-26 18:55 ` hjl dot tools at gmail dot com
  2010-04-29  1:21 ` hjl dot tools at gmail dot com
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-04-26 18:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from hjl dot tools at gmail dot com  2010-04-26 18:54 -------
It is caused by revision 131576:

http://gcc.gnu.org/ml/gcc-cvs/2008-01/msg00337.html


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.4/4.5/4.6 Regression]    |[4.4/4.5/4.6 Regression]
                   |Performance degradation for |Performance degradation for
                   |simple fibonacci numbers    |simple fibonacci numbers
                   |calculation due to extra    |calculation
                   |stack alignment             |



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (13 preceding siblings ...)
  2010-04-26 18:55 ` [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation hjl dot tools at gmail dot com
@ 2010-04-29  1:21 ` hjl dot tools at gmail dot com
  2010-04-29  2:20 ` hjl dot tools at gmail dot com
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-04-29  1:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from hjl dot tools at gmail dot com  2010-04-29 01:20 -------
Revision 139756:

http://gcc.gnu.org/ml/gcc-cvs/2008-08/msg01321.html

also contributed to this regression.

[hjl@gnu-26 rrs]$ ./131573/usr/bin/gcc -O3 pr43884.c
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.237s
user    0m3.236s
sys     0m0.000s
[hjl@gnu-26 rrs]$ ./131573/usr/bin/gcc -O3 pr43884.c -m32
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.667s
user    0m3.665s
sys     0m0.001s
[hjl@gnu-26 rrs]$ ./131576/usr/bin/gcc -O3 pr43884.c 
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.687s
user    0m3.685s
sys     0m0.000s
[hjl@gnu-26 rrs]$ ./131576/usr/bin/gcc -O3 pr43884.c -m32
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.685s
user    0m3.683s
sys     0m0.001s
[hjl@gnu-26 rrs]$ 

[hjl@gnu-26 rrs]$ ./139755/usr/bin/gcc -O3 pr43884.c 
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.261s
user    0m3.260s
sys     0m0.002s
[hjl@gnu-26 rrs]$ ./139755/usr/bin/gcc -O3 pr43884.c -m32
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.043s
user    0m3.041s
sys     0m0.001s
[hjl@gnu-26 rrs]$ ./139756/usr/bin/gcc -O3 pr43884.c 
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.909s
user    0m3.906s
sys     0m0.001s
[hjl@gnu-26 rrs]$ ./139756/usr/bin/gcc -O3 pr43884.c -m32
[hjl@gnu-26 rrs]$ time ./a.out 45
fib(45)=1134903170

real    0m3.883s
user    0m3.881s
sys     0m0.000s
[hjl@gnu-26 rrs]$ 



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (14 preceding siblings ...)
  2010-04-29  1:21 ` hjl dot tools at gmail dot com
@ 2010-04-29  2:20 ` hjl dot tools at gmail dot com
  2010-04-29  9:27 ` hubicka at ucw dot cz
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-04-29  2:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from hjl dot tools at gmail dot com  2010-04-29 02:20 -------
This patch:

diff --git a/gcc/predict.c b/gcc/predict.c
index eb5ddef..a05e796 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -120,7 +120,8 @@ maybe_hot_frequency_p (int freq)
       if (cfun->function_frequency == FUNCTION_FREQUENCY_HOT)
         return true;
     }
-  if (profile_status == PROFILE_ABSENT)
+  if (profile_status == PROFILE_ABSENT
+      || profile_status == PROFILE_GUESSED)
     return true;
   if (freq < BB_FREQ_MAX / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION))
     return false;

seems to work for me. Since profile_status is PROFILE_GUESSED,
it may be hot as shown here.
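
For readers without the GCC sources at hand, the effect of the diff can be
modelled in standalone C as below. This is only a simplified sketch: the
enum and function names are made up for the model, the two MODEL_ constants
are assumed values chosen for illustration, the cfun->function_frequency
check visible in the diff context is left out, and the final "return true"
is an assumption because the diff is cut off before the end of the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the profile states named in the diff. */
enum profile_status_kind { PROFILE_ABSENT, PROFILE_GUESSED, PROFILE_READ };

/* Assumed values for illustration only; the real ones live in GCC. */
#define MODEL_BB_FREQ_MAX 10000
#define MODEL_HOT_BB_FREQUENCY_FRACTION 1000

/* Before the patch: only a missing profile short-circuits to "hot". */
bool hot_before_patch(enum profile_status_kind status, int freq) {
    if (status == PROFILE_ABSENT)
        return true;
    if (freq < MODEL_BB_FREQ_MAX / MODEL_HOT_BB_FREQUENCY_FRACTION)
        return false;
    return true;  /* assumed fall-through, not shown in the diff */
}

/* After the patch: a guessed profile is treated like a missing one. */
bool hot_after_patch(enum profile_status_kind status, int freq) {
    if (status == PROFILE_ABSENT || status == PROFILE_GUESSED)
        return true;
    if (freq < MODEL_BB_FREQ_MAX / MODEL_HOT_BB_FREQUENCY_FRACTION)
        return false;
    return true;  /* assumed fall-through, not shown in the diff */
}

/* Illustrative helper: a low-frequency block under a guessed profile is
   cold before the patch and hot after it. */
void check_patch_effect(void) {
    int low_freq = 5;
    assert(!hot_before_patch(PROFILE_GUESSED, low_freq));
    assert(hot_after_patch(PROFILE_GUESSED, low_freq));
}
```

In this model the patch changes the answer only for guessed profiles with a
frequency below the threshold; blocks measured by a real profile are
unaffected.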



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (15 preceding siblings ...)
  2010-04-29  2:20 ` hjl dot tools at gmail dot com
@ 2010-04-29  9:27 ` hubicka at ucw dot cz
  2010-04-30  9:01 ` jakub at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: hubicka at ucw dot cz @ 2010-04-29  9:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from hubicka at ucw dot cz  2010-04-29 09:27 -------
Subject: Re:  [4.4/4.5/4.6 Regression] Performance
        degradation for simple fibonacci numbers calculation

This is not correct: when the profile is guessed we should look at the
frequencies.  I guess the profile is wrong after tail recursion elimination
or horked by recursive inlining.  I will take a look.

Honza



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (16 preceding siblings ...)
  2010-04-29  9:27 ` hubicka at ucw dot cz
@ 2010-04-30  9:01 ` jakub at gcc dot gnu dot org
  2010-05-19 12:37 ` rguenth at gcc dot gnu dot org
  2010-06-25 14:10 ` hjl dot tools at gmail dot com
  19 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-30  9:01 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (17 preceding siblings ...)
  2010-04-30  9:01 ` jakub at gcc dot gnu dot org
@ 2010-05-19 12:37 ` rguenth at gcc dot gnu dot org
  2010-06-25 14:10 ` hjl dot tools at gmail dot com
  19 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-05-19 12:37 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
            Version|unknown                     |4.4.3



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
  2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
                   ` (18 preceding siblings ...)
  2010-05-19 12:37 ` rguenth at gcc dot gnu dot org
@ 2010-06-25 14:10 ` hjl dot tools at gmail dot com
  19 siblings, 0 replies; 26+ messages in thread
From: hjl dot tools at gmail dot com @ 2010-06-25 14:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from hjl dot tools at gmail dot com  2010-06-25 14:09 -------
(In reply to comment #17)
> Subject: Re:  [4.4/4.5/4.6 Regression] Performance
>         degradation for simple fibonacci numbers calculation
> 
> This is not correct, when profile is guessed we should look into the
> frequencies.
> I guess profile is wrong after tail recursion elimination or horked by
> recursive inlining,
> I will take a look.
> 

Any updates?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
       [not found] <bug-43884-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2011-01-22 21:59 ` hubicka at gcc dot gnu.org
@ 2011-01-22 22:10 ` hubicka at gcc dot gnu.org
  4 siblings, 0 replies; 26+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-22 22:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884

--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-22 21:47:43 UTC ---
Author: hubicka
Date: Sat Jan 22 21:47:40 2011
New Revision: 169136

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169136
Log:
    PR tree-optimization/43884
    PR lto/44334
    * predict.c (maybe_hot_frequency_p): Use entry block frequency as a base.
    * doc/invoke.texi (hot-bb-frequency-fraction): Update docs.
    * gcc.dg/autopar/outer-2.c: Increase array size.
    * gcc.dg/tree-ssa/ldist-pr45948.c: Update test.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/predict.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/autopar/outer-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-pr45948.c



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
       [not found] <bug-43884-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-01-22 21:51 ` hubicka at gcc dot gnu.org
@ 2011-01-22 21:59 ` hubicka at gcc dot gnu.org
  2011-01-22 22:10 ` hubicka at gcc dot gnu.org
  4 siblings, 0 replies; 26+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-22 21:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #22 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-22 21:49:51 UTC ---
Fixed.



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
       [not found] <bug-43884-4@http.gcc.gnu.org/bugzilla/>
  2010-10-01 12:08 ` jakub at gcc dot gnu.org
  2011-01-22 17:31 ` hubicka at gcc dot gnu.org
@ 2011-01-22 21:51 ` hubicka at gcc dot gnu.org
  2011-01-22 21:59 ` hubicka at gcc dot gnu.org
  2011-01-22 22:10 ` hubicka at gcc dot gnu.org
  4 siblings, 0 replies; 26+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-22 21:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884

--- Comment #20 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-22 21:45:23 UTC ---
Patch posted http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01597.html
I tested that it seems to bring us back to the 4.3 speed
jh@gcc10:~/trunk/build/gcc$ time ./a.out 45
fib(45)=1134903170

real    0m7.978s
user    0m7.976s
sys     0m0.000s
jh@gcc10:~/trunk/build/gcc$ gcc-4.3 -O3 tt.c
jh@gcc10:~/trunk/build/gcc$ time ./a.out 45
fib(45)=1134903170

real    0m7.902s
user    0m7.888s
sys     0m0.000s

and before patch
jh@gcc10:~/trunk/build2/gcc$ time ./a.out 45
fib(45)=1134903170

real    0m8.222s
user    0m8.213s
sys     0m0.000s



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
       [not found] <bug-43884-4@http.gcc.gnu.org/bugzilla/>
  2010-10-01 12:08 ` jakub at gcc dot gnu.org
@ 2011-01-22 17:31 ` hubicka at gcc dot gnu.org
  2011-01-22 21:51 ` hubicka at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-22 17:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |hubicka at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #19 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-22 16:24:36 UTC ---
The profile is consistent, but due to recursive inlining we create a deep loop
nest in the function, making profile estimation believe that code outside the
loop nest is cold.
Patch for PR44334 should cure this testcase too.  I will look into whether I can
get the testsuite updated and the patch committed.



* [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation
       [not found] <bug-43884-4@http.gcc.gnu.org/bugzilla/>
@ 2010-10-01 12:08 ` jakub at gcc dot gnu.org
  2011-01-22 17:31 ` hubicka at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu.org @ 2010-10-01 12:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.5                       |4.4.6



Thread overview: 26+ messages
2010-04-25  7:18 [Bug c++/43884] New: Performance degradation of the simple example (fibonacci) 4.3.3->4.5.0 yuri at tsoft dot com
2010-04-25 11:42 ` [Bug c++/43884] " steven at gcc dot gnu dot org
2010-04-25 12:13 ` [Bug c++/43884] [4.4/4.5 Regression] Performance degradation for simple fibonacci numbers calculation steven at gcc dot gnu dot org
2010-04-25 20:03 ` [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation due to extra stack alignment rguenth at gcc dot gnu dot org
2010-04-25 20:07 ` rguenth at gcc dot gnu dot org
2010-04-25 22:02 ` hjl dot tools at gmail dot com
2010-04-25 23:42 ` hubicka at ucw dot cz
2010-04-25 23:43 ` hubicka at ucw dot cz
2010-04-26 10:37 ` rguenth at gcc dot gnu dot org
2010-04-26 12:40 ` jakub at gcc dot gnu dot org
2010-04-26 13:45 ` hjl dot tools at gmail dot com
2010-04-26 13:58 ` jakub at gcc dot gnu dot org
2010-04-26 14:28 ` hubicka at ucw dot cz
2010-04-26 14:48 ` hjl dot tools at gmail dot com
2010-04-26 18:55 ` [Bug target/43884] [4.4/4.5/4.6 Regression] Performance degradation for simple fibonacci numbers calculation hjl dot tools at gmail dot com
2010-04-29  1:21 ` hjl dot tools at gmail dot com
2010-04-29  2:20 ` hjl dot tools at gmail dot com
2010-04-29  9:27 ` hubicka at ucw dot cz
2010-04-30  9:01 ` jakub at gcc dot gnu dot org
2010-05-19 12:37 ` rguenth at gcc dot gnu dot org
2010-06-25 14:10 ` hjl dot tools at gmail dot com
     [not found] <bug-43884-4@http.gcc.gnu.org/bugzilla/>
2010-10-01 12:08 ` jakub at gcc dot gnu.org
2011-01-22 17:31 ` hubicka at gcc dot gnu.org
2011-01-22 21:51 ` hubicka at gcc dot gnu.org
2011-01-22 21:59 ` hubicka at gcc dot gnu.org
2011-01-22 22:10 ` hubicka at gcc dot gnu.org
