[Bug tree-optimization/17863] New: threefold performance loss

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/17863] New: threefold performance loss
@ 2004-10-06 15:33 kunert at physik dot tu-dresden dot de
  2004-10-06 15:34 ` [Bug tree-optimization/17863] " kunert at physik dot tu-dresden dot de
                   ` (25 more replies)
  0 siblings, 26 replies; 27+ messages in thread
From: kunert at physik dot tu-dresden dot de @ 2004-10-06 15:33 UTC (permalink / raw)
  To: gcc-bugs

I see a threefold performance loss. This is a rather big chunk of code and it is
probably difficult to extract a small testcase, because there is no pronounced
hot spot. To reproduce, just compile and run the code.

commandline: g++ -O3 -march=pentium4 ttest.cc -o t34a -static

Execution time of the compiled code:

with GCC 3.4.1: 6.6s
with today's GCC 4.0.0: 18.9s

machine is a 2.4 GHz P4.

-- 
           Summary: threefold performance loss
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kunert at physik dot tu-dresden dot de
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] threefold performance loss
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
@ 2004-10-06 15:34 ` kunert at physik dot tu-dresden dot de
  2004-10-06 17:02 ` [Bug tree-optimization/17863] [4.0 Regression] " pinskia at gcc dot gnu dot org
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: kunert at physik dot tu-dresden dot de @ 2004-10-06 15:34 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kunert at physik dot tu-dresden dot de  2004-10-06 15:34 -------
Created an attachment (id=7295)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7295&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
  2004-10-06 15:34 ` [Bug tree-optimization/17863] " kunert at physik dot tu-dresden dot de
@ 2004-10-06 17:02 ` pinskia at gcc dot gnu dot org
  2004-10-06 18:40 ` pinskia at gcc dot gnu dot org
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-06 17:02 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-06 17:02 -------
Looks like IV-OPTS is doing something wrong.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
            Summary|threefold performance loss  |[4.0 Regression] threefold
                   |                            |performance loss
   Target Milestone|---                         |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
  2004-10-06 15:34 ` [Bug tree-optimization/17863] " kunert at physik dot tu-dresden dot de
  2004-10-06 17:02 ` [Bug tree-optimization/17863] [4.0 Regression] " pinskia at gcc dot gnu dot org
@ 2004-10-06 18:40 ` pinskia at gcc dot gnu dot org
  2004-11-02 15:48 ` pinskia at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-06 18:40 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-06 18:40 -------
I was wrong; this is just a case where we are not inlining as much as we should be.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-10-06 18:40:54
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (2 preceding siblings ...)
  2004-10-06 18:40 ` pinskia at gcc dot gnu dot org
@ 2004-11-02 15:48 ` pinskia at gcc dot gnu dot org
  2004-12-06  5:21 ` pinskia at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-02 15:48 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-02 15:48 -------
Jan, can you look into this testcase? This is one where we don't inline as much as 3.4 did.
(For ppc-darwin it is much worse, as templates have to go through a stub now.)
Maybe the CALL_INSN_COST should be bumped.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (3 preceding siblings ...)
  2004-11-02 15:48 ` pinskia at gcc dot gnu dot org
@ 2004-12-06  5:21 ` pinskia at gcc dot gnu dot org
  2004-12-24 20:36 ` [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much pinskia at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-06  5:21 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-06 05:20 -------
*** Bug 18704 has been marked as a duplicate of this bug. ***

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at tat dot physik
                   |                            |dot uni-tuebingen dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (4 preceding siblings ...)
  2004-12-06  5:21 ` pinskia at gcc dot gnu dot org
@ 2004-12-24 20:36 ` pinskia at gcc dot gnu dot org
  2004-12-24 21:09 ` hubicka at ucw dot cz
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-24 20:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-24 20:36 -------
Reduced testcase:
const int LMAX = 4;
const int LMAX41 = 4*LMAX+1;
const int LMAX12 = (LMAX+1)*(LMAX+2)/2;
template<int n>
inline double accu1( const double* p1, const double* p2 )
{
     double d = *p1 * *p2;
     return d + accu1<n-1>( ++p1, ++p2 );
}
template <> inline double accu1<0>( const double* p1, const double* p2 )
{
    return p1[0] * p2[0]; 
}
template <int ny, int nz>
inline double accu2( const double* py, const double* pz, const double* h )
{
    const double d = accu1<nz>( pz, h ) * *py;
    if( ny == 0 ) return d;
    return d + accu2<(ny ? ny-1 : -1), (ny ? nz : -1 )>( ++py, pz, ++h );
}
template<>
inline double accu2<-1, -1>( const double* , const double* , const double* )
{
    return 0.0;
}
template <int ny, int nz>
inline double accu( const double* py, const double* pz, const double* h )
{
    if( ny == 0 ) return accu1<nz>( pz, h );
    else  if( nz == 0 ) return accu1<ny>( py, h );
    else if( nz >= ny ) return accu2<ny, nz>( py, pz, h );
    else return accu2<nz, ny>( pz, py, h );
}
template <>
inline double accu<0,0>( const double* , const double* , const double* h )
{
    return *h;
}
#define SWYZ( Y, Z ) ((Y+Z) * (Y+Z+1) / 2+Z)
#define CASA( Y, Z ) case SWYZ( Y, Z ):         \
        *ap1 = accu<Y, Z>( py, pz, dxb );      \
    if( z1 == 0 ) break;                        \
    ++ap1;                                      \
    z1--; py += LMAX41; pz -= LMAX41;
#define CAS( Y, Z ) case SWYZ( Y, Z ): *ap1 = accu<Y, Z>( py, pz, dxb ); break
#define CAS1( Y ) CASA( Y, 1 );  CAS( Y+1, 0 );
#define CAS2( Y ) CASA( Y, 2 ); CAS1( Y+1 );
#define CAS3( Y ) CASA( Y, 3 ); CAS2( Y+1 );
#define CAS4( Y ) CASA( Y, 4 ); CAS3( Y+1 );

double f(const double *py, const double *pz, double *dxb, double *ap1, int mh_z1234, unsigned int z1)
{
  switch( mh_z1234 )
 {
    CAS( 0, 0 );
    CAS1(0);
    CAS2(0);
    CAS3(0);
    CAS4(0);
  }
}


When we do -O3 or -O2, we don't inline accu1<1> into accu1<2> at all, why?????????
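
For readers following the reduced testcase: accu1<n> unrolls at compile time into an (n+1)-term dot product, so every level that is not inlined costs one real call per term. The sketch below restates accu1 in a slightly simplified form (p1 + 1 instead of ++p1) next to the loop it is equivalent to. dot_loop is a hypothetical helper added for comparison only; it is not part of the testcase.

```cpp
// Same shape as accu1 in the reduced testcase: each instantiation
// multiplies one pair of elements and recurses on the remaining ones.
template <int n>
inline double accu1(const double* p1, const double* p2)
{
    return *p1 * *p2 + accu1<n - 1>(p1 + 1, p2 + 1);
}

// Recursion ends here with the last pair of elements.
template <>
inline double accu1<0>(const double* p1, const double* p2)
{
    return p1[0] * p2[0];
}

// Hypothetical reference version (not in the testcase): the loop that
// accu1<n> amounts to once every level has been inlined.
// Note the inclusive bound: accu1<n> touches n + 1 elements.
inline double dot_loop(const double* p1, const double* p2, int n)
{
    double sum = 0.0;
    for (int i = 0; i <= n; ++i)
        sum += p1[i] * p2[i];
    return sum;
}
```

With a = {1, 2, 3} and b = {4, 5, 6}, both accu1<2>(a, b) and dot_loop(a, b, 2) evaluate 1*4 + 2*5 + 3*6. If accu1<1> is left out of line, accu1<2> pays for a call in the middle of that sum, which is the situation the question above describes.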

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2004-10-06 18:40:54         |2004-12-24 20:36:17
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (5 preceding siblings ...)
  2004-12-24 20:36 ` [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much pinskia at gcc dot gnu dot org
@ 2004-12-24 21:09 ` hubicka at ucw dot cz
  2005-01-28  1:04 ` steven at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: hubicka at ucw dot cz @ 2004-12-24 21:09 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From hubicka at ucw dot cz  2004-12-24 21:09 -------
Subject: Re:  [4.0 Regression] threefold performance loss, not inlining as much

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-24 20:36 -------
> When we do -O3 or -O2, we don't inline accu1<1> into accu1<2> at all, why?????????

Because we inline other functions before we get to this one, and the
inline-unit-growth limit is reached...
Actually, for very small units the inline-unit-growth limit seems to be
a bit too tight, so we might think about bypassing it for very
small units, but this won't solve the original testcase anyway....

The profiling branch seems to get this testcase right and inlines everything
due to slightly different code size estimates, but it does not work
particularly well on the tramp3d testcase (right now it is slightly worse
than mainline without profiling; I have a patch to bring it back to
mainline levels, which is obviously far from optimum...)

I am unsure whether we can come up with a more realistic cost model without
actually trying to inline the function and seeing how much it optimizes, as
suggested by some papers (but apparently not very suitable for a
production compiler, I would say), but I am all ears for ideas ;))

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (6 preceding siblings ...)
  2004-12-24 21:09 ` hubicka at ucw dot cz
@ 2005-01-28  1:04 ` steven at gcc dot gnu dot org
  2005-02-08 16:56 ` kunert at physik dot tu-dresden dot de
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-01-28  1:04 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-01-28 01:04 -------
Final callgraph for amd64:
double accu1(const double*, const double*) [with int n = 2]/21: 22 insns (29 after inlining) needed inlinable asm_written
  called by:
  calls:
double f(const double*, const double*, double*, double*, int, unsigned int)/17: 281 insns (2583 after inlining) needed inlinable asm_written
  called by:
  calls:

Final callgraph for i686:

double accu2(const double*, const double*, const double*) [with int ny = 2, int nz = 1]/29: 43 insns (102 after inlining) reachable inlinable asm_written
  called by:
  calls:
double accu1(const double*, const double*) [with int n = 3]/25: 28 insns needed inlinable asm_written
  called by:
  calls:
double accu1(const double*, const double*) [with int n = 2]/21: 28 insns needed inlinable asm_written
  called by:
  calls:
double accu1(const double*, const double*) [with int n = 1]/18: 28 insns (27 after inlining) needed inlinable asm_written
  called by:
  calls:
double f(const double*, const double*, double*, double*, int, unsigned int)/17: 311 insns (3003 after inlining) needed inlinable asm_written
  called by:
  calls:
 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (7 preceding siblings ...)
  2005-01-28  1:04 ` steven at gcc dot gnu dot org
@ 2005-02-08 16:56 ` kunert at physik dot tu-dresden dot de
  2005-02-24 21:24 ` rguenth at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: kunert at physik dot tu-dresden dot de @ 2005-02-08 16:56 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kunert at physik dot tu-dresden dot de  2005-02-08 10:13 -------
Subject: Re:  [4.0 Regression] threefold performance
 loss, not inlining as much

Good idea.

However, this provides just another knob to tune the inlining and it is not obvious where to apply that attribute for the best performance. I played with the present dozen or so parameters for some hours to come close to the old (aka 3.4) performance, and I definitely can't handle even more. 

Thank you
Thomas Kunert



bonzini at gcc dot gnu dot org wrote:
> ------- Additional Comments From bonzini at gcc dot gnu dot org  2005-02-03 16:49 -------
> To the reporter: in this case you probably want __attribute__ ((leafify)), just 
> in case, though you are right in expecting the compiler to inline it.
> 



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (8 preceding siblings ...)
  2005-02-08 16:56 ` kunert at physik dot tu-dresden dot de
@ 2005-02-24 21:24 ` rguenth at gcc dot gnu dot org
  2005-02-25 16:39 ` kunert at physik dot tu-dresden dot de
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2005-02-24 21:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rguenth at gcc dot gnu dot org  2005-02-24 17:09 -------
With __attribute__((leafify)) applied to v4c_quad and mult_pq, runtime goes down
from 16.0s to 4.4s with recent gcc 4.0.  For gcc 3.4.3 runtimes are 5.0s and 4.9s.

We indeed do not do very well at estimating the size of template metaprograms in 4.0.
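
The leafify attribute mentioned here comes from a patch and was not part of a released GCC at the time of this thread. To my knowledge, later GCC releases document an attribute named flatten with the same intent: inline, where possible, every call made in the body of the marked function. A minimal sketch, assuming a compiler that accepts flatten; dot4 is a hypothetical stand-in for functions like v4c_quad and mult_pq, whose source is not shown in this thread.

```cpp
// Template recursion in the style of accu1 from the reduced testcase.
template <int n>
inline double accu1(const double* p1, const double* p2)
{
    return *p1 * *p2 + accu1<n - 1>(p1 + 1, p2 + 1);
}

template <>
inline double accu1<0>(const double* p1, const double* p2)
{
    return p1[0] * p2[0];
}

// Hypothetical caller (not from the original program).
// flatten asks the compiler to inline the calls made in this body,
// so the whole accu1<3> chain should end up inside dot4 regardless
// of the usual size-based inlining limits.
__attribute__((flatten))
double dot4(const double* p1, const double* p2)
{
    return accu1<3>(p1, p2);
}
```

The attribute goes on the outermost hot function rather than on each template, which keeps the number of annotations small. Whether the chain is really inlined is best checked in the generated assembly, since the attribute is a request to the inliner rather than a guarantee.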

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (9 preceding siblings ...)
  2005-02-24 21:24 ` rguenth at gcc dot gnu dot org
@ 2005-02-25 16:39 ` kunert at physik dot tu-dresden dot de
  2005-02-25 16:43 ` rguenth at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: kunert at physik dot tu-dresden dot de @ 2005-02-25 16:39 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kunert at physik dot tu-dresden dot de  2005-02-25 09:52 -------
Subject: Re:  [4.0 Regression] threefold performance
 loss, not inlining as much

Wow. Many thanks for that analysis.  Now I will go and fetch the patch. Since nobody seems to care about improving the inlining parameters, I'd love to see this patch in 4.0.

Thomas Kunert

rguenth at gcc dot gnu dot org wrote:
> ------- Additional Comments From rguenth at gcc dot gnu dot org  2005-02-24 17:09 -------
> With __attribute__((leafify)) applied to v4c_quad and mult_pq, runtime goes down
> from 16.0s to 4.4s with recent gcc 4.0.  For gcc 3.4.3 runtimes are 5.0s and 4.9s.
> 
> We indeed do not do very well at estimating the size of template metaprograms in 4.0.
> 



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (10 preceding siblings ...)
  2005-02-25 16:39 ` kunert at physik dot tu-dresden dot de
@ 2005-02-25 16:43 ` rguenth at gcc dot gnu dot org
  2005-02-25 23:03 ` giovannibajo at libero dot it
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2005-02-25 16:43 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rguenth at gcc dot gnu dot org  2005-02-25 10:06 -------
Patch at
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01571.html

improves the testcase from 16.2s to 12.1s (3.4: 5.0s), i.e. still not good
enough.  Since even with the patch our size estimates for the functions
are still 15-40% higher than in 3.4, we'd probably need to bump our inlining
limits accordingly, say by 20%.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (11 preceding siblings ...)
  2005-02-25 16:43 ` rguenth at gcc dot gnu dot org
@ 2005-02-25 23:03 ` giovannibajo at libero dot it
  2005-02-25 23:53 ` rguenth at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: giovannibajo at libero dot it @ 2005-02-25 23:03 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-02-25 16:43 -------
Why isn't this a critical regression? We're regressing *badly* on code 
generation.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |critical


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (12 preceding siblings ...)
  2005-02-25 23:03 ` giovannibajo at libero dot it
@ 2005-02-25 23:53 ` rguenth at gcc dot gnu dot org
  2005-03-02 11:35 ` [Bug tree-optimization/17863] [4.0/4.1 " steven at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2005-02-25 23:53 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rguenth at gcc dot gnu dot org  2005-02-25 16:56 -------
Yes, the regression is even worse on the closed-duplicate #18704.  There you can
also find some analysis of inline parameter tuning.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (13 preceding siblings ...)
  2005-02-25 23:53 ` rguenth at gcc dot gnu dot org
@ 2005-03-02 11:35 ` steven at gcc dot gnu dot org
  2005-03-02 11:36 ` steven at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-03-02 11:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-03-02 11:35 -------
Performance bugs are never critical. 

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |normal


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (14 preceding siblings ...)
  2005-03-02 11:35 ` [Bug tree-optimization/17863] [4.0/4.1 " steven at gcc dot gnu dot org
@ 2005-03-02 11:36 ` steven at gcc dot gnu dot org
  2005-03-05 18:49 ` steven at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-03-02 11:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-03-02 11:36 -------
Updated patch here: 
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01796.html 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (15 preceding siblings ...)
  2005-03-02 11:36 ` steven at gcc dot gnu dot org
@ 2005-03-05 18:49 ` steven at gcc dot gnu dot org
  2005-03-05 19:03 ` rguenth at tat dot physik dot uni-tuebingen dot de
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-03-05 18:49 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-03-05 18:49 -------
Even with Richard Guenther's patches, the only thing that really helps is 
setting --param large-function-growth=200, or more.  The default is 100. 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (16 preceding siblings ...)
  2005-03-05 18:49 ` steven at gcc dot gnu dot org
@ 2005-03-05 19:03 ` rguenth at tat dot physik dot uni-tuebingen dot de
  2005-04-21  5:06 ` mmitchel at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at tat dot physik dot uni-tuebingen dot de @ 2005-03-05 19:03 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2005-03-05 19:03 -------
Subject: Re:  [4.0/4.1 Regression] threefold
 performance loss, not inlining as much

steven at gcc dot gnu dot org wrote:
> ------- Additional Comments From steven at gcc dot gnu dot org  2005-03-05 18:49 -------
> Even with Richard Guenther's patches, the only thing that really helps is 
> setting --param large-function-growth=200, or more.  The default is 100. 

Yup, this is probably one of the testcases where -fobey-inline would
help.  Or, of course, profile-directed inlining.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (17 preceding siblings ...)
  2005-03-05 19:03 ` rguenth at tat dot physik dot uni-tuebingen dot de
@ 2005-04-21  5:06 ` mmitchel at gcc dot gnu dot org
  2005-06-27  4:54 ` dank at kegel dot com
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-04-21  5:06 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.0                       |4.0.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (18 preceding siblings ...)
  2005-04-21  5:06 ` mmitchel at gcc dot gnu dot org
@ 2005-06-27  4:54 ` dank at kegel dot com
  2005-06-27  6:38 ` steven at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: dank at kegel dot com @ 2005-06-27  4:54 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-06-27 04:54 -------
I just verified the regression here with -march=pentium on a pentium 4.
On the original testcase, I got runtimes of 7.0, 4.9, 8.1, and 7.0
seconds with gcc-2.95.3, gcc-3.4.3, gcc-4.0.0, and gcc-4.1-20050603
using just -O3 (no -static).

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dank at kegel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (19 preceding siblings ...)
  2005-06-27  4:54 ` dank at kegel dot com
@ 2005-06-27  6:38 ` steven at gcc dot gnu dot org
  2005-06-30 22:16 ` danalis at cis dot udel dot edu
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-27  6:38 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-27 06:38 -------
As you can see from the numbers Dan Kegel posted, this kind of test case
is very sensitive to the intermediate representation presented to the inliner
and to inliner heuristics.  Personally, I don't think it is worth keeping a
bug report like this open.  This kind of almost-random behavior is similar
to the reload failures on x86: You'll always be able to construct a test case
that will fail with a "fixed" compiler for one particular case.
 
However, in this case there are still a number of things that are interesting 
to look at: 
1) GCC 4.1 has profile guided inlining.  Does it help in this case? 
2) Does the early inlining patch [1] help? 
3) Does the pre-inlining patch and the IPA stuff help? (i.e. try the 
   tree-profiling-branch with all pistons firing ;-) 
 
Personally, I expect that this is a case where the pre-inline optimizations
may be helpful.
 
Could someone construct a graphical representation of the call graphs for 
GCC 3.4 and GCC 4.1 and compare them?  I'm very curious which function (or 
functions) are apparently inlined only by GCC 3.4 and not by any other 
release. 
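
One way to check whether a particular call is the one that matters is to pin
the inlining decision by hand and time both variants. The sketch below is only
an illustration, not code from this bug: scale_noinline, scale_inline and the
two sum_* drivers are made-up names, and the noinline / always_inline
attributes are GCC extensions that other compilers may not accept.

```cpp
// GCC-specific attributes: force the callee to stay out of line ...
__attribute__((noinline)) double scale_noinline(double x)
{
    return 2.0 * x + 1.0;
}

// ... or force it to be inlined into every caller.
static inline __attribute__((always_inline)) double scale_inline(double x)
{
    return 2.0 * x + 1.0;
}

// Hypothetical drivers: identical loops, differing only in which callee they use.
double sum_noinline(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += scale_noinline(i);
    return sum;
}

double sum_inline(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += scale_inline(i);
    return sum;
}
```

Both drivers compute the same value, so any runtime difference between them
comes from the inlining decision alone. Applying the same attributes to
candidate functions in the real testcase could narrow down which call GCC 3.4
inlines and later releases do not.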
   
 
[1] http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01839.html 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (20 preceding siblings ...)
  2005-06-27  6:38 ` steven at gcc dot gnu dot org
@ 2005-06-30 22:16 ` danalis at cis dot udel dot edu
  2005-06-30 22:24 ` danalis at cis dot udel dot edu
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: danalis at cis dot udel dot edu @ 2005-06-30 22:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From danalis at cis dot udel dot edu  2005-06-30 22:16 -------
I'm looking at the reduced testcase from comment #6,
and I noticed that f() is declared double, but does not return anything.
Thus the code doesn't compile with -O3 -Wall -Werror.
If I fix the bug by adding a "return(return *ap1)",
or by declaring f() to be void, the performance regression disappears.
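
For reference, the shape of the fix described above can be sketched as follows.
This is a hypothetical stand-in, not the f() from comment #6: the name f_fixed,
the parameter list (guessed from how the harness in this comment calls f) and
the arithmetic in the body are assumptions; only the final return of *ap1
reflects the change discussed here.

```cpp
// Hypothetical stand-in for the reduced testcase's f().
// Before the fix, the function was declared to return double but fell off the
// end without a return statement, which is undefined behavior in C++ and is
// what -Wall reports (and -Werror then rejects).
double f_fixed(const double *py, const double *pz,
               double *dxb, double *ap1, int j, int n)
{
    // Placeholder computation (assumption): store some result through ap1.
    *ap1 = py[j] * pz[j] + dxb[n];

    // The fix: actually return a value from a function declared double.
    return *ap1;
}
```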

Here's the test harness I used to call the minimized testcase:

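#include <iostream>
using namespace std;

/* f() is the reduced testcase from comment #6; define it above main(). */

int main(int argc, char *argv[]){
    double ay[100][100];
    const double *py, *pz;
    double *dxb, *ap1;
    double sum=0;
    int i,j,k;

    /* Fill the array with distinct, non-trivial values. */
    for(i=0; i<100; i++){
        for(j=0; j<100; j++){
            ay[i][j] = 1000*(i+1)+2*(j+1);
        }
    }
    /* Each pointer refers to a separate row of ay. */
    py  = ay[0];
    pz  = ay[1];
    dxb = ay[2];
    ap1 = ay[3];

    /* Call f() 100 * 10000 * 12 times; halving sum keeps it bounded. */
    for(k=0; k<100; k++){
        for(i=0; i<10000; i++){
            for(j=0; j<12; j++){
                sum += f(py,pz,dxb,ap1,j,5);
                sum /= 2;
            }
        }
    }
    cout << sum << endl;
    return 0;
}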

Is that ok?   I compiled this with -O3 -mtune=pentium.

Runtimes *without* the fix to f() were
0.31s, 8.72s, 8.83s and 8.80s when compiled with g++
2.95.3, 3.4.3, 4.0.0 and 4.1.0-20050625, respectively
(making this a large performance regression relative to gcc-2.95.3).
Runtimes *with* the fix were
0.34s, 0.28s, 0.36s, 0.32s when compiled with g++
2.95.3, 3.4.3, 4.0.0 and 4.1.0-20050625, respectively.
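
If someone wants to repeat these measurements from inside the program rather
than with an external timer, a minimal sketch along the following lines might
do. It is an assumption, not how the numbers above were obtained: cpu_seconds
and sample_work are made-up names, and std::clock() is used because it is
available to all the compiler versions compared here.

```cpp
#include <ctime>

// Hypothetical helper: CPU seconds consumed by calling work() once.
template <typename Work>
double cpu_seconds(Work work)
{
    std::clock_t start = std::clock();
    work();
    std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// Volatile sink so the compiler cannot discard the loop below as dead code.
static volatile double sink = 0.0;

// Hypothetical workload standing in for the harness loops.
void sample_work()
{
    double sum = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        sum += i;
        sum /= 2;
    }
    sink = sum;
}
```

cpu_seconds(sample_work) returns the time for one run. Note that std::clock()
reports processor time, not wall-clock time, so the figure may not match the
"real" time printed by an external timer.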


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (21 preceding siblings ...)
  2005-06-30 22:16 ` danalis at cis dot udel dot edu
@ 2005-06-30 22:24 ` danalis at cis dot udel dot edu
  2005-07-01  3:44 ` dank at kegel dot com
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: danalis at cis dot udel dot edu @ 2005-06-30 22:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From danalis at cis dot udel dot edu  2005-06-30 22:24 -------
I meant to say "return(*ap1)" not "return(return *ap1)"

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (22 preceding siblings ...)
  2005-06-30 22:24 ` danalis at cis dot udel dot edu
@ 2005-07-01  3:44 ` dank at kegel dot com
  2005-07-08  1:45 ` mmitchel at gcc dot gnu dot org
  2005-09-27 16:25 ` mmitchel at gcc dot gnu dot org
  25 siblings, 0 replies; 27+ messages in thread
From: dank at kegel dot com @ 2005-07-01  3:44 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-07-01 03:44 -------
Anthony, it looks like the runtimes with the fix
match the runtimes from the larger testcase reasonably
well; at least they're faster on gcc-3.4.3 where they're
supposed to be.

So maybe we should try to answer the questions from comment #22
for the reduced testcase.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (23 preceding siblings ...)
  2005-07-01  3:44 ` dank at kegel dot com
@ 2005-07-08  1:45 ` mmitchel at gcc dot gnu dot org
  2005-09-27 16:25 ` mmitchel at gcc dot gnu dot org
  25 siblings, 0 replies; 27+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-07-08  1:45 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.1                       |4.0.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



* [Bug tree-optimization/17863] [4.0/4.1 Regression] threefold performance loss, not inlining as much
  2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
                   ` (24 preceding siblings ...)
  2005-07-08  1:45 ` mmitchel at gcc dot gnu dot org
@ 2005-09-27 16:25 ` mmitchel at gcc dot gnu dot org
  25 siblings, 0 replies; 27+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-27 16:25 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.2                       |4.0.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863



Thread overview: 27+ messages
2004-10-06 15:33 [Bug tree-optimization/17863] New: threefold performance loss kunert at physik dot tu-dresden dot de
2004-10-06 15:34 ` [Bug tree-optimization/17863] " kunert at physik dot tu-dresden dot de
2004-10-06 17:02 ` [Bug tree-optimization/17863] [4.0 Regression] " pinskia at gcc dot gnu dot org
2004-10-06 18:40 ` pinskia at gcc dot gnu dot org
2004-11-02 15:48 ` pinskia at gcc dot gnu dot org
2004-12-06  5:21 ` pinskia at gcc dot gnu dot org
2004-12-24 20:36 ` [Bug tree-optimization/17863] [4.0 Regression] threefold performance loss, not inlining as much pinskia at gcc dot gnu dot org
2004-12-24 21:09 ` hubicka at ucw dot cz
2005-01-28  1:04 ` steven at gcc dot gnu dot org
2005-02-08 16:56 ` kunert at physik dot tu-dresden dot de
2005-02-24 21:24 ` rguenth at gcc dot gnu dot org
2005-02-25 16:39 ` kunert at physik dot tu-dresden dot de
2005-02-25 16:43 ` rguenth at gcc dot gnu dot org
2005-02-25 23:03 ` giovannibajo at libero dot it
2005-02-25 23:53 ` rguenth at gcc dot gnu dot org
2005-03-02 11:35 ` [Bug tree-optimization/17863] [4.0/4.1 " steven at gcc dot gnu dot org
2005-03-02 11:36 ` steven at gcc dot gnu dot org
2005-03-05 18:49 ` steven at gcc dot gnu dot org
2005-03-05 19:03 ` rguenth at tat dot physik dot uni-tuebingen dot de
2005-04-21  5:06 ` mmitchel at gcc dot gnu dot org
2005-06-27  4:54 ` dank at kegel dot com
2005-06-27  6:38 ` steven at gcc dot gnu dot org
2005-06-30 22:16 ` danalis at cis dot udel dot edu
2005-06-30 22:24 ` danalis at cis dot udel dot edu
2005-07-01  3:44 ` dank at kegel dot com
2005-07-08  1:45 ` mmitchel at gcc dot gnu dot org
2005-09-27 16:25 ` mmitchel at gcc dot gnu dot org
