public inbox for gcc@gcc.gnu.org
* Re: -fprofile-generate and -fprofile-use
@ 2005-07-20 17:45 girish vaitheeswaran
  2005-07-20 19:01 ` Janis Johnson
  0 siblings, 1 reply; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-20 17:45 UTC (permalink / raw)
  To: gcc

I am using gcc 3.4.3
-girish

> --- Steven Bosscher <stevenb@suse.de> wrote:
>
> > On Wednesday 20 July 2005 18:53, girish vaitheeswaran wrote:
> > > I am seeing a 20% slowdown with feedback optimization.
> > > Does anyone have any thoughts on this.
> >
> > My first thought is that you should probably first
> > tell what compiler you are using.
> >
> > Gr.
> > Steven

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-07-20 17:45 -fprofile-generate and -fprofile-use girish vaitheeswaran
@ 2005-07-20 19:01 ` Janis Johnson
  2005-07-20 20:16   ` girish vaitheeswaran
  2005-07-20 22:44   ` Jan Hubicka
  0 siblings, 2 replies; 29+ messages in thread
From: Janis Johnson @ 2005-07-20 19:01 UTC (permalink / raw)
  To: girish vaitheeswaran; +Cc: gcc

On Wed, Jul 20, 2005 at 10:45:01AM -0700, girish vaitheeswaran wrote:
> > --- Steven Bosscher <stevenb@suse.de> wrote:
> > 
> > > On Wednesday 20 July 2005 18:53, girish vaitheeswaran wrote:
> > > > I am seeing a 20% slowdown with feedback optimization.
> > > > Does anyone have any thoughts on this.
> > > 
> > > My first thought is that you should probably first
> > > tell what compiler
> > > you are using.
>
> I am using gcc 3.4.3
> -girish

Which platform?  I've seen slower code for profile-directed optimizations
on powerpc64-linux with GCC 4.0 and mainline.  It's a bug, but I haven't
looked into it enough to provide a small test case for a problem report.

Janis


* Re: -fprofile-generate and -fprofile-use
  2005-07-20 19:01 ` Janis Johnson
@ 2005-07-20 20:16   ` girish vaitheeswaran
  2005-07-20 22:44   ` Jan Hubicka
  1 sibling, 0 replies; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-20 20:16 UTC (permalink / raw)
  To: Janis Johnson; +Cc: gcc

This is on Intel Pentium4 on Linux.

-girish


* Re: -fprofile-generate and -fprofile-use
  2005-07-20 19:01 ` Janis Johnson
  2005-07-20 20:16   ` girish vaitheeswaran
@ 2005-07-20 22:44   ` Jan Hubicka
  2005-07-20 23:38     ` girish vaitheeswaran
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Hubicka @ 2005-07-20 22:44 UTC (permalink / raw)
  To: Janis Johnson; +Cc: girish vaitheeswaran, gcc

> On Wed, Jul 20, 2005 at 10:45:01AM -0700, girish vaitheeswaran wrote:
> > > --- Steven Bosscher <stevenb@suse.de> wrote:
> > > 
> > > > On Wednesday 20 July 2005 18:53, girish vaitheeswaran wrote:
> > > > > I am seeing a 20% slowdown with feedback optimization.
> > > > > Does anyone have any thoughts on this.
> > > > 
> > > > My first thought is that you should probably first
> > > > tell what compiler
> > > > you are using.
> >
> > I am using gcc 3.4.3
> > -girish
> 
> Which platform?  I've seen slower code for profile-directed optimizations
> on powerpc64-linux with GCC 4.0 and mainline.  It's a bug, but I haven't
> looked into it enough to provide a small test case for a problem report.

Actually, I would be very interested in seeing test cases such as those
(and Girish's slowdown too, if possible).  In general, some slowdowns
in corner cases are probably unavoidable, but both 3.4.3 and 4.0 seem to
show pretty consistent improvements with profiling, at least for SPEC and
i386, which I test pretty regularly.
Such slowdowns usually indicate problems like an incorrectly updated
profile, or a profile that was read in incorrectly because of a mismatch
in CFGs between the profiling run and the feedback run.  These are rather
difficult to notice and hunt down...

Honza

* Re: -fprofile-generate and -fprofile-use
  2005-07-20 22:44   ` Jan Hubicka
@ 2005-07-20 23:38     ` girish vaitheeswaran
  2005-07-21 13:03       ` Jan Hubicka
  2005-07-21 18:03       ` Kelley Cook
  0 siblings, 2 replies; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-20 23:38 UTC (permalink / raw)
  To: Jan Hubicka, Janis Johnson; +Cc: girish vaitheeswaran, gcc

I started with a clean slate in my build environment
and did not have any residual files hanging around.
Are the steps I indicated in my earlier email
correct? Is there a way I can break the problem down
into a smaller subset of flags and eliminate the flag
causing the performance problem? What I mean is, since
-fprofile-generate and -fprofile-use enable a bunch of
flags, would it make sense to avoid profiling and try
out some of the individual flags on a trial-and-error
basis? If so, which flags should I start the
trials with?

-girish 


* Re: -fprofile-generate and -fprofile-use
  2005-07-20 23:38     ` girish vaitheeswaran
@ 2005-07-21 13:03       ` Jan Hubicka
  2005-07-25 17:39         ` girish vaitheeswaran
  2005-07-21 18:03       ` Kelley Cook
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Hubicka @ 2005-07-21 13:03 UTC (permalink / raw)
  To: girish vaitheeswaran; +Cc: Jan Hubicka, Janis Johnson, gcc

> I started with a clean slate in my build environment
> and did not have any residual files hanging around.
> Are the steps I have indicated in my earlier email
> correct. Is there a way I can break down the problem
> into a smaller sub-set of flags and eliminate the flag
> causing the performance problem. What I mean is since
> -fprofile-generate and -fprofile-use enable a bunch of
> flags, would it make sense to avoid profiling and try
> out some of the individual flags on a trial and error
> basis. If so what would be the flags to start the
> trials with.

It would probably be better to just turn off the individual
optimizations while still compiling with -fprofile-use (for
optimizations that are implied by this flag, there should be no need
to re-profile each time).
If you can find a particular optimization that gets out of control, it
would be a lot easier to fix it...

Honza

* Re: -fprofile-generate and -fprofile-use
  2005-07-20 23:38     ` girish vaitheeswaran
  2005-07-21 13:03       ` Jan Hubicka
@ 2005-07-21 18:03       ` Kelley Cook
  2005-07-21 21:15         ` girish vaitheeswaran
  1 sibling, 1 reply; 29+ messages in thread
From: Kelley Cook @ 2005-07-21 18:03 UTC (permalink / raw)
  To: girish_vaithees, gcc

> I started with a clean slate in my build environment 
> and did not have any residual files hanging around. 
> Are the steps I have indicated in my earlier email 
> correct. Is there a way I can break down the problem 
> into a smaller sub-set of flags and eliminate the flag 
> causing the performance problem. What I mean is since 
> -fprofile-generate and -fprofile-use enable a bunch of 
> flags, would it make sense to avoid profiling and try 
> out some of the individual flags on a trial and error 
> basis. If so what would be the flags to start the 
> trials with.
> 
> -girish

Before we go any further, are you sure that you are also turning on optimization with -fprofile-generate and -fprofile-use?

In other words, you aren't just using "gcc -fprofile-generate xxx.c" to create your object files, are you?

You need to use something like "gcc -O2 -march=pentium4 -fprofile-generate", as unoptimized profiles are pretty pointless.

Rather than general descriptions, specific examples would help a lot, such as a link to the code that is having problems.

Kelley Cook
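
As an illustration of the point above (a sketch only, not part of the
original message): the three steps of a profile-feedback build, written
as a small Python helper that returns argv lists.  xxx.c is the
placeholder name used in the message; the output name "app" and the
helper pgo_steps are made up for this example.  The assumption is that
the same optimization flags are passed to both compile steps.

```python
"""Sketch of a three-step profile-feedback build as argv lists."""


def pgo_steps(sources, output="app", opt_flags=("-O2", "-march=pentium4")):
    """Return commands for an instrumented build, a training run,
    and a feedback-optimized rebuild.

    The same optimization flags are used in both compile steps, so the
    profile is collected from optimized code.
    """
    base = ["gcc", *opt_flags]
    instrumented = base + ["-fprofile-generate", *sources, "-o", output]
    training_run = ["./" + output]  # run a representative workload here
    optimized = base + ["-fprofile-use", *sources, "-o", output]
    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in pgo_steps(["xxx.c"]):
        print(" ".join(argv))
```

Each returned list could be handed to subprocess.run if the script is
meant to drive the build rather than just print it.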


* Re: -fprofile-generate and -fprofile-use
  2005-07-21 18:03       ` Kelley Cook
@ 2005-07-21 21:15         ` girish vaitheeswaran
  0 siblings, 0 replies; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-21 21:15 UTC (permalink / raw)
  To: kcook, gcc

I am using -O3. This is the only flag apart from the
profile flag -fprofile-use.

I had independently tried -march=pentium4 and that did
not buy any performance for this app.

-girish


* Re: -fprofile-generate and -fprofile-use
  2005-07-21 13:03       ` Jan Hubicka
@ 2005-07-25 17:39         ` girish vaitheeswaran
  2005-07-26  7:00           ` Jan Hubicka
  0 siblings, 1 reply; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-25 17:39 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, Janis Johnson, gcc

I have done quite a few experiments to narrow down the
problem. In every case the result is slower than *no
feedback optimization, just -O3*. Here are some of the
experiments. All of them were done in a new build area
in order to eliminate the effects of old feedback files.

1. I built the app using -O3 and -fprofile-generate to
generate the feedback data. I then ran the workload
and recompiled the app using -O3 and
-fprofile-use [app was 20% slower]

2. I built the app using -O3 and -fprofile-generate to
generate the feedback data. I then ran the workload
and recompiled the app using -O3 and
-fprofile-use -fno-vpt -fno-unroll-loops
-fno-peel-loops -fno-tracer (which turns off all the
flags enabled by -fprofile-use) [app was still 20%
slower]

3. I have also tried selectively turning off some of
the flags in the above list, but the
performance regression persists.

4. I tried the older flags, namely -fprofile-arcs
and -fbranch-probabilities; still no help.

Can someone help me out on how to proceed with this?

Thanks
-girish
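
Experiments 2 and 3 above can be organized as a one-flag-at-a-time
sweep.  The following is a rough sketch, not the script actually used
in this thread: the flag list is the one named in experiment 2, while
the helper trial_flag_sets and the labels are invented for
illustration.

```python
# Flags named in experiment 2; each disables one optimization that
# -fprofile-use would otherwise turn on.
DISABLE_FLAGS = ["-fno-vpt", "-fno-unroll-loops", "-fno-peel-loops", "-fno-tracer"]


def trial_flag_sets(base=("-O3", "-fprofile-use"), disable=DISABLE_FLAGS):
    """Yield (label, flags): baseline, each flag disabled alone, then all disabled."""
    yield "baseline", list(base)
    for flag in disable:
        yield flag, list(base) + [flag]
    yield "all-disabled", list(base) + list(disable)


if __name__ == "__main__":
    # Print one compile command per trial.
    for label, flags in trial_flag_sets():
        print(f"{label:>18}: gcc {' '.join(flags)} ...")
```

As noted earlier in the thread, these trials reuse the existing profile
data, so only the -fprofile-use rebuild is repeated for each flag set.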



* Re: -fprofile-generate and -fprofile-use
  2005-07-25 17:39         ` girish vaitheeswaran
@ 2005-07-26  7:00           ` Jan Hubicka
  2005-07-26 18:10             ` girish vaitheeswaran
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Hubicka @ 2005-07-26  7:00 UTC (permalink / raw)
  To: girish vaitheeswaran; +Cc: Jan Hubicka, Janis Johnson, gcc

> I have done quite a few experiments with this to
> narrow down the problem. The performance numbers are 
> slower compared to *No Feedback optimization with just
> -O3* Here are some of them. All the experiments were
> done on a new build-area in order to eliminate effects
> of old feedback files.
> 
> 1. I built the app using -O3 and -fprofile-generate to
> generate the feedback data. I then ran the workload
> and then recompiled the app using -O3 and
> -fprofile-use [app was 20% slower]
> 
> 2. I built the app using -O3 and -fprofile-generate to
> generate the feedback data. I then ran the workload
> and then recompiled the app using -O3 and
> -fprofile-use -fno-vpt -fno-unroll-loops
> -fno-peel-loops -fno-tracer (Which is turn off all the
> flags used by -fprofile-use) [App was still 20%
> slower]
> 
> 3. I have tried selectively turning of some of the
> other flags in the above list as well, but the
> performance regression persists.
> 
> 4. I tried with the older flags namely -fprofile-arcs
> and -fbranch-probabilities still no help.

So it looks like the slowdown is caused by one of the profile-based
optimizations that are enabled by default (basic block reordering or
register allocation).  If you are getting such a noticeable slowdown, it
probably means that your app has a pretty small inner loop.  Can you just
look at the assembly generated for it with and without profiling and try
to spot what is going wrong?

Thanks,
Honza
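
One possible way to do the comparison suggested above (a sketch under
assumptions, not something from the original message): compile the
suspect file to assembly twice with gcc -S, once with -fprofile-use and
once without, then diff the two listings.  The helper asm_diff and the
file labels are invented here, and the two tiny listings are made up to
stand in for real compiler output.

```python
import difflib


def asm_diff(plain_asm, feedback_asm, context=2):
    """Return a unified diff between two assembly listings given as strings."""
    return list(difflib.unified_diff(
        plain_asm.splitlines(),
        feedback_asm.splitlines(),
        fromfile="no-feedback.s",
        tofile="feedback.s",
        n=context,
        lineterm="",
    ))


if __name__ == "__main__":
    # Made-up listings standing in for real `gcc -S` output.
    plain = "loop:\n\taddl\t$1, %eax\n\tcmpl\t%edx, %eax\n\tjne\tloop\n\tret\n"
    feedback = ("loop:\n\taddl\t$1, %eax\n\tcmpl\t%edx, %eax\n"
                "\tje\tdone\n\tjmp\tloop\ndone:\n\tret\n")
    print("\n".join(asm_diff(plain, feedback)))
```

On real output the diff will likely be noisy (label numbering and block
order change), so it is probably most useful when restricted to the
function containing the hot loop.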

* Re: -fprofile-generate and -fprofile-use
  2005-07-26  7:00           ` Jan Hubicka
@ 2005-07-26 18:10             ` girish vaitheeswaran
  2005-07-30 19:20               ` Jan Hubicka
  0 siblings, 1 reply; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-26 18:10 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, Janis Johnson, gcc

Jan,
That's going to be rather difficult given that the app
has over 1000 files. Is there a way I can turn off the
"default" options one at a time?
Thx
-girish


* Re: -fprofile-generate and -fprofile-use
  2005-07-26 18:10             ` girish vaitheeswaran
@ 2005-07-30 19:20               ` Jan Hubicka
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Hubicka @ 2005-07-30 19:20 UTC (permalink / raw)
  To: girish vaitheeswaran; +Cc: Jan Hubicka, Janis Johnson, gcc

> Jan,
Hi,
> That's going to be rather difficult given that the app
> has over 1000 files. Is there a way I can turn off the
> "default" options one at a time ?

This is unfortunately not possible :(  The optimizations use either
profile feedback or a profile guessed by GCC itself.  It looks like in
your case the profile guessed by GCC (even if it departs from reality)
causes GCC to produce better code than the real profile does (or the
real profile got misread, but there are some sanity checks for this, so
that is quite unlikely).

It seems to me that the only way to proceed from here is some profiling.
The way I usually look into such problems is to produce an oprofile
profile of both versions of the code and then compare the time spent in
individual functions; then it is sometimes possible to identify the
offending code more easily....

Honza
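
The per-function comparison described above could look roughly like
this once the two profiles have been reduced to "function name ->
sample count" tables.  This is a sketch only: how the counts are
extracted from the profiler's report is left out, the helper
rank_regressions is invented for illustration, and the numbers are
hypothetical, not measurements from this thread.

```python
def rank_regressions(plain_samples, feedback_samples):
    """Rank functions by how many more samples they took in the feedback build.

    Both arguments map function name -> sample count for the same workload.
    Returns (name, plain, feedback, delta) tuples, largest increase first.
    """
    names = set(plain_samples) | set(feedback_samples)
    rows = []
    for name in names:
        before = plain_samples.get(name, 0)
        after = feedback_samples.get(name, 0)
        rows.append((name, before, after, after - before))
    # Sort by delta descending, then by name so the order is deterministic.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


if __name__ == "__main__":
    # Hypothetical sample counts, not measurements from this thread.
    plain = {"parse": 400, "lookup": 900, "emit": 300}
    feedback = {"parse": 390, "lookup": 1300, "emit": 310}
    for name, before, after, delta in rank_regressions(plain, feedback):
        print(f"{name:<10} {before:>6} {after:>6} {delta:>+6}")
```

Functions missing from one table are counted as zero there, since
inlining decisions may differ between the two builds and move samples
from a callee into its caller.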

> Thx
> -girish
> 
> --- Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> > > I have done quite a few experiments with this to
> > > narrow down the problem. The performance numbers
> > are 
> > > slower compared to *No Feedback optimization with
> > just
> > > -O3* Here are some of them. All the experiments
> > were
> > > done on a new build-area in order to eliminate
> > effects
> > > of old feedback files.
> > > 
> > > 1. I built the app using -O3 and
> > -fprofile-generate to
> > > generate the feedback data. I then ran the
> > workload
> > > and then recompiled the app using -O3 and
> > > -fprofile-use [app was 20% slower]
> > > 
> > > 2. I built the app using -O3 and
> > -fprofile-generate to
> > > generate the feedback data. I then ran the
> > workload
> > > and then recompiled the app using -O3 and
> > > -fprofile-use -fno-vpt -fno-unroll-loops
> > > -fno-peel-loops -fno-tracer (Which is turn off all
> > the
> > > flags used by -fprofile-use) [App was still 20%
> > > slower]
> > > 
> > > 3. I have tried selectively turning of some of the
> > > other flags in the above list as well, but the
> > > performance regression persists.
> > > 
> > > 4. I tried with the older flags namely
> > -fprofile-arcs
> > > and -fbranch-probabilities still no help.
> > 
> > So it looks like the slowdown is caused by one of
> > the profile based
> > optimizations that are enabled by default (basic
> > block reordering or
> > register allocation).  If you are getting such a
> > noticable slodown, it
> > probably means that your app has pretty small inner
> > loop.  Can you just
> > look into assembly generated for it with and without
> > profiling and try
> > to spot what is gong wrong?
> > 
> > Thanks,
> > Honza
> > > 
> > > Can someone help me out on how to proceed with
> > this.
> > > 
> > > Thanks
> > > -girish
> > > 
> > > 
> > > --- Jan Hubicka <hubicka@ucw.cz> wrote:
> > > 
> > > > > I started with a clean slate in my build
> > > > environment
> > > > > and did not have any residual files hanging
> > > > around.
> > > > > Are the steps I have indicated in my earlier
> > email
> > > > > correct. Is there a way I can break down the
> > > > problem
> > > > > into a smaller sub-set of flags and eliminate
> > the
> > > > flag
> > > > > causing the performance problem. What I mean
> > is
> > > > since
> > > > > -fprofile-generate and -fprofile-use enable a
> > > > bunch of
> > > > > flags, would it make sense to avoid profiling
> > and
> > > > try
> > > > > out some of the individual flags on a trial
> > and
> > > > error
> > > > > basis. If so what would be the flags to start
> > the
> > > > It would be probably better to just turn off the
> > > > individual
> > > > optimizations with -fprofile-use (for
> > optimizations
> > > > that are implied by
> > > > this flag there should be no need to re-profile
> > each
> > > > time).
> > > > If you can find particular optimization that
> > gets
> > > > out of control, it
> > > > would be lot easier to fix it...
> > > > 
> > > > Honza
> > > > > trials with.
> > > > > 
> > > > > -girish 
> > > > > 
> > > > > --- Jan Hubicka <hubicka@ucw.cz> wrote:
> > > > > 
> > > > > > > > On Wed, Jul 20, 2005 at 10:45:01AM -0700, girish vaitheeswaran wrote:
> > > > > > > > > > --- Steven Bosscher <stevenb@suse.de> wrote:
> > > > > > > > > > 
> > > > > > > > > > > On Wednesday 20 July 2005 18:53, girish vaitheeswaran wrote:
> > > > > > > > > > > > I am seeing a 20% slowdown with feedback optimization.
> > > > > > > > > > > > Does anyone have any thoughts on this.
> > > > > > > > > > > 
> > > > > > > > > > > My first thought is that you should probably first
> > > > > > > > > > > tell what compiler
> > > > > > > > > > > you are using.
> > > > > > > > >
> > > > > > > > > I am using gcc 3.4.3
> > > > > > > > > -girish
> > > > > > > > 
> > > > > > > > Which platform?  I've seen slower code for profile-directed optimizations
> > > > > > > > on powerpc64-linux with GCC 4.0 and mainline.  It's a bug, but I haven't
> > > > > > > > looked into it enough to provide a small test case for a problem report.
> > > > > > 
> > > > > > > Actually I would be very interested in seeing test cases such as
> > > > > > > those (and Girish's slowdown too, if possible).  In general, some
> > > > > > > slowdowns in corner cases are probably unavoidable, but both 3.4.3
> > > > > > > and 4.0 seem to have pretty consistent improvements with profiling,
> > > > > > > at least for SPEC and i386, which I am testing pretty regularly.
> > > > > > > Such slowdowns usually indicate problems like an incorrectly updated
> > > > > > > profile, or a profile read in incorrectly because of a mismatch in
> > > > > > > CFGs between the profile run and the feedback run; these are rather
> > > > > > > difficult to notice and hunt down...
> > > > > > 
> > > > > > Honza
> > > > > > > 
> > > > > > > Janis
> > > > > > 
> > > > 
> > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-09-01 22:54     ` Janis Johnson
  2005-09-01 23:13       ` Steven Bosscher
@ 2005-09-02 15:29       ` Kai Henningsen
  1 sibling, 0 replies; 29+ messages in thread
From: Kai Henningsen @ 2005-09-02 15:29 UTC (permalink / raw)
  To: gcc

Hi Janis,
janis187@us.ibm.com (Janis Johnson)  wrote on 01.09.05 in <20050901225348.GA6845@us.ibm.com>:

[quoteto.xps]
> On Thu, Sep 01, 2005 at 11:45:35PM +0200, Steven Bosscher wrote:
> > On Thursday 01 September 2005 23:19, girish vaitheeswaran wrote:
> > > Sorry I still did not follow. This is what I
> > > understood. During Feedback optimization apart from
> > > the -fprofile-generate, one needs to turn on
> > > -fmove-loop-invariants.
> >
> > You don't "need to".  It just might help iff you are using a gcc 4.1
> > based compiler.
> >
> > > However this option is not
> > > recognized by the gcc 3.4.4 or 3.4.3 compilers. What
> > > am I missing?
> >
> > You are missing that
> > 1) this whole thread does not concern gcc 3.4.x; and
> > 2) the option -fmove-loop-invariants does not exist in 3.4.x.
>
> Girish started this thread about problems he is seeing with GCC 3.4.3

The discussion, maybe. The thread, definitely not - that was started by
Peter Steinmetz (new subject, no References:). And it was explicitly
about "using mainline":

] There was some discussion a few weeks ago about some apps running slower
] with FDO enabled.

...

] While this doesn't explain all of the degradations discussed (some were
] showing up on older versions of the compiler), it may explain some.

MfG Kai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-09-01 19:04         ` Jan Hubicka
@ 2005-09-02  9:01           ` Zdenek Dvorak
  0 siblings, 0 replies; 29+ messages in thread
From: Zdenek Dvorak @ 2005-09-02  9:01 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Peter Steinmetz, gcc, girish_vaithees, hubicka, janis187,
	Steven Bosscher

Hello,

> > >you may try adding -fmove-loop-invariants flag, which enables new
> > >invariant motion pass.
> > 
> > That cleaned up both my simplified test case, and the code it
> > originated from.  It also cleaned up a few other cases where I
> > was noticing worse performance with FDO enabled.  Thanks!!
> > 
> > Perhaps this option should be enabled by default when doing FDO
> > to replace the loop invariant motions done by the recently
> > disabled loop optimize pass.
> 
> This sounds like sane idea.  Zdenek, is -fmove-loop-invariants dangerous
> in some way or just disabled because old loop does the same?

-fmove-loop-invariants is enabled by default on killloop-branch, and
passes bootstrap & regtesting on i686 and x86_64; the branch does not
bootstrap for me on ia64 and ppc (but neither does mainline).  I am
not aware of any correctness/ICE problems with -fmove-loop-invariants,
but given that it is disabled by default, this does not say much.

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-09-01 22:54     ` Janis Johnson
@ 2005-09-01 23:13       ` Steven Bosscher
  2005-09-02 15:29       ` Kai Henningsen
  1 sibling, 0 replies; 29+ messages in thread
From: Steven Bosscher @ 2005-09-01 23:13 UTC (permalink / raw)
  To: Janis Johnson; +Cc: gcc, girish vaitheeswaran, Eric Christopher

On Friday 02 September 2005 00:53, Janis Johnson wrote:
> Girish started this thread about problems he is seeing with GCC 3.4.3
> (see http://gcc.gnu.org/ml/gcc/2005-07/msg00866.html).  Others of us
> chimed in about similar issues with later versions.  Suggestions for
> avoiding the problems have been about those later versions, not the
> version he is using.

Ah.  Sorry then.  My bad.

Gr.
Steven

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-09-01 21:45   ` Steven Bosscher
@ 2005-09-01 22:54     ` Janis Johnson
  2005-09-01 23:13       ` Steven Bosscher
  2005-09-02 15:29       ` Kai Henningsen
  0 siblings, 2 replies; 29+ messages in thread
From: Janis Johnson @ 2005-09-01 22:54 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, girish vaitheeswaran, Eric Christopher

On Thu, Sep 01, 2005 at 11:45:35PM +0200, Steven Bosscher wrote:
> On Thursday 01 September 2005 23:19, girish vaitheeswaran wrote:
> > Sorry I still did not follow. This is what I
> > understood. During Feedback optimization apart from
> > the -fprofile-generate, one needs to turn on
> > -fmove-loop-invariants.
> 
> You don't "need to".  It just might help iff you are using a gcc 4.1
> based compiler.
> 
> > However this option is not 
> > recognized by the gcc 3.4.4 or 3.4.3 compilers. What
> > am I missing?
> 
> You are missing that
> 1) this whole thread does not concern gcc 3.4.x; and
> 2) the option -fmove-loop-invariants does not exist in 3.4.x.

Girish started this thread about problems he is seeing with GCC 3.4.3
(see http://gcc.gnu.org/ml/gcc/2005-07/msg00866.html).  Others of us
chimed in about similar issues with later versions.  Suggestions for
avoiding the problems have been about those later versions, not the
version he is using.

Janis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-09-01 21:19 ` girish vaitheeswaran
@ 2005-09-01 21:45   ` Steven Bosscher
  2005-09-01 22:54     ` Janis Johnson
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Bosscher @ 2005-09-01 21:45 UTC (permalink / raw)
  To: gcc; +Cc: girish vaitheeswaran, Eric Christopher

On Thursday 01 September 2005 23:19, girish vaitheeswaran wrote:
> Sorry I still did not follow. This is what I
> understood. During Feedback optimization apart from
> the -fprofile-generate, one needs to turn on
> -fmove-loop-invariants.

You don't "need to".  It just might help iff you are using a gcc 4.1
based compiler.

> However this option is not 
> recognized by the gcc 3.4.4 or 3.4.3 compilers. What
> am I missing?

You are missing that
1) this whole thread does not concern gcc 3.4.x; and
2) the option -fmove-loop-invariants does not exist in 3.4.x.

Gr.
Steven

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
       [not found] <AE7DE2E6-9131-4407-9585-746013E36070@apple.com>
@ 2005-09-01 21:19 ` girish vaitheeswaran
  2005-09-01 21:45   ` Steven Bosscher
  0 siblings, 1 reply; 29+ messages in thread
From: girish vaitheeswaran @ 2005-09-01 21:19 UTC (permalink / raw)
  To: Eric Christopher; +Cc: GCC Mailing List

Sorry I still did not follow. This is what I
understood. During Feedback optimization apart from
the -fprofile-generate, one needs to turn on
-fmove-loop-invariants. However this option is not
recognized by the gcc 3.4.4 or 3.4.3 compilers. What
am I missing?

-girish


--- Eric Christopher <echristo@apple.com> wrote:

> 
> On Aug 31, 2005, at 3:40 PM, girish vaitheeswaran
> wrote:
> 
> > I do not see this flag in gcc3.4.4.
> >
> >
> > Am I missing something?
> >>
> 
> >> you may try adding -fmove-loop-invariants flag,
> >> which enables new
> >> invariant motion pass.
> 
> The "new invariant motion pass".
> 
> -eric
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-31 18:16       ` Peter Steinmetz
@ 2005-09-01 19:04         ` Jan Hubicka
  2005-09-02  9:01           ` Zdenek Dvorak
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Hubicka @ 2005-09-01 19:04 UTC (permalink / raw)
  To: Peter Steinmetz
  Cc: Zdenek Dvorak, gcc, girish_vaithees, hubicka, janis187, Steven Bosscher

> >you may try adding -fmove-loop-invariants flag, which enables new
> >invariant motion pass.
> 
> That cleaned up both my simplified test case, and the code it
> originated from.  It also cleaned up a few other cases where I
> was noticing worse performance with FDO enabled.  Thanks!!
> 
> Perhaps this option should be enabled by default when doing FDO
> to replace the loop invariant motions done by the recently
> disabled loop optimize pass.

This sounds like a sane idea.  Zdenek, is -fmove-loop-invariants dangerous
in some way, or just disabled because the old loop optimizer does the same?

Honza
> 
> Pete

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-31 11:44     ` Zdenek Dvorak
@ 2005-08-31 18:16       ` Peter Steinmetz
  2005-09-01 19:04         ` Jan Hubicka
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Steinmetz @ 2005-08-31 18:16 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc, girish_vaithees, hubicka, janis187, Steven Bosscher

>you may try adding -fmove-loop-invariants flag, which enables new
>invariant motion pass.

That cleaned up both my simplified test case, and the code it
originated from.  It also cleaned up a few other cases where I
was noticing worse performance with FDO enabled.  Thanks!!

Perhaps this option should be enabled by default when doing FDO
to replace the loop invariant motions done by the recently
disabled loop optimize pass.

Pete

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-30 17:57   ` Peter Steinmetz
@ 2005-08-31 11:44     ` Zdenek Dvorak
  2005-08-31 18:16       ` Peter Steinmetz
  0 siblings, 1 reply; 29+ messages in thread
From: Zdenek Dvorak @ 2005-08-31 11:44 UTC (permalink / raw)
  To: Peter Steinmetz; +Cc: Steven Bosscher, gcc, girish_vaithees, hubicka, janis187

Hello,

> >A more likely source of performance degradation is that loop unrolling
> >is enabled when profiling, and loop unrolling is almost always a bad
> >pessimization on 32 bits x86 targets.
> 
> To clarify, I was compiling with -funroll-loops and -fpeel-loops
> enabled in both cases.
> 
> The FDO slowdown in my case was caused by the presence of some loop
> invariant code that was getting removed from the loop by the loop
> optimizer pass in the non-FDO case.

you may try adding -fmove-loop-invariants flag, which enables new
invariant motion pass.

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-30 19:58 ` Jan Hubicka
  2005-08-31  1:25   ` girish vaitheeswaran
@ 2005-08-31  4:33   ` Peter Steinmetz
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Steinmetz @ 2005-08-31  4:33 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, girish_vaithees, hubicka, janis187

> Do you have specific testcase?  It would be interesting to see if new
> optimizer can catch up at least on kill-loop branch.

Here is a simplified version of what I observed.  In the non-FDO case,
the loop invariant load of the constant 32 is removed from the loop.
When FDO is enabled, the load remains in the loop.

float farray[100];

int main (int argc, char *argv[])
{
    int m;

    for( m = 0; m < 100; m++ )
    {
        farray[m] = 32;
    }
}

I'm compiling it as follows using a version of gcc built from
mainline yesterday.

Non-FDO:
gcc -O3 -funroll-loops -fpeel-loops -o test test.c

FDO:
gcc -O3 -funroll-loops -fpeel-loops -fprofile-generate -o test test.c
./test
gcc -O3 -funroll-loops -fpeel-loops -fprofile-use -o test test.c

Pete
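
For readers of the archive, the effect Pete describes can be sketched
by hand.  This is only an illustration of what loop-invariant motion
does to a loop like the one above, not the code GCC actually emits; the
function names fill_in_loop, fill_hoisted and fills_match are
hypothetical and do not appear in the original test case.

```c
#include <assert.h>
#include <stddef.h>

/* Loop shaped like the test case: the value is converted to float inside
   the loop body.  Without invariant motion, a compiler may recompute or
   reload that value on every iteration. */
static void fill_in_loop(float *a, size_t n, int value)
{
    size_t m;
    for (m = 0; m < n; m++)
        a[m] = (float) value;
}

/* Hand-hoisted equivalent: the invariant conversion is done once, before
   the loop, which is what an invariant motion pass would arrange. */
static void fill_hoisted(float *a, size_t n, int value)
{
    const float v = (float) value;
    size_t m;
    for (m = 0; m < n; m++)
        a[m] = v;
}

/* Returns 1 if both variants write identical contents (n <= 100). */
static int fills_match(size_t n, int value)
{
    float x[100], y[100];
    size_t m;
    assert(n <= 100);
    fill_in_loop(x, n, value);
    fill_hoisted(y, n, value);
    for (m = 0; m < n; m++)
        if (x[m] != y[m])
            return 0;
    return 1;
}
```

Both variants compute the same result; the only difference is where the
invariant value is produced, which is why comparing the generated
assembly of the two builds shows the problem more clearly than the
program output does.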

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-30 19:58 ` Jan Hubicka
@ 2005-08-31  1:25   ` girish vaitheeswaran
  2005-08-31  4:33   ` Peter Steinmetz
  1 sibling, 0 replies; 29+ messages in thread
From: girish vaitheeswaran @ 2005-08-31  1:25 UTC (permalink / raw)
  To: Jan Hubicka, Peter Steinmetz; +Cc: gcc, janis187, girish_vaithees, hubicka

I have tried with gcc 3.4.4 and still see the same
20% slowdown. If you folks are able to crack this, do
let me know.  In my earlier attempts I tried to
disable all of the flags that feedback optimization
turns on (except the ones that are turned on by
default) and still got the 20% slowdown.
If there is something you would like me to try out
with respect to enabling/disabling certain flags, I'd be
happy to. The versions I have tried out so far are
2.95, 3.4.3 and 3.4.4.

-girish

--- Jan Hubicka <jh@suse.cz> wrote:

> > 
> > There was some discussion a few weeks ago about
> some apps running slower
> > with FDO enabled.
> > 
> > I've recently investigated a similar situation
> using mainline.  In my case,
> > the fact that the loop_optimize pass is disabled
> during FDO was the cause
> > of the slowdown.  It appears that was recently
> disabled as part of Jan
> > Hubicka's patch to eliminate RTL based profiling. 
> The commentary indicates
> > that the old loop optimizer is incompatible with
> tree profiling.
> > 
> > While this doesn't explain all of the degradations
> discussed (some were
> > showing up on older versions of the compiler), it
> may explain some.
> 
> Do you have specific testcase?  It would be
> interesting to see if new
> optimizer can catch up at least on kill-loop branch.
> 
> Thanks for investigating!
> Honza
> > 
> > Pete
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-30 16:24 Peter Steinmetz
  2005-08-30 16:24 ` Steven Bosscher
@ 2005-08-30 19:58 ` Jan Hubicka
  2005-08-31  1:25   ` girish vaitheeswaran
  2005-08-31  4:33   ` Peter Steinmetz
  1 sibling, 2 replies; 29+ messages in thread
From: Jan Hubicka @ 2005-08-30 19:58 UTC (permalink / raw)
  To: Peter Steinmetz; +Cc: gcc, janis187, girish_vaithees, hubicka

> 
> There was some discussion a few weeks ago about some apps running slower
> with FDO enabled.
> 
> I've recently investigated a similar situation using mainline.  In my case,
> the fact that the loop_optimize pass is disabled during FDO was the cause
> of the slowdown.  It appears that was recently disabled as part of Jan
> Hubicka's patch to eliminate RTL based profiling.  The commentary indicates
> that the old loop optimizer is incompatible with tree profiling.
> 
> While this doesn't explain all of the degradations discussed (some were
> showing up on older versions of the compiler), it may explain some.

Do you have specific testcase?  It would be interesting to see if new
optimizer can catch up at least on kill-loop branch.

Thanks for investigating!
Honza
> 
> Pete

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-30 16:24 ` Steven Bosscher
@ 2005-08-30 17:57   ` Peter Steinmetz
  2005-08-31 11:44     ` Zdenek Dvorak
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Steinmetz @ 2005-08-30 17:57 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, girish_vaithees, hubicka, janis187

>A more likely source of performance degradation is that loop unrolling
>is enabled when profiling, and loop unrolling is almost always a bad
>pessimization on 32 bits x86 targets.

To clarify, I was compiling with -funroll-loops and -fpeel-loops
enabled in both cases.

The FDO slowdown in my case was caused by the presence of some loop
invariant code that was getting removed from the loop by the loop
optimizer pass in the non-FDO case.

I'm running on powerpc-linux.

Pete

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-08-30 16:24 Peter Steinmetz
@ 2005-08-30 16:24 ` Steven Bosscher
  2005-08-30 17:57   ` Peter Steinmetz
  2005-08-30 19:58 ` Jan Hubicka
  1 sibling, 1 reply; 29+ messages in thread
From: Steven Bosscher @ 2005-08-30 16:24 UTC (permalink / raw)
  To: gcc; +Cc: Peter Steinmetz, janis187, girish_vaithees, hubicka

On Tuesday 30 August 2005 17:53, Peter Steinmetz wrote:
> While this doesn't explain all of the degradations discussed (some were
> showing up on older versions of the compiler), it may explain some.

There is a lot of empirical evidence that the loop optimizer already
doesn't do many useful things anymore, so this "some" is likely a
negligible some.
A more likely source of performance degradation is that loop unrolling
is enabled when profiling, and loop unrolling is almost always a
pessimization on 32-bit x86 targets.

Gr.
Steven
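
As a rough illustration of the transformation under discussion, here is
a hand-written sketch of a loop unrolled by four.  It is not what
-funroll-loops emits; the names sum_rolled, sum_unrolled4 and sums_match
are hypothetical.  The unrolled body is larger and keeps more values
live at once, which is one plausible reason unrolling can hurt on a
register-starved target such as 32-bit x86.

```c
#include <assert.h>
#include <stddef.h>

/* Original, rolled form: one add and one branch per element. */
static long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    size_t i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by four: fewer branches per element, but a bigger loop body,
   plus a remainder loop for the leftover 0..3 elements. */
static long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)
        s += a[i];
    return s;
}

/* Returns 1 if both variants agree on the first n elements of a. */
static int sums_match(const int *a, size_t n)
{
    return sum_rolled(a, n) == sum_unrolled4(a, n);
}
```

Whether the unrolled form is faster depends on the target and the loop,
so timing both builds is the only reliable check.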

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
@ 2005-08-30 16:24 Peter Steinmetz
  2005-08-30 16:24 ` Steven Bosscher
  2005-08-30 19:58 ` Jan Hubicka
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Steinmetz @ 2005-08-30 16:24 UTC (permalink / raw)
  To: gcc; +Cc: janis187, girish_vaithees, hubicka


There was some discussion a few weeks ago about some apps running slower
with FDO enabled.

I've recently investigated a similar situation using mainline.  In my case,
the fact that the loop_optimize pass is disabled during FDO was the cause
of the slowdown.  It appears that was recently disabled as part of Jan
Hubicka's patch to eliminate RTL based profiling.  The commentary indicates
that the old loop optimizer is incompatible with tree profiling.

While this doesn't explain all of the degradations discussed (some were
showing up on older versions of the compiler), it may explain some.

Pete

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: -fprofile-generate and -fprofile-use
  2005-07-20 16:53 girish vaitheeswaran
@ 2005-07-20 16:59 ` Steven Bosscher
  0 siblings, 0 replies; 29+ messages in thread
From: Steven Bosscher @ 2005-07-20 16:59 UTC (permalink / raw)
  To: gcc; +Cc: girish vaitheeswaran

On Wednesday 20 July 2005 18:53, girish vaitheeswaran wrote:
> I am seeing a 20% slowdown with feedback optimization.
> Does anyone have any thoughts on this.

My first thought is that you should probably first tell what compiler
you are using.

Gr.
Steven

^ permalink raw reply	[flat|nested] 29+ messages in thread

* -fprofile-generate and -fprofile-use
@ 2005-07-20 16:53 girish vaitheeswaran
  2005-07-20 16:59 ` Steven Bosscher
  0 siblings, 1 reply; 29+ messages in thread
From: girish vaitheeswaran @ 2005-07-20 16:53 UTC (permalink / raw)
  To: gcc

I am seeing a 20% slowdown with feedback optimization.
Does anyone have any thoughts on this?
These are the steps I followed:

1. Compiled the application using -fprofile-generate.
I used this flag both in the compile flags and as part
of the link flags. I also had to use libgcov.a during
the link phase otherwise it would die looking for gcov
functions.
2. Ran a representative work-load
3. Followed (1) except that instead of
-fprofile-generate used -fprofile-use

I had to drop "-fprofile-use" on 2 files as they
had corrupted profile data. I have tried this
experiment with -fprofile-arcs and
-fbranch-probabilities as well and I get the same 20%
slowdown.

Thx
-girish

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2005-09-02 15:29 UTC | newest]

Thread overview: 29+ messages
2005-07-20 17:45 -fprofile-generate and -fprofile-use girish vaitheeswaran
2005-07-20 19:01 ` Janis Johnson
2005-07-20 20:16   ` girish vaitheeswaran
2005-07-20 22:44   ` Jan Hubicka
2005-07-20 23:38     ` girish vaitheeswaran
2005-07-21 13:03       ` Jan Hubicka
2005-07-25 17:39         ` girish vaitheeswaran
2005-07-26  7:00           ` Jan Hubicka
2005-07-26 18:10             ` girish vaitheeswaran
2005-07-30 19:20               ` Jan Hubicka
2005-07-21 18:03       ` Kelley Cook
2005-07-21 21:15         ` girish vaitheeswaran
     [not found] <AE7DE2E6-9131-4407-9585-746013E36070@apple.com>
2005-09-01 21:19 ` girish vaitheeswaran
2005-09-01 21:45   ` Steven Bosscher
2005-09-01 22:54     ` Janis Johnson
2005-09-01 23:13       ` Steven Bosscher
2005-09-02 15:29       ` Kai Henningsen
  -- strict thread matches above, loose matches on Subject: below --
2005-08-30 16:24 Peter Steinmetz
2005-08-30 16:24 ` Steven Bosscher
2005-08-30 17:57   ` Peter Steinmetz
2005-08-31 11:44     ` Zdenek Dvorak
2005-08-31 18:16       ` Peter Steinmetz
2005-09-01 19:04         ` Jan Hubicka
2005-09-02  9:01           ` Zdenek Dvorak
2005-08-30 19:58 ` Jan Hubicka
2005-08-31  1:25   ` girish vaitheeswaran
2005-08-31  4:33   ` Peter Steinmetz
2005-07-20 16:53 girish vaitheeswaran
2005-07-20 16:59 ` Steven Bosscher
