public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/35544]  New: Error with -fprofile-use
From: xinliangli at gmail dot com @ 2008-03-12  5:49 UTC (permalink / raw)
  To: gcc-bugs

In the following example, profile data generated by running a binary built at
-O0 cannot be used by -fprofile-use at -O2. If this is not supported, it is
either a bug or a design flaw.

g++ -fprofile-generate devirt.cc
./a.out
g++ -fprofile-use -O2 devirt.cc

==>
devirt.cc: In function 'int main()':
devirt.cc:45: error: coverage mismatch for function 'main' while reading
counter 'arcs'
devirt.cc:45: note: number of counters is 9 instead of 7


// devirt.cc

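class A {
public:
  // Virtual destructor, so that `delete ap` in main() is well-defined
  // when ap points to a B.
  virtual ~A() {}

  virtual int foo() {
    return 1;
  }

  int i;
};

class B : public A
{
public:
  virtual int foo() {
    return 2;
  }

  int b;
};


int main()
{
  int i;

  A* ap = 0;

  // About 1 in 7 iterations allocates an A and the rest allocate a B,
  // so the virtual call below mostly resolves to B::foo.
  for (i = 0; i < 10000; i++)
  {
    if (i % 7 == 0)
    {
      ap = new A();
    }
    else
      ap = new B();

    ap->foo();  // virtual call, dispatched at run time

    delete ap;
  }

  return 0;
}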


-- 
           Summary: Error with -fprofile-use
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: xinliangli at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35544



* [Bug middle-end/35544] Error with -fprofile-use
From: pinskia at gcc dot gnu dot org @ 2008-03-12  6:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2008-03-12 06:04 -------
This is by design: you need to use the same options for -fprofile-use as you do
for -fprofile-generate.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Error with -fprofile-use    |Error with -fprofile-use



* [Bug middle-end/35544] Error with -fprofile-use
From: xinliangli at gmail dot com @ 2008-03-12  6:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from xinliangli at gmail dot com  2008-03-12 06:24 -------
(In reply to comment #1)
> This is by design, you need to use the same options for -fprofile-use as you do
> for -fprofile-generate.
> 

This model won't work well once -O4 (IPO) is in place. If instrumentation and
annotation happen at the same phase in the compiler pipeline, I don't see why
such a restriction is needed.

David



* [Bug middle-end/35544] Error with -fprofile-use
From: pinskia at gcc dot gnu dot org @ 2008-03-12  6:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2008-03-12 06:28 -------
(In reply to comment #2)
> This model won't work well when -O4 (ipo) is in place. If instrumentation and
> annotation happens at the same phase in the compiler pipeline, I don't see a
> reason why such restriction is needed.

Why do you think it will not work well with LTO?

The instrumentation and annotation already happen at the same place, which is
why the same options are required.  They just happen after some inlining and
other early optimizations, to make sure the instrumented program does not get
too slow.

-- Pinski



* [Bug middle-end/35544] Error with -fprofile-use
From: xinliangli at gmail dot com @ 2008-03-12  6:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from xinliangli at gmail dot com  2008-03-12 06:40 -------
(In reply to comment #3)
> (In reply to comment #2)
> > This model won't work well when -O4 (ipo) is in place. If instrumentation and
> > annotation happens at the same phase in the compiler pipeline, I don't see a
> > reason why such restriction is needed.
> 
> Why do you think it will not work well with LTO?

At the least, it is tricky when cross-module inlining happens for instrumented
functions: the inlined instances should really be updating the counters of the
original routines.

Besides, this requirement will significantly slow down an instrumented compile
for IPO.


> 
> The instrumentation and annotation already happens at the same place which is
> why the same options are required. 

Sounds like 'same options' is sufficient for that to happen, but not necessary.

> They just happen after some inlining and
> other early optimization to make sure the profiling does not get too slow.

If the early optimizations always happen at -O1 and above, a more reasonable
requirement is that the instrumentation compile be done at -O1 or above.
Again, how much performance gain can early optimization achieve for an
instrumented binary (which is slow anyway)?

David

> 
> -- Pinski
> 



* [Bug middle-end/35544] Error with -fprofile-use
From: pinskia at gcc dot gnu dot org @ 2008-03-12  6:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2008-03-12 06:50 -------
For reference see:
http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01205.html
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00200.html
http://gcc.gnu.org/ml/gcc/2005-12/msg00215.html

And most likely a couple of others too.  I cannot find a good reference
explaining why early inlining helps so much for C++ code like tramp3d.

> Again, how much performance gain can be achieved with early optimization for
> instrumented binary (which is slow anyway)?

IIRC there was a huge win (like a 10x speedup) for doing this.


>  At least it is tricky with cross module inlining happening for instrumented
> functions -- the inline instances should really updating counters in the
> original routines.

That depends on whether the inlining is done early or late.  If it is done
late, the counter updates will already have been emitted into the LTO file
before it is read back in, so the counters will already be correct and
inlining becomes straightforward.



* [Bug middle-end/35544] Error with -fprofile-use
From: pinskia at gcc dot gnu dot org @ 2008-03-12  6:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2008-03-12 06:52 -------
I should note that when GCC emits the profiling instrumentation, it emits the
counter updates as ordinary code, so other optimizations don't need to know
about them unless they want to use the information they provide.  I bet this
is different from the compiler you were developing.



* [Bug middle-end/35544] Error with -fprofile-use
From: xinliangli at gmail dot com @ 2008-03-12  7:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from xinliangli at gmail dot com  2008-03-12 07:12 -------
(In reply to comment #6)
> I should note that when GCC emits the annotations for profiling, it actually
> emits the counter updates and all and other optimizations don't need to know
> about them except if they want to know the information about what they had
> provided.  I Bet this is different from the compiler you were developing.
> 

So GCC does not generate inline sequences to update the counters, but updates
them via runtime library calls? Or have I completely misunderstood?

David



* [Bug middle-end/35544] Error with -fprofile-use
From: xinliangli at gmail dot com @ 2008-03-12  7:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from xinliangli at gmail dot com  2008-03-12 07:17 -------
(In reply to comment #5)
> For reference see:
> http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01205.html
> http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00200.html
> http://gcc.gnu.org/ml/gcc/2005-12/msg00215.html
> 
> And most likely a couple others too.  I cannot find a good reference explaining
> why early inline helps a lot for C++ code like tramp3d.
> 
> >Again, how much performance gain can be achieved with early optimization for
> instrumented binary (which is slow anyway)?
> 
> IIRC there was a huge win (like a 10x speedup) for doing this.
> 

Yes, that is quite plausible for C++ applications. In the compiler I worked
with before, the default optimization level is -O1, which performs
user-directed inlining, so the implicit requirement for instrumentation is
effectively -O1 or above.
> 
> >  At least it is tricky with cross module inlining happening for instrumented
> > functions -- the inline instances should really updating counters in the
> > original routines.
> 
> Depends on if the inline is done with early inline or late and really really if
> it is done with late, the counters will be emitted with the LTO file before
> reading them back in so the counters will be correct already and inlining
> becomes obvious.
> 

I was referring to cross-module inlining that happens late. It looks like, for
GCC, this won't be an issue in theory.

David



* [Bug middle-end/35544] Error with -fprofile-use
From: rguenth at gcc dot gnu dot org @ 2008-03-12 10:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rguenth at gcc dot gnu dot org  2008-03-12 10:08 -------
GCC inserts code to update counters inline.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID



* [Bug middle-end/35544] Error with -fprofile-use
From: wilson at tuliptree dot org @ 2008-03-13  6:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from wilson at tuliptree dot org  2008-03-13 06:02 -------
Subject: Re:   New: Error with -fprofile-use

xinliangli at gmail dot com wrote:
> In the following example, profile data generated by -O0 binary run can not be
> used for profile-use at -O2. This is either a bug or design flaw if not
> supported.

GCC constructs a minimal spanning tree for the CFG, and then only adds
instrumentation code to arcs that are not on the minimal spanning tree.
We can compute arc counts for the rest of the CFG from this set.  We
can compute basic block execution counts from the arc counts.  This
reduces the run-time cost of the instrumentation code, since only a
small fraction of the basic blocks require profiling code.  However, it
also requires that we have exactly the same CFG with -fprofile-generate
as we have with -fprofile-use, and hence you must use the same
optimization options with both compiles.

Jim



* [Bug middle-end/35544] Error with -fprofile-use
From: xinliangli at gmail dot com @ 2008-03-13  6:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from xinliangli at gmail dot com  2008-03-13 06:20 -------
(In reply to comment #10)
> Subject: Re:   New: Error with -fprofile-use
> 
> xinliangli at gmail dot com wrote:
> > In the following example, profile data generated by -O0 binary run can not be
> > used for profile-use at -O2. This is either a bug or design flaw if not
> > supported.
> 
> Gcc constructs a minimal spanning tree for the CFG, and then only adds 
> instrumentation code to arcs that are not on the minimal spanning tree. 
>     We can compute arc counts for the rest of the CFG from this set.  We 
> can compute basic block execution counts from the arc counts.  This 
> reduces the run-time cost of the instrumentation code, since only a 
> small fraction of the basic blocks require profiling code.  However, it 
> also requires that we have exactly the same CFG with -fprofile-generate 
> as we have with -fprofile-use, and hence you must use the same 
> optimization options with both compiles.
> 
> Jim
> 

GCC's design minimizes the run-time overhead of edge profiling, which is a very
good design. Other compilers also require the same CFG, but some of them
tolerate certain differences, even some source changes. This gives users the
flexibility to use slightly stale profile data while still getting some
performance benefit.  Please note that people already hate FDO because of its
complicated steps; adding more constraints won't help make them happy.


