public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/35545]  New: virtual call specialization not happening with FDO
@ 2008-03-12  5:56 xinliangli at gmail dot com
  2008-03-12  6:08 ` [Bug middle-end/35545] " pinskia at gcc dot gnu dot org
  0 siblings, 1 reply; 11+ messages in thread
From: xinliangli at gmail dot com @ 2008-03-12  5:56 UTC (permalink / raw)
  To: gcc-bugs

In the following example, virtual calls via ap should be speciallized -- there
is one dominatating call target, but compiling the program at -O3 with
-fprofile-use, it does not happen.

g++ -O1 -fprofile-generate devirt.cc
./a.out
g++ -fprofile-use -O3 -fdump-tree-optimized devirt.cc


// devirt.cc

class A {
public:
  virtual int foo() {
     return 1;
  }

int i;
};

class B : public A
{
public:
  virtual int foo() {
     return 2;
  }

 int b;
} ;


int main()
{
 int i;

  A* ap = 0;

  for (i = 0; i < 10000; i++)
  {

     if (i%7==0)
     {
        ap = new A();
     }
     else
        ap = new B();

    ap->foo();

    delete ap;

  }

  return 0;

}


-- 
           Summary: virtual call specialization not happening with FDO
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: xinliangli at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
  2008-03-12  5:56 [Bug middle-end/35545] New: virtual call specialization not happening with FDO xinliangli at gmail dot com
@ 2008-03-12  6:08 ` pinskia at gcc dot gnu dot org
  0 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-03-12  6:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2008-03-12 06:07 -------
Jump threading happened by we did not prop the call correctly:
  ap = (struct A *) D.2075;
  ap->_vptr$A = &_ZTV1A[2];
  OBJ_TYPE_REF(*ap->_vptr$A;ap->0) (ap);

...
  this = (struct B *) D.2076;
  this->D.2018._vptr$A = &_ZTV1B[2];
  ap.61 = &this->D.2018;
  OBJ_TYPE_REF(*ap.61->_vptr$A;ap.61->0) (ap.61);


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2013-12-17 21:15 ` hubicka at ucw dot cz
@ 2014-09-25 21:29 ` hubicka at gcc dot gnu.org
  8 siblings, 0 replies; 11+ messages in thread
From: hubicka at gcc dot gnu.org @ 2014-09-25 21:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We still fail to fold here.  After tracer we get:
  # ap_2 = PHI <ap_8(4)>
  # prephitmp_14 = PHI <&MEM[(void *)&_ZTV1A + 16B](4)>
  _19 = *prephitmp_14;
  PROF_24 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_2->0);
  if (PROF_24 == foo)
    goto <bb 9>;
  else
    goto <bb 8>;

and the necessary forward propagation & folding to realize that:
  _19 = foo;
  PROF_24 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
  if (PROF_24 == foo)

happens only in .optimized dump.

I would go with patch from comment 5 moving tracer earlier - the tail
duplication intends to make code more optimizable and it is stupid to run it
only after all optimizations are done.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-12-17 20:53 ` law at redhat dot com
@ 2013-12-17 21:15 ` hubicka at ucw dot cz
  2014-09-25 21:29 ` hubicka at gcc dot gnu.org
  8 siblings, 0 replies; 11+ messages in thread
From: hubicka at ucw dot cz @ 2013-12-17 21:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #9 from Jan Hubicka <hubicka at ucw dot cz> ---
> It's not a matter of cost model, but if propagating the values to their uses. 
> I haven't looked closely at the tracer, but wouldn't it benefit by having
> constants in particular propagated to their uses?

Tracer depends on the usual estimate_num_insns limits
 (it is 12 years since I wrote it, so what I recall)
It collects BBs that are interesting starts of traces, takes them in priority
order and duplicates from the seeds until code growth is met or it runs out of
interesting candidates by other criteria.

I think it generally tends to starve on candidates as the definition of trace
is
relatively strong, but I am not 100% sure on it. So it is not that much
dependent
on bounds given by code size metric.

If we had unlimited time, it would  be better to propagate constants and
cleanup
both before and after tracer.

If we can chose whether we want to do tracer before last pass that is able
to propagate and fold constants or after, I would chose before for the reason
I mentioned on begginig; the whole point of the tail duplication is to simplify
CFG and allow better propagation.

I think missed tracing here and there is less painful than missed optimizatoin
in duplicated code.

We may even consider pushing tracer before DOM, since tail duplication may
enable
DOM to produce more useful threading/propagation and code after tracer is not
too painfuly obstructated. Sure you can end up with PHI that has only one
constant
argument.  I can see that DOM may miss optimization here.
> Propagating the constant for x' in BBm and eliminating the degenerate is what
> the phi-only cprop pass does.  If the tracer generates similar things, then
> running phi-only cprop after it might be useful as well.  It *should* be very
> fast.

Yes, tracer does similar things.  You can think about it as about speculative
jump threading - if one path through meet points seems more likely than the
other based on profile, tracer will duplicate it in a hope that later
optimization pass will prove some of conditionals constant over the duplicated
path.  For that it needs subsequent propagation pass (CCP or better VRP) to
match.  That is why its current place in pass queue is unlucky.  Possible
benefits of tail duplications are of course not limited to threading.

We can do one extra cleanup pass, too.  Tracer is on by default only with
-fprofile-use so extra phi-only cprop with -ftracer probably is not dangerous
to overall compile time experience.

Honza


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2013-12-17 20:39 ` hubicka at ucw dot cz
@ 2013-12-17 20:53 ` law at redhat dot com
  2013-12-17 21:15 ` hubicka at ucw dot cz
  2014-09-25 21:29 ` hubicka at gcc dot gnu.org
  8 siblings, 0 replies; 11+ messages in thread
From: law at redhat dot com @ 2013-12-17 20:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #8 from Jeffrey A. Law <law at redhat dot com> ---
It's not a matter of cost model, but if propagating the values to their uses. 
I haven't looked closely at the tracer, but wouldn't it benefit by having
constants in particular propagated to their uses?

Yes, DOM's duplication/isolation of paths exposes the degenerate PHIs.  So we
might have had:

One of the nice side effects of jump threading is that it isolates paths.  So
we might have had 

BBn
x = phi (a, b, c, constant, d, e, f)

duplication for threading might turn that into

BBn:
x = phi (a, b, c, d, e, f)  // The original


elsewhere

BBm:
x' = phi (constant)  // the duplicate


Propagating the constant for x' in BBm and eliminating the degenerate is what
the phi-only cprop pass does.  If the tracer generates similar things, then
running phi-only cprop after it might be useful as well.  It *should* be very
fast.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2013-12-17 20:33 ` law at redhat dot com
@ 2013-12-17 20:39 ` hubicka at ucw dot cz
  2013-12-17 20:53 ` law at redhat dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: hubicka at ucw dot cz @ 2013-12-17 20:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> ---
> You certainly don't want to put something between DOM and phi-only-cprop.  Jump
> threading will tend to expose lots of degenerate PHIs.  phi-only-cprop
> eliminates those degenerates.  We could have used the normal cprop code, but it
> seemed too heavy-weight for the cleanups we wanted to do.

Well, consttants, copies and PHIs are accounted as 0 size and thus not part of
tracer's
cost model, so perhaps we do not care about presence of degnerated PHIs here.
Moreover it is the degnerate PHI produced by tracer that causes are problems. I
assume
DOM does degnerate PHIs by code duplication for jump threading and tracer is
exactly the
same type of transformation and for same reasons we may want phi-only-cprop
after tracer
as we do it after DOM.

It however seems a more like VRP's missed optimization to not be able to paper
over that
degenerate PHI produced by tracer at first place, so I will try to poke about
this tomorrow
a bit more.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2013-12-17 18:09 ` hubicka at gcc dot gnu.org
@ 2013-12-17 20:33 ` law at redhat dot com
  2013-12-17 20:39 ` hubicka at ucw dot cz
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: law at redhat dot com @ 2013-12-17 20:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #6 from Jeffrey A. Law <law at redhat dot com> ---
You certainly don't want to put something between DOM and phi-only-cprop.  Jump
threading will tend to expose lots of degenerate PHIs.  phi-only-cprop
eliminates those degenerates.  We could have used the normal cprop code, but it
seemed too heavy-weight for the cleanups we wanted to do.

One could argue we want a phi-only-cprop cleanup after VRP since it threads
jumps too.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2013-12-17 18:06 ` rguenther at suse dot de
@ 2013-12-17 18:09 ` hubicka at gcc dot gnu.org
  2013-12-17 20:33 ` law at redhat dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-12-17 18:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Main issue seems to be that VRP messes up on:
  # ap_2 = PHI <ap_8(5)>
  # prephitmp_14 = PHI <&MEM[(void *)&_ZTV1A + 16B](5)>
  _19 = *prephitmp_14;

here it somehow won't constant propagate the load. 
Index: passes.def
===================================================================
--- passes.def  (revision 206040)
+++ passes.def  (working copy)
@@ -236,6 +236,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_dominator);
+      NEXT_PASS (pass_tracer);
       /* The only const/copy propagation opportunities left after
         DOM should be due to degenerate PHI nodes.  So rather than
         run the full propagators, run a specialized pass which
@@ -244,7 +245,6 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_phi_only_cprop);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_cd_dce);
-      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);

actually helps since phi_only_cprop is good on this transform. I do not quite
gather why VRP can't do it itself.

I sent first patch to http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01517.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
  2013-12-17 16:34 ` hubicka at gcc dot gnu.org
  2013-12-17 17:29 ` hubicka at gcc dot gnu.org
@ 2013-12-17 18:06 ` rguenther at suse dot de
  2013-12-17 18:09 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: rguenther at suse dot de @ 2013-12-17 18:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
"hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
>
>Jan Hubicka <hubicka at gcc dot gnu.org> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>             Status|UNCONFIRMED                 |NEW
>   Last reconfirmed|                            |2013-12-17
>            CC|                            |hubicka at gcc dot gnu.org,
>                  |                            |mjambor at suse dot cz,
>                 |                            |rguenther at suse dot de
>     Ever confirmed|0                           |1
>
>--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
>Couple years later we finally devirtualize here:
>  <bb 5>:
>  ap_8 = operator new (16);
>  ap_8->i = 0;
>  ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>  _19 = foo;
>  PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
>  if (PROF_26 == foo)
>    goto <bb 8>;
>  else
>    goto <bb 7>;
>
>  <bb 6>:
>  ap_13 = operator new (16);
>  MEM[(struct B *)ap_13].D.2237.i = 0;
>  MEM[(struct B *)ap_13].b = 0;
>  MEM[(struct B *)ap_13].D.2237._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
>  _1 = foo;
>  PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_1;(struct A)ap_13->0);
>  if (PROF_30 == foo)
>    goto <bb 8>;
>  else
>    goto <bb 7>;
>
>however the code ends up super sily after tracer.
>for some reason we do not manage to fold away the virtual table lookup.
>Why?

This is probably doms fault. Does it even handle function pointers?

Tracer runs way too late and with no cleanups behind it.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
  2013-12-17 16:34 ` hubicka at gcc dot gnu.org
@ 2013-12-17 17:29 ` hubicka at gcc dot gnu.org
  2013-12-17 18:06 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-12-17 17:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Following patch gets rid of OBJ_TYPE_REF
Index: value-prof.c
===================================================================
--- value-prof.c        (revision 206040)
+++ value-prof.c        (working copy)
@@ -1333,9 +1333,15 @@ gimple_ic (gimple icall_stmt, struct cgr
   cond_bb = gimple_bb (icall_stmt);
   gsi = gsi_for_stmt (icall_stmt);

-  tmp0 = make_temp_ssa_name (optype, NULL, "PROF");
-  tmp1 = make_temp_ssa_name (optype, NULL, "PROF");
-  tmp = unshare_expr (gimple_call_fn (icall_stmt));
+  tmp0 = make_temp_ssa_name (optype, NULL, "SPEC");
+  tmp1 = make_temp_ssa_name (optype, NULL, "SPEC");
+  tmp = gimple_call_fn (icall_stmt);
+
+  /* Drop OBJ_TYPE_REF expression, it is ignored by rest of
+     optimization queue anyway.  */
+  if (TREE_CODE (tmp) == OBJ_TYPE_REF)
+    tmp = OBJ_TYPE_REF_EXPR (tmp);
+  tmp = unshare_expr (tmp);
   load_stmt = gimple_build_assign (tmp0, tmp);
   gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);


but it does not help.  We end up:
  _1 = foo;
  if (_1 == foo)
    goto <bb 8>;

not much better.  This nonsense appears only in optimized dump, before we have
the following nonsense:
  # ap_9 = PHI <ap_13(6)>
  # prephitmp_18 = PHI <&MEM[(void *)&_ZTV1B + 16B](6)>
  _1 = *prephitmp_18;

that is introduced by duplicating
  <bb 7>:
  # ap_2 = PHI <ap_8(5), ap_13(6)>
  # prephitmp_14 = PHI <&MEM[(void *)&_ZTV1A + 16B](5), &MEM[(void *)&_ZTV1B +
16B](6)>
  _19 = *prephitmp_14;
  if (_19 == foo)
    goto <bb 9>;
  else
    goto <bb 8>;

The following patch:
Index: passes.def
===================================================================
--- passes.def  (revision 206040)
+++ passes.def  (working copy)
@@ -242,9 +242,9 @@ along with GCC; see the file COPYING3.
         only examines PHIs to discover const/copy propagation
         opportunities.  */
       NEXT_PASS (pass_phi_only_cprop);
+      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_cd_dce);
-      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);

makes us to VRP value through but only when OBJ_TYPE_REF is nout around.
I think pushing tracer up is good idea - we should have at least one vrp/ccp
pass after tracer. Its main purpose is to provide enough context so those
forward propagating passes can do better job. it seems stupid to do it only
after everything was finished.  I also see why tracer would preffer DCE to be
done first, but we have only so many passes to do.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
@ 2013-12-17 16:34 ` hubicka at gcc dot gnu.org
  2013-12-17 17:29 ` hubicka at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-12-17 16:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-17
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |mjambor at suse dot cz,
                   |                            |rguenther at suse dot de
     Ever confirmed|0                           |1

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Couple years later we finally devirtualize here:
  <bb 5>:
  ap_8 = operator new (16);
  ap_8->i = 0;
  ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
  _19 = foo;
  PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
  if (PROF_26 == foo)
    goto <bb 8>;
  else
    goto <bb 7>;

  <bb 6>:
  ap_13 = operator new (16);
  MEM[(struct B *)ap_13].D.2237.i = 0;
  MEM[(struct B *)ap_13].b = 0;
  MEM[(struct B *)ap_13].D.2237._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
  _1 = foo;
  PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_1;(struct A)ap_13->0);
  if (PROF_30 == foo)
    goto <bb 8>;
  else
    goto <bb 7>;

however the code ends up super sily after tracer.
for some reason we do not manage to fold away the virtual table lookup.
Why?


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-09-25 21:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-12  5:56 [Bug middle-end/35545] New: virtual call specialization not happening with FDO xinliangli at gmail dot com
2008-03-12  6:08 ` [Bug middle-end/35545] " pinskia at gcc dot gnu dot org
     [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
2013-12-17 16:34 ` hubicka at gcc dot gnu.org
2013-12-17 17:29 ` hubicka at gcc dot gnu.org
2013-12-17 18:06 ` rguenther at suse dot de
2013-12-17 18:09 ` hubicka at gcc dot gnu.org
2013-12-17 20:33 ` law at redhat dot com
2013-12-17 20:39 ` hubicka at ucw dot cz
2013-12-17 20:53 ` law at redhat dot com
2013-12-17 21:15 ` hubicka at ucw dot cz
2014-09-25 21:29 ` hubicka at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).