[Bug middle-end/35545] virtual call specialization not happening with FDO

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
@ 2013-12-17 16:34 ` hubicka at gcc dot gnu.org
  2013-12-17 17:29 ` hubicka at gcc dot gnu.org
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-12-17 16:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-17
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |mjambor at suse dot cz,
                   |                            |rguenther at suse dot de
     Ever confirmed|0                           |1

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Couple years later we finally devirtualize here:
  <bb 5>:
  ap_8 = operator new (16);
  ap_8->i = 0;
  ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
  _19 = foo;
  PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
  if (PROF_26 == foo)
    goto <bb 8>;
  else
    goto <bb 7>;

  <bb 6>:
  ap_13 = operator new (16);
  MEM[(struct B *)ap_13].D.2237.i = 0;
  MEM[(struct B *)ap_13].b = 0;
  MEM[(struct B *)ap_13].D.2237._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
  _1 = foo;
  PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_1;(struct A)ap_13->0);
  if (PROF_30 == foo)
    goto <bb 8>;
  else
    goto <bb 7>;

however the code ends up super sily after tracer.
for some reason we do not manage to fold away the virtual table lookup.
Why?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
  2013-12-17 16:34 ` [Bug middle-end/35545] virtual call specialization not happening with FDO hubicka at gcc dot gnu.org
@ 2013-12-17 17:29 ` hubicka at gcc dot gnu.org
  2013-12-17 18:06 ` rguenther at suse dot de
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-12-17 17:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Following patch gets rid of OBJ_TYPE_REF
Index: value-prof.c
===================================================================
--- value-prof.c        (revision 206040)
+++ value-prof.c        (working copy)
@@ -1333,9 +1333,15 @@ gimple_ic (gimple icall_stmt, struct cgr
   cond_bb = gimple_bb (icall_stmt);
   gsi = gsi_for_stmt (icall_stmt);

-  tmp0 = make_temp_ssa_name (optype, NULL, "PROF");
-  tmp1 = make_temp_ssa_name (optype, NULL, "PROF");
-  tmp = unshare_expr (gimple_call_fn (icall_stmt));
+  tmp0 = make_temp_ssa_name (optype, NULL, "SPEC");
+  tmp1 = make_temp_ssa_name (optype, NULL, "SPEC");
+  tmp = gimple_call_fn (icall_stmt);
+
+  /* Drop OBJ_TYPE_REF expression, it is ignored by rest of
+     optimization queue anyway.  */
+  if (TREE_CODE (tmp) == OBJ_TYPE_REF)
+    tmp = OBJ_TYPE_REF_EXPR (tmp);
+  tmp = unshare_expr (tmp);
   load_stmt = gimple_build_assign (tmp0, tmp);
   gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);


but it does not help.  We end up:
  _1 = foo;
  if (_1 == foo)
    goto <bb 8>;

not much better.  This nonsense appears only in optimized dump, before we have
the following nonsense:
  # ap_9 = PHI <ap_13(6)>
  # prephitmp_18 = PHI <&MEM[(void *)&_ZTV1B + 16B](6)>
  _1 = *prephitmp_18;

that is introduced by duplicating
  <bb 7>:
  # ap_2 = PHI <ap_8(5), ap_13(6)>
  # prephitmp_14 = PHI <&MEM[(void *)&_ZTV1A + 16B](5), &MEM[(void *)&_ZTV1B +
16B](6)>
  _19 = *prephitmp_14;
  if (_19 == foo)
    goto <bb 9>;
  else
    goto <bb 8>;

The following patch:
Index: passes.def
===================================================================
--- passes.def  (revision 206040)
+++ passes.def  (working copy)
@@ -242,9 +242,9 @@ along with GCC; see the file COPYING3.
         only examines PHIs to discover const/copy propagation
         opportunities.  */
       NEXT_PASS (pass_phi_only_cprop);
+      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_cd_dce);
-      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);

makes us to VRP value through but only when OBJ_TYPE_REF is nout around.
I think pushing tracer up is good idea - we should have at least one vrp/ccp
pass after tracer. Its main purpose is to provide enough context so those
forward propagating passes can do better job. it seems stupid to do it only
after everything was finished.  I also see why tracer would preffer DCE to be
done first, but we have only so many passes to do.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
  2013-12-17 16:34 ` [Bug middle-end/35545] virtual call specialization not happening with FDO hubicka at gcc dot gnu.org
  2013-12-17 17:29 ` hubicka at gcc dot gnu.org
@ 2013-12-17 18:06 ` rguenther at suse dot de
  2013-12-17 18:09 ` hubicka at gcc dot gnu.org
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenther at suse dot de @ 2013-12-17 18:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
"hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
>
>Jan Hubicka <hubicka at gcc dot gnu.org> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>             Status|UNCONFIRMED                 |NEW
>   Last reconfirmed|                            |2013-12-17
>            CC|                            |hubicka at gcc dot gnu.org,
>                  |                            |mjambor at suse dot cz,
>                 |                            |rguenther at suse dot de
>     Ever confirmed|0                           |1
>
>--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
>Couple years later we finally devirtualize here:
>  <bb 5>:
>  ap_8 = operator new (16);
>  ap_8->i = 0;
>  ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>  _19 = foo;
>  PROF_26 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
>  if (PROF_26 == foo)
>    goto <bb 8>;
>  else
>    goto <bb 7>;
>
>  <bb 6>:
>  ap_13 = operator new (16);
>  MEM[(struct B *)ap_13].D.2237.i = 0;
>  MEM[(struct B *)ap_13].b = 0;
>  MEM[(struct B *)ap_13].D.2237._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
>  _1 = foo;
>  PROF_30 = [obj_type_ref] OBJ_TYPE_REF(_1;(struct A)ap_13->0);
>  if (PROF_30 == foo)
>    goto <bb 8>;
>  else
>    goto <bb 7>;
>
>however the code ends up super sily after tracer.
>for some reason we do not manage to fold away the virtual table lookup.
>Why?

This is probably doms fault. Does it even handle function pointers?

Tracer runs way too late and with no cleanups behind it.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2013-12-17 18:06 ` rguenther at suse dot de
@ 2013-12-17 18:09 ` hubicka at gcc dot gnu.org
  2013-12-17 20:33 ` law at redhat dot com
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-12-17 18:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Main issue seems to be that VRP messes up on:
  # ap_2 = PHI <ap_8(5)>
  # prephitmp_14 = PHI <&MEM[(void *)&_ZTV1A + 16B](5)>
  _19 = *prephitmp_14;

here it somehow won't constant propagate the load. 
Index: passes.def
===================================================================
--- passes.def  (revision 206040)
+++ passes.def  (working copy)
@@ -236,6 +236,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_dominator);
+      NEXT_PASS (pass_tracer);
       /* The only const/copy propagation opportunities left after
         DOM should be due to degenerate PHI nodes.  So rather than
         run the full propagators, run a specialized pass which
@@ -244,7 +245,6 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_phi_only_cprop);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_cd_dce);
-      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);

actually helps since phi_only_cprop is good on this transform. I do not quite
gather why VRP can't do it itself.

I sent first patch to http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01517.html


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2013-12-17 18:09 ` hubicka at gcc dot gnu.org
@ 2013-12-17 20:33 ` law at redhat dot com
  2013-12-17 20:39 ` hubicka at ucw dot cz
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: law at redhat dot com @ 2013-12-17 20:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #6 from Jeffrey A. Law <law at redhat dot com> ---
You certainly don't want to put something between DOM and phi-only-cprop.  Jump
threading will tend to expose lots of degenerate PHIs.  phi-only-cprop
eliminates those degenerates.  We could have used the normal cprop code, but it
seemed too heavy-weight for the cleanups we wanted to do.

One could argue we want a phi-only-cprop cleanup after VRP since it threads
jumps too.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2013-12-17 20:33 ` law at redhat dot com
@ 2013-12-17 20:39 ` hubicka at ucw dot cz
  2013-12-17 20:53 ` law at redhat dot com
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at ucw dot cz @ 2013-12-17 20:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> ---
> You certainly don't want to put something between DOM and phi-only-cprop.  Jump
> threading will tend to expose lots of degenerate PHIs.  phi-only-cprop
> eliminates those degenerates.  We could have used the normal cprop code, but it
> seemed too heavy-weight for the cleanups we wanted to do.

Well, consttants, copies and PHIs are accounted as 0 size and thus not part of
tracer's
cost model, so perhaps we do not care about presence of degnerated PHIs here.
Moreover it is the degnerate PHI produced by tracer that causes are problems. I
assume
DOM does degnerate PHIs by code duplication for jump threading and tracer is
exactly the
same type of transformation and for same reasons we may want phi-only-cprop
after tracer
as we do it after DOM.

It however seems a more like VRP's missed optimization to not be able to paper
over that
degenerate PHI produced by tracer at first place, so I will try to poke about
this tomorrow
a bit more.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2013-12-17 20:39 ` hubicka at ucw dot cz
@ 2013-12-17 20:53 ` law at redhat dot com
  2013-12-17 21:15 ` hubicka at ucw dot cz
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: law at redhat dot com @ 2013-12-17 20:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #8 from Jeffrey A. Law <law at redhat dot com> ---
It's not a matter of cost model, but if propagating the values to their uses. 
I haven't looked closely at the tracer, but wouldn't it benefit by having
constants in particular propagated to their uses?

Yes, DOM's duplication/isolation of paths exposes the degenerate PHIs.  So we
might have had:

One of the nice side effects of jump threading is that it isolates paths.  So
we might have had 

BBn
x = phi (a, b, c, constant, d, e, f)

duplication for threading might turn that into

BBn:
x = phi (a, b, c, d, e, f)  // The original


elsewhere

BBm:
x' = phi (constant)  // the duplicate


Propagating the constant for x' in BBm and eliminating the degenerate is what
the phi-only cprop pass does.  If the tracer generates similar things, then
running phi-only cprop after it might be useful as well.  It *should* be very
fast.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-12-17 20:53 ` law at redhat dot com
@ 2013-12-17 21:15 ` hubicka at ucw dot cz
  2014-09-25 21:29 ` hubicka at gcc dot gnu.org
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at ucw dot cz @ 2013-12-17 21:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #9 from Jan Hubicka <hubicka at ucw dot cz> ---
> It's not a matter of cost model, but if propagating the values to their uses. 
> I haven't looked closely at the tracer, but wouldn't it benefit by having
> constants in particular propagated to their uses?

Tracer depends on the usual estimate_num_insns limits
 (it is 12 years since I wrote it, so what I recall)
It collects BBs that are interesting starts of traces, takes them in priority
order and duplicates from the seeds until code growth is met or it runs out of
interesting candidates by other criteria.

I think it generally tends to starve on candidates as the definition of trace
is
relatively strong, but I am not 100% sure on it. So it is not that much
dependent
on bounds given by code size metric.

If we had unlimited time, it would  be better to propagate constants and
cleanup
both before and after tracer.

If we can chose whether we want to do tracer before last pass that is able
to propagate and fold constants or after, I would chose before for the reason
I mentioned on begginig; the whole point of the tail duplication is to simplify
CFG and allow better propagation.

I think missed tracing here and there is less painful than missed optimizatoin
in duplicated code.

We may even consider pushing tracer before DOM, since tail duplication may
enable
DOM to produce more useful threading/propagation and code after tracer is not
too painfuly obstructated. Sure you can end up with PHI that has only one
constant
argument.  I can see that DOM may miss optimization here.
> Propagating the constant for x' in BBm and eliminating the degenerate is what
> the phi-only cprop pass does.  If the tracer generates similar things, then
> running phi-only cprop after it might be useful as well.  It *should* be very
> fast.

Yes, tracer does similar things.  You can think about it as about speculative
jump threading - if one path through meet points seems more likely than the
other based on profile, tracer will duplicate it in a hope that later
optimization pass will prove some of conditionals constant over the duplicated
path.  For that it needs subsequent propagation pass (CCP or better VRP) to
match.  That is why its current place in pass queue is unlucky.  Possible
benefits of tail duplications are of course not limited to threading.

We can do one extra cleanup pass, too.  Tracer is on by default only with
-fprofile-use so extra phi-only cprop with -ftracer probably is not dangerous
to overall compile time experience.

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2013-12-17 21:15 ` hubicka at ucw dot cz
@ 2014-09-25 21:29 ` hubicka at gcc dot gnu.org
  2014-09-26  7:44 ` [Bug middle-end/35545] tracer pass is run too late rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at gcc dot gnu.org @ 2014-09-25 21:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We still fail to fold here.  After tracer we get:
  # ap_2 = PHI <ap_8(4)>
  # prephitmp_14 = PHI <&MEM[(void *)&_ZTV1A + 16B](4)>
  _19 = *prephitmp_14;
  PROF_24 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_2->0);
  if (PROF_24 == foo)
    goto <bb 9>;
  else
    goto <bb 8>;

and the necessary forward propagation & folding to realize that:
  _19 = foo;
  PROF_24 = [obj_type_ref] OBJ_TYPE_REF(_19;(struct A)ap_8->0);
  if (PROF_24 == foo)

happens only in .optimized dump.

I would go with patch from comment 5 moving tracer earlier - the tail
duplication intends to make code more optimizable and it is stupid to run it
only after all optimizations are done.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2014-09-25 21:29 ` hubicka at gcc dot gnu.org
@ 2014-09-26  7:44 ` rguenth at gcc dot gnu.org
  2014-09-26 15:52 ` law at redhat dot com
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-09-26  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well - ideally you'd want to do some CSE, not only VRP (constant propagation in
this case).  So I'd move it even earlier, after pass_lower_vector_ssa?  I
can't see what tracing would break (some odd cases can destroy loop info, but
I think I have disabled all those).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2014-09-26  7:44 ` [Bug middle-end/35545] tracer pass is run too late rguenth at gcc dot gnu.org
@ 2014-09-26 15:52 ` law at redhat dot com
  2014-09-27  0:04 ` hubicka at gcc dot gnu.org
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: law at redhat dot com @ 2014-09-26 15:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #14 from Jeffrey A. Law <law at redhat dot com> ---
This feels like the kind of situation where I've always wanted a pass to be
able to say something like "I've done some set of transformations", schedule
the appropriate cleanup passes to run".

It's heavyweight, but making  the tracer imply a later DOM or VRP pass might
make sense.

Jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2014-09-26 15:52 ` law at redhat dot com
@ 2014-09-27  0:04 ` hubicka at gcc dot gnu.org
  2014-09-27  0:14 ` hubicka at gcc dot gnu.org
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at gcc dot gnu.org @ 2014-09-27  0:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Author: hubicka
Date: Sat Sep 27 00:03:23 2014
New Revision: 215651

URL: https://gcc.gnu.org/viewcvs?rev=215651&root=gcc&view=rev
Log:

    PR middle-end/35545
    * passes.def (pass_tracer): Move before last dominator pass.
    * g++.dg/tree-prof/pr35545.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/tree-prof/pr35545.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2014-09-27  0:04 ` hubicka at gcc dot gnu.org
@ 2014-09-27  0:14 ` hubicka at gcc dot gnu.org
  2014-09-27  0:23 ` xinliangli at gmail dot com
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at gcc dot gnu.org @ 2014-09-27  0:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |mliska at suse dot cz
           Assignee|unassigned at gcc dot gnu.org      |hubicka at gcc dot gnu.org

--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I have moved tracer before the late cleanups that seems to be rather obbious
thing to do. This lets us to optimize the testcase (with -O2):
int main() ()
{
  struct A * ap;
  int i;
  int _6;

  <bb 2>:

  <bb 3>:
  # i_29 = PHI <i_22(6), 0(2)>
  _6 = i_29 % 7;
  if (_6 == 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  ap_8 = operator new (16);
  ap_8->i = 0;
  ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
  goto <bb 6>;

  <bb 5>:
  ap_13 = operator new (16);
  MEM[(struct B *)ap_13].D.2244.i = 0;
  MEM[(struct B *)ap_13].b = 0;
  MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];

  <bb 6>:
  # ap_4 = PHI <ap_13(5), ap_8(4)>
  operator delete (ap_4);
  i_22 = i_29 + 1;
  if (i_22 != 10000)
    goto <bb 3>;
  else
    goto <bb 7>;

  <bb 7>:
  return 0;

}

Martin, I do not have SPEC setup, do you think you can benchmark the attached
patch with SPEC and profile feedback and also non-FDO -O3 -ftracer compared to
-O3, please?
It would be nice to know code size impact, too.
Index: passes.def
===================================================================
--- passes.def  (revision 215651)
+++ passes.def  (working copy)
@@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_call_cdce);
       NEXT_PASS (pass_cselim);
+      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_tree_ifcombine);
       NEXT_PASS (pass_phiopt);
@@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_strength_reduction);
-      NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_dominator);
       NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_vrp);

Doing it at same approximately the same place as loop header copying seems to
make most sense to me.  It benefits from early cleanups and DCE definitly and
it should enable more fun with the later scalar passes that are almost all
rerun then.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2014-09-27  0:14 ` hubicka at gcc dot gnu.org
@ 2014-09-27  0:23 ` xinliangli at gmail dot com
  2014-09-27  1:04 ` hubicka at ucw dot cz
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: xinliangli at gmail dot com @ 2014-09-27  0:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #17 from davidxl <xinliangli at gmail dot com> ---
(In reply to Jan Hubicka from comment #16)
> I have moved tracer before the late cleanups that seems to be rather obbious
> thing to do. This lets us to optimize the testcase (with -O2):
> int main() ()
> {
>   struct A * ap;
>   int i;
>   int _6;
> 
>   <bb 2>:
> 
>   <bb 3>:
>   # i_29 = PHI <i_22(6), 0(2)>
>   _6 = i_29 % 7;
>   if (_6 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
>   <bb 4>:
>   ap_8 = operator new (16);
>   ap_8->i = 0;
>   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>   goto <bb 6>;
> 
>   <bb 5>:
>   ap_13 = operator new (16);
>   MEM[(struct B *)ap_13].D.2244.i = 0;
>   MEM[(struct B *)ap_13].b = 0;
>   MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
> 
>   <bb 6>:
>   # ap_4 = PHI <ap_13(5), ap_8(4)>
>   operator delete (ap_4);
>   i_22 = i_29 + 1;
>   if (i_22 != 10000)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
> 
>   <bb 7>:
>   return 0;
> 
> }
> 
> Martin, I do not have SPEC setup, do you think you can benchmark the
> attached patch with SPEC and profile feedback and also non-FDO -O3 -ftracer
> compared to -O3, please?
> It would be nice to know code size impact, too.
> Index: passes.def
> ===================================================================
> --- passes.def  (revision 215651)
> +++ passes.def  (working copy)
> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_dce);
>        NEXT_PASS (pass_call_cdce);
>        NEXT_PASS (pass_cselim);
> +      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_copy_prop);
>        NEXT_PASS (pass_tree_ifcombine);
>        NEXT_PASS (pass_phiopt);
> @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_cse_reciprocals);
>        NEXT_PASS (pass_reassoc);
>        NEXT_PASS (pass_strength_reduction);
> -      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_dominator);
>        NEXT_PASS (pass_strlen);
>        NEXT_PASS (pass_vrp);
> 
> Doing it at same approximately the same place as loop header copying seems
> to make most sense to me.  It benefits from early cleanups and DCE definitly
> and it should enable more fun with the later scalar passes that are almost
> all rerun then.


WE can try some internal benchmarks with this change too.

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2014-09-27  0:23 ` xinliangli at gmail dot com
@ 2014-09-27  1:04 ` hubicka at ucw dot cz
  2014-09-28 18:22 ` xinliangli at gmail dot com
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at ucw dot cz @ 2014-09-27  1:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #18 from Jan Hubicka <hubicka at ucw dot cz> ---
> WE can try some internal benchmarks with this change too.

That would be very welcome.  Tracer used to be quite useful pass in old days,
doing 1.6% on -O3+FDO SPECint (for 1.4% code size cost) that put it very close
to the inliner (1.8%) that however needed 12.9% of code size.

http://www.ucw.cz/~hubicka/papers/amd64/node4.html (see aggressive optimization
table)

I do not think it was ever resonably returned for GIMPLE. Martin, do you have
any current scores? Part of benefits came from lack of global optimizers, but
I think tail duplication has a potential in modern compiler, too.

Perhaps we should try to do that and given that we now have tail merging pass,
consider getting it useful without FDO, too.

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2014-09-27  1:04 ` hubicka at ucw dot cz
@ 2014-09-28 18:22 ` xinliangli at gmail dot com
  2014-09-29  8:17 ` rguenther at suse dot de
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: xinliangli at gmail dot com @ 2014-09-28 18:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #19 from davidxl <xinliangli at gmail dot com> ---
(In reply to Jan Hubicka from comment #18)
> > WE can try some internal benchmarks with this change too.
> 
> That would be very welcome.  Tracer used to be quite useful pass in old days,
> doing 1.6% on -O3+FDO SPECint (for 1.4% code size cost) that put it very
> close
> to the inliner (1.8%) that however needed 12.9% of code size.
> 
> http://www.ucw.cz/~hubicka/papers/amd64/node4.html (see aggressive
> optimization table)
> 
> I do not think it was ever resonably returned for GIMPLE. Martin, do you have
> any current scores? Part of benefits came from lack of global optimizers, but
> I think tail duplication has a potential in modern compiler, too.
> 
> Perhaps we should try to do that and given that we now have tail merging
> pass,
> consider getting it useful without FDO, too.
> 
> Honza

I saw small improvement across most of the benchmarks with this change on
sandybridge machines with FDO. The geomean improvement is about 0.3%. The
baseline compiler is Google gcc-49 compiler, and the test compiler is Google
gcc-49 with the patch.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2014-09-28 18:22 ` xinliangli at gmail dot com
@ 2014-09-29  8:17 ` rguenther at suse dot de
  2014-09-29 10:22 ` rguenther at suse dot de
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenther at suse dot de @ 2014-09-29  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #20 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 26 Sep 2014, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
> 
> --- Comment #14 from Jeffrey A. Law <law at redhat dot com> ---
> This feels like the kind of situation where I've always wanted a pass to be
> able to say something like "I've done some set of transformations", schedule
> the appropriate cleanup passes to run".
> 
> It's heavyweight, but making  the tracer imply a later DOM or VRP pass might
> make sense.

Uh that idea will lead to very big compile-time increases.  One
thing I'd liked to do at some point is make the SSA propagators
and/or DOM work on MEME regions so we can schedule them on
a sub-CFG.

Richard.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2014-09-29  8:17 ` rguenther at suse dot de
@ 2014-09-29 10:22 ` rguenther at suse dot de
  2014-09-29 16:48 ` hubicka at ucw dot cz
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenther at suse dot de @ 2014-09-29 10:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
nOn Sat, 27 Sep 2014, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
> 
> Jan Hubicka <hubicka at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |ASSIGNED
>                  CC|                            |mliska at suse dot cz
>            Assignee|unassigned at gcc dot gnu.org      |hubicka at gcc dot gnu.org
> 
> --- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> I have moved tracer before the late cleanups that seems to be rather obbious
> thing to do. This lets us to optimize the testcase (with -O2):
> int main() ()
> {
>   struct A * ap;
>   int i;
>   int _6;
> 
>   <bb 2>:
> 
>   <bb 3>:
>   # i_29 = PHI <i_22(6), 0(2)>
>   _6 = i_29 % 7;
>   if (_6 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
>   <bb 4>:
>   ap_8 = operator new (16);
>   ap_8->i = 0;
>   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>   goto <bb 6>;
> 
>   <bb 5>:
>   ap_13 = operator new (16);
>   MEM[(struct B *)ap_13].D.2244.i = 0;
>   MEM[(struct B *)ap_13].b = 0;
>   MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
> 
>   <bb 6>:
>   # ap_4 = PHI <ap_13(5), ap_8(4)>
>   operator delete (ap_4);
>   i_22 = i_29 + 1;
>   if (i_22 != 10000)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
> 
>   <bb 7>:
>   return 0;
> 
> }
> 
> Martin, I do not have SPEC setup, do you think you can benchmark the attached
> patch with SPEC and profile feedback and also non-FDO -O3 -ftracer compared to
> -O3, please?
> It would be nice to know code size impact, too.
> Index: passes.def
> ===================================================================
> --- passes.def  (revision 215651)
> +++ passes.def  (working copy)
> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_dce);
>        NEXT_PASS (pass_call_cdce);
>        NEXT_PASS (pass_cselim);
> +      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_copy_prop);
>        NEXT_PASS (pass_tree_ifcombine);
>        NEXT_PASS (pass_phiopt);
> @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_cse_reciprocals);
>        NEXT_PASS (pass_reassoc);
>        NEXT_PASS (pass_strength_reduction);
> -      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_dominator);
>        NEXT_PASS (pass_strlen);
>        NEXT_PASS (pass_vrp);
> 
> Doing it at same approximately the same place as loop header copying seems to
> make most sense to me.  It benefits from early cleanups and DCE definitly and
> it should enable more fun with the later scalar passes that are almost all
> rerun then.

We need to make sure tracer doesn't mess too much with loops then.
Btw, "useless" tracing may be undone again by tail-merging.

Tracer seems to consume only profile information and thus doesn't
rely on any other transforms (well, apart from cleanups which could
affect its cost function).  Why not schedule it even earlier?
Like to before pass_build_alias?  (the pipeline up to loop transforms
is quite a mess...)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2014-09-29 10:22 ` rguenther at suse dot de
@ 2014-09-29 16:48 ` hubicka at ucw dot cz
  2014-09-29 19:30 ` xinliangli at gmail dot com
  2014-10-01 11:30 ` mliska at suse dot cz
  20 siblings, 0 replies; 22+ messages in thread
From: hubicka at ucw dot cz @ 2014-09-29 16:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #22 from Jan Hubicka <hubicka at ucw dot cz> ---
> > Doing it at same approximately the same place as loop header copying seems to
> > make most sense to me.  It benefits from early cleanups and DCE definitly and
> > it should enable more fun with the later scalar passes that are almost all
> > rerun then.
> 
> We need to make sure tracer doesn't mess too much with loops then.
> Btw, "useless" tracing may be undone again by tail-merging.

Tracer already does:

              /* We have the tendency to duplicate the loop header
                 of all do { } while loops.  Do not do that - it is
                 not profitable and it might create a loop with multiple
                 entries or at least rotate the loop.  */
              && bb2->loop_father->header != bb2)

so it won't kill natural loops and peel (I should find time to update the
peeling pass). It also has:

      if (bb_seen_p (bb2) || (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX))
          || find_best_successor (bb2) != e)
        break;

to not unroll.  So i think it is safe for loop optimizer - all it can do
is expanding a loop that has control flow in it reducing its unrollability
later.  
> 
> Tracer seems to consume only profile information and thus doesn't
> rely on any other transforms (well, apart from cleanups which could
> affect its cost function).  Why not schedule it even earlier?
> Like to before pass_build_alias?  (the pipeline up to loop transforms
> is quite a mess...)

It uses profile information and code size estiamtes.  I expect that (especially
for C++ stuff) the early post inline passes will remove a lot of code and thus
improve traceability.  This is why I looked for spot after first DCE/DSE.

David, can you please be more specific about how you tested? Was it with
profile feedback?  What about code size metrics?

Honza


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2014-09-29 16:48 ` hubicka at ucw dot cz
@ 2014-09-29 19:30 ` xinliangli at gmail dot com
  2014-10-01 11:30 ` mliska at suse dot cz
  20 siblings, 0 replies; 22+ messages in thread
From: xinliangli at gmail dot com @ 2014-09-29 19:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #23 from davidxl <xinliangli at gmail dot com> ---
(In reply to Jan Hubicka from comment #22)
> > > Doing it at same approximately the same place as loop header copying seems to
> > > make most sense to me.  It benefits from early cleanups and DCE definitly and
> > > it should enable more fun with the later scalar passes that are almost all
> > > rerun then.
> > 
> > We need to make sure tracer doesn't mess too much with loops then.
> > Btw, "useless" tracing may be undone again by tail-merging.
> 
> Tracer already does:
> 
>               /* We have the tendency to duplicate the loop header
>                  of all do { } while loops.  Do not do that - it is
>                  not profitable and it might create a loop with multiple
>                  entries or at least rotate the loop.  */
>               && bb2->loop_father->header != bb2)
> 
> so it won't kill natural loops and peel (I should find time to update the
> peeling pass). It also has:
> 
>       if (bb_seen_p (bb2) || (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX))
>           || find_best_successor (bb2) != e)
>         break;
> 
> to not unroll.  So i think it is safe for loop optimizer - all it can do
> is expanding a loop that has control flow in it reducing its unrollability
> later.  
> > 
> > Tracer seems to consume only profile information and thus doesn't
> > rely on any other transforms (well, apart from cleanups which could
> > affect its cost function).  Why not schedule it even earlier?
> > Like to before pass_build_alias?  (the pipeline up to loop transforms
> > is quite a mess...)
> 
> It uses profile information and code size estiamtes.  I expect that
> (especially
> for C++ stuff) the early post inline passes will remove a lot of code and
> thus
> improve traceability.  This is why I looked for spot after first DCE/DSE.
> 
> David, can you please be more specific about how you tested? Was it with
> profile feedback?  What about code size metrics?

It is with profile feedback. There is also reduction in code size, though very
tiny change.

David

> 
> Honza


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] tracer pass is run too late
       [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2014-09-29 19:30 ` xinliangli at gmail dot com
@ 2014-10-01 11:30 ` mliska at suse dot cz
  20 siblings, 0 replies; 22+ messages in thread
From: mliska at suse dot cz @ 2014-10-01 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #25 from Martin Liška <mliska at suse dot cz> ---
SPEC CPU numbers (--size=train, --iterations=3):

First 3 columns present time (where smaller means faster) and for binary size
the same.

+------------+----------+------------+---------------+-------------+-----------------+--------------------+
|            | gcc-O3   | gcc-O3-fdo | gcc-O3-tracer | gcc-O3_SIZE |
gcc-O3-fdo_SIZE | gcc-O3-tracer_SIZE |
+------------+----------+------------+---------------+-------------+-----------------+--------------------+
| GemsFDTD   | 100.00 % |    96.00 % |       95.56 % |    100.00 % |       
101.14 % |           109.58 % |
| hmmer      | 100.00 % |    97.44 % |       97.14 % |    100.00 % |        
73.94 % |           105.71 % |
| sphinx3    | 100.00 % |    95.42 % |       97.76 % |    100.00 % |        
88.44 % |           104.29 % |
| milc       | 100.00 % |    99.35 % |      101.13 % |    100.00 % |       
101.48 % |           103.98 % |
| cactusADM  | 100.00 % |    99.97 % |       99.46 % |    100.00 % |        
76.84 % |           106.86 % |
| tonto      | 100.00 % |    94.72 % |       98.04 % |    100.00 % |        
80.27 % |           107.15 % |
| bwaves     | 100.00 % |   104.47 % |      101.77 % |    100.00 % |       
100.33 % |           103.21 % |
| lbm        | 100.00 % |    96.65 % |       98.49 % |    100.00 % |       
106.25 % |           102.04 % |
| wrf        | 100.00 % |    61.39 % |       65.49 % |    100.00 % |        
64.51 % |           112.40 % |
| bzip2      | 100.00 % |    93.79 % |      100.10 % |    100.00 % |       
115.36 % |           108.78 % |
| leslie3d   | 100.00 % |    96.52 % |       98.73 % |    100.00 % |       
100.24 % |           107.03 % |
| gromacs    | 100.00 % |    97.12 % |      100.75 % |    100.00 % |        
77.08 % |           104.82 % |
| sjeng      | 100.00 % |    98.77 % |       99.80 % |    100.00 % |        
84.01 % |           101.87 % |
| calculix   | 100.00 % |   101.55 % |      100.28 % |    100.00 % |        
79.58 % |           104.36 % |
| astar      | 100.00 % |   101.70 % |       97.19 % |    100.00 % |       
102.73 % |           101.95 % |
| zeusmp     | 100.00 % |    98.79 % |      101.00 % |    100.00 % |       
113.59 % |           102.36 % |
| povray     | 100.00 % |    89.17 % |      100.35 % |    100.00 % |        
83.61 % |           105.07 % |
| omnetpp    | 100.00 % |    88.23 % |      102.54 % |    100.00 % |        
90.70 % |           102.47 % |
| mcf        | 100.00 % |    95.92 % |      100.03 % |    100.00 % |       
124.69 % |           102.62 % |
| gcc        | 100.00 % |    97.04 % |       98.88 % |    100.00 % |       
108.29 % |           101.24 % |
| h264ref    | 100.00 % |    94.61 % |      100.58 % |    100.00 % |        
88.25 % |           105.93 % |
| perlbench  | 100.00 % |    98.90 % |      100.46 % |    100.00 % |       
106.21 % |           106.23 % |
| xalancbmk  | 100.00 % |    75.17 % |       82.56 % |    100.00 % |        
87.10 % |           103.41 % |
| namd       | 100.00 % |    99.48 % |      100.59 % |    100.00 % |        
88.25 % |           100.74 % |
| gamess     | 100.00 % |   130.31 % |      131.62 % |    100.00 % |        
96.00 % |           101.59 % |
| libquantum | 100.00 % |    97.04 % |       98.98 % |    100.00 % |        
68.74 % |           104.55 % |
| soplex     | 100.00 % |    96.15 % |      100.92 % |    100.00 % |        
87.04 % |           100.96 % |
| gobmk      | 100.00 % |    91.22 % |      104.56 % |    100.00 % |        
84.16 % |           106.75 % |
+------------+----------+------------+---------------+-------------+-----------------+--------------------+
| AVG        | 100.00 % |    95.96 % |       99.10 % |    100.00 % |        
92.10 % |           104.57 % |
+------------+----------+------------+---------------+-------------+-----------------+--------------------+

Martin
>From gcc-bugs-return-463009-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Wed Oct 01 11:32:05 2014
Return-Path: <gcc-bugs-return-463009-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 18057 invoked by alias); 1 Oct 2014 11:32:04 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 18007 invoked by uid 48); 1 Oct 2014 11:32:00 -0000
From: "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/63428] New: [4.8/4.9/5 Regression] vshuf-v4di.c miscompilation
Date: Wed, 01 Oct 2014 11:32:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status keywords bug_severity priority component assigned_to reporter cf_gcctarget
Message-ID: <bug-63428-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-10/txt/msg00030.txt.bz2
Content-length: 641

https://gcc.gnu.org/bugzilla/show_bug.cgi?idc428

            Bug ID: 63428
           Summary: [4.8/4.9/5 Regression] vshuf-v4di.c miscompilation
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: jakub at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
            Target: x86_64-linux

GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
fails starting with 4.8 (4.7 works), test_122 function is miscompiled.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug middle-end/35545] virtual call specialization not happening with FDO
  2008-03-12  5:56 [Bug middle-end/35545] New: virtual call specialization not happening with FDO xinliangli at gmail dot com
@ 2008-03-12  6:08 ` pinskia at gcc dot gnu dot org
  0 siblings, 0 replies; 22+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-03-12  6:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2008-03-12 06:07 -------
Jump threading happened by we did not prop the call correctly:
  ap = (struct A *) D.2075;
  ap->_vptr$A = &_ZTV1A[2];
  OBJ_TYPE_REF(*ap->_vptr$A;ap->0) (ap);

...
  this = (struct B *) D.2076;
  this->D.2018._vptr$A = &_ZTV1B[2];
  ap.61 = &this->D.2018;
  OBJ_TYPE_REF(*ap.61->_vptr$A;ap.61->0) (ap.61);


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-10-01 11:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-35545-4@http.gcc.gnu.org/bugzilla/>
2013-12-17 16:34 ` [Bug middle-end/35545] virtual call specialization not happening with FDO hubicka at gcc dot gnu.org
2013-12-17 17:29 ` hubicka at gcc dot gnu.org
2013-12-17 18:06 ` rguenther at suse dot de
2013-12-17 18:09 ` hubicka at gcc dot gnu.org
2013-12-17 20:33 ` law at redhat dot com
2013-12-17 20:39 ` hubicka at ucw dot cz
2013-12-17 20:53 ` law at redhat dot com
2013-12-17 21:15 ` hubicka at ucw dot cz
2014-09-25 21:29 ` hubicka at gcc dot gnu.org
2014-09-26  7:44 ` [Bug middle-end/35545] tracer pass is run too late rguenth at gcc dot gnu.org
2014-09-26 15:52 ` law at redhat dot com
2014-09-27  0:04 ` hubicka at gcc dot gnu.org
2014-09-27  0:14 ` hubicka at gcc dot gnu.org
2014-09-27  0:23 ` xinliangli at gmail dot com
2014-09-27  1:04 ` hubicka at ucw dot cz
2014-09-28 18:22 ` xinliangli at gmail dot com
2014-09-29  8:17 ` rguenther at suse dot de
2014-09-29 10:22 ` rguenther at suse dot de
2014-09-29 16:48 ` hubicka at ucw dot cz
2014-09-29 19:30 ` xinliangli at gmail dot com
2014-10-01 11:30 ` mliska at suse dot cz
2008-03-12  5:56 [Bug middle-end/35545] New: virtual call specialization not happening with FDO xinliangli at gmail dot com
2008-03-12  6:08 ` [Bug middle-end/35545] " pinskia at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).