[Bug tree-optimization/21829] [4.1 Regression] missed jump threading after unroller

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/21829] [4.1 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
@ 2005-10-29 15:52 ` pinskia at gcc dot gnu dot org
  2005-10-30 23:32 ` pinskia at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-10-29 15:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2005-10-29 15:52 -------
This really does not need to be targeted towards 4.1 (this is a bug I filed
looking at code gen problems with some weird code which I made up).


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2005-09-16 23:14:06         |2005-10-29 15:52:25
               date|                            |
   Target Milestone|4.1.0                       |4.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
  2005-10-29 15:52 ` [Bug tree-optimization/21829] [4.1 Regression] missed jump threading after unroller pinskia at gcc dot gnu dot org
@ 2005-10-30 23:32 ` pinskia at gcc dot gnu dot org
  2006-02-09  3:20 ` [Bug tree-optimization/21829] [4.1/4.2 " law at redhat dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-10-30 23:32 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P5
   Target Milestone|4.2.0                       |4.1.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
  2005-10-29 15:52 ` [Bug tree-optimization/21829] [4.1 Regression] missed jump threading after unroller pinskia at gcc dot gnu dot org
  2005-10-30 23:32 ` pinskia at gcc dot gnu dot org
@ 2006-02-09  3:20 ` law at redhat dot com
  2006-02-11  0:59 ` pinskia at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-02-09  3:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from law at redhat dot com  2006-02-09 03:19 -------
I'll note this really isn't a jump threading issue.  This is a fundamental
weakness in a dominator based optimizer vs a truly global optimizer.

What we've got is a block which looks something like this:

# u_18 = PHI <u_13(4), u_26(3)>;
<L1>:;
D.1528_10 = u_18 % 2;
if (D.1528_10 == 1) goto <L2>; else goto <L3>;

Note carefully that this block is at a dominance frontier (thus the PHI node). 
At a dominance frontier, a dominator walk does _not_ guarantee that the CFG
parents are visited before the node itself.  ie, we may visit this block before
its CFG parents.    That is in fact precisely what happens.  Thus we do not
discover that u_13 and u_26 both have the value 1 until after we have processed
this block.

Possible solutions:

  1. Iterating DOM.  I don't want to do that.

  2. Run CCP before DOM.  It's something I've pondered, mostly because I want
     to remove constant propagation from DOM.  Compile-time issues worry me.

  3. Simple constant/copy propagation in the renaming phase.  I've always
     been a fan of this to avoid clutter in the IL, but I've been overruled
     on this as a design decision.

  4. Somehow arrange to sort the et-tree dominance information so that we
     visit immediate descendants of a CFG node which are dominated by the
     node before visiting more distant CFG descendants.  This could be
     horribly ugly.

  5. Somehow get really really smart in the block duplication code and arrange
     for it to propagate constants.  I've pondered this, but I don't think
     it's really that feasible both from a design and implementation
standpoint.

  6. Post-dom cleanup phase using worklists seeded by degenerate PHI nodes to
     identify statements that can be simplified as a result of discovery of
     the degenerate PHI.  [Would replace the mini-copyprop pass we do after
     DOM. ]


I suspect #2 or #6 are the most likely long term solutions.

jeff


-- 

law at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2006-02-09  3:20 ` [Bug tree-optimization/21829] [4.1/4.2 " law at redhat dot com
@ 2006-02-11  0:59 ` pinskia at gcc dot gnu dot org
  2006-02-28 20:39 ` mmitchel at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-02-11  0:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2006-02-11 00:59 -------
The problem with this one after Jeff's recent patches is that we have:
<L13>:;
  D.1402_17 = 0;
  if (D.1402_17 == 1) goto <L15>; else goto <L14>;

<L15>:;
  x_18 = 1;

  # x_19 = PHI <0(2), 0(3), x_18(4)>;
<L14>:;

Which causes us not to be able to the jump threading as we do a CCP in VRP and
then we get:
<bb 2>:
  if (v_8 < 0) goto <L13>; else goto <L14>;

<L13>:;
  D.1402_17 = 0;
  goto <bb 8> (<L18>);

  # x_19 = PHI <0(2)>;
<L14>:;
  u_20 = 1;
  ivtmp.26_21 = 1;
  ivtmp.26_3 = 1;
  u_14 = 1;
  x_13 = 0;
  if (v_8 <= 0) goto <L1>; else goto <L3>;

So we need to be able to do some CCP and some DCE before invoking VRP.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2006-02-11  0:59 ` pinskia at gcc dot gnu dot org
@ 2006-02-28 20:39 ` mmitchel at gcc dot gnu dot org
  2006-03-21  5:09 ` law at redhat dot com
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-02-28 20:39 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.0                       |4.1.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2006-02-28 20:39 ` mmitchel at gcc dot gnu dot org
@ 2006-03-21  5:09 ` law at redhat dot com
  2006-03-21  5:10 ` law at redhat dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-03-21  5:09 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #6 from law at redhat dot com  2006-03-21 05:09 -------
Subject: Re:  [4.1/4.2 Regression] missed jump
        threading after unroller

On Sat, 2006-02-11 at 00:59 +0000, pinskia at gcc dot gnu dot org wrote:
> 
> ------- Comment #5 from pinskia at gcc dot gnu dot org  2006-02-11 00:59 -------
> The problem with this one after Jeff's recent patches is that we have:
> <L13>:;
>   D.1402_17 = 0;
>   if (D.1402_17 == 1) goto <L15>; else goto <L14>;
> 
> <L15>:;
>   x_18 = 1;
> 
>   # x_19 = PHI <0(2), 0(3), x_18(4)>;
> <L14>:;
> 
> Which causes us not to be able to the jump threading as we do a CCP in VRP and
> then we get:
> <bb 2>:
>   if (v_8 < 0) goto <L13>; else goto <L14>;
> 
> <L13>:;
>   D.1402_17 = 0;
>   goto <bb 8> (<L18>);
> 
>   # x_19 = PHI <0(2)>;
> <L14>:;
>   u_20 = 1;
>   ivtmp.26_21 = 1;
>   ivtmp.26_3 = 1;
>   u_14 = 1;
>   x_13 = 0;
>   if (v_8 <= 0) goto <L1>; else goto <L3>;
> 
> So we need to be able to do some CCP and some DCE before invoking VRP.
As I mentioned earlier, the problem is inherent with non-iterating
dominator optimizations -- namely that we can't guarantee that for
block BB that we will visit all of BB's predecessors before visiting
BB when BB is at a dominance frontier.

The net result is that we may still be propagating values into a PHI
node for a block which has already been visited.  Those propagations
may result in the PHI turning into a degenerate too late for the
dominator optimizer to discover the degenerate PHI and record the
appropriate equivalence for the LHS of the degenerate PHI.

I had added support for a stripped down copy-prop to run after DOM
to pick up these kind of optimization opportunities rooted at degenerate
PHI nodes that were trivial copies.  (previously we had been running
the full-blown copy-prop pass).

However, that code does not catch this case because our PHI looks like

x_2 = PHI (0 (BB1), 0 (BB2))

ie, it's a constant initialization, not a PHI-copy.

I had played with running a similarly stripped down CCP after DOM,
but that's really expensive compile-time wise.

I then experimented with speeding up the propagation engine.  While
I may have found a couple of micro-opts, I wasn't able to find anything
which was going to give us a big enough improvement to make running
a phi-only CCP a viable option from a compile-time standpoint.

I then proceeded to implement a concept that had been floating around
in the back of my mind.  Namely a specialized PHI const/copy optimizer
which used a dominator walk plus a worklist of statements/phis to
revisit after the DOM walk (which allows us to detect secondary
effects).

It turns out this specialized PHI optimization pass is as effective
as running copy-prop and CCP on PHI nodes after DOM.  Better yet, it
is a teeny tiny slowdown compared to just running the stripped down
copyprop.  ie, for an almost unmeasurable slowdown we can do both
constant and copy propagation instead of just copy propagation.

Net result is in this PR we're able to clean up the extraneous
modulo operation and propagate its result as well.

Note that the resulting code could be simplified even more, namely
iterating VRP could allow simplification of a relational into an
equality test.  That's simply not going to happen.  We'll have to
live with less than 100% optimized code.

Bootstrapped and regression tested on i686-pc-linux-gnu.

------- Comment #7 from law at redhat dot com  2006-03-21 05:09 -------
Created an attachment (id=11077)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11077&action=view)

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2006-03-21  5:09 ` law at redhat dot com
@ 2006-03-21  5:10 ` law at redhat dot com
  2006-03-22 11:14 ` richard dot guenther at gmail dot com
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-03-21  5:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from law at redhat dot com  2006-03-21 05:10 -------
Today's patch picks up the missed const-propagation and allows 
simplification of the modulo operation.

THere's still the opportunitity to use range information to simplify
a conditional.  However, fixing that is just basically iterating
VRP in response to CFG changes -- not something we're going to
do as it's simply too expensive.

Thus I'm changing the state of this bug to WONTFIX.


-- 

law at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2006-03-21  5:10 ` law at redhat dot com
@ 2006-03-22 11:14 ` richard dot guenther at gmail dot com
  2006-03-22 14:01 ` law at redhat dot com
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: richard dot guenther at gmail dot com @ 2006-03-22 11:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from richard dot guenther at gmail dot com  2006-03-22 11:14 -------
Subject: Re:  [4.1/4.2 Regression] missed jump threading after unroller

On 3/21/06, Jeffrey A Law <law@redhat.com> wrote:
> It turns out this specialized PHI optimization pass is as effective
> as running copy-prop and CCP on PHI nodes after DOM.  Better yet, it
> is a teeny tiny slowdown compared to just running the stripped down
> copyprop.  ie, for an almost unmeasurable slowdown we can do both
> constant and copy propagation instead of just copy propagation.

This patch caused a compile-time regression from 139s to 143s, resp.
192s to 197s (leafify) accounted by increases of operand scan / SSA incremental
and tree CCP times for compiling tramp3d.  Also memory usage during compiling
went up from 655494 kB to 660626kB (this may be due to the VRP patch, though).

Runtime of tramp3d did not improve but regress slightly (but that
might be in the
noise - we'll see).

For this simple cleanup pass can you try updating SSA form manually please?

Thanks,
Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2006-03-22 11:14 ` richard dot guenther at gmail dot com
@ 2006-03-22 14:01 ` law at redhat dot com
  2006-03-22 15:06 ` richard dot guenther at gmail dot com
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-03-22 14:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from law at redhat dot com  2006-03-22 14:01 -------
Subject: Re:  [4.1/4.2 Regression] missed jump
        threading after unroller

On Wed, 2006-03-22 at 12:14 +0100, Richard Guenther wrote:
> On 3/21/06, Jeffrey A Law <law@redhat.com> wrote:
> > It turns out this specialized PHI optimization pass is as effective
> > as running copy-prop and CCP on PHI nodes after DOM.  Better yet, it
> > is a teeny tiny slowdown compared to just running the stripped down
> > copyprop.  ie, for an almost unmeasurable slowdown we can do both
> > constant and copy propagation instead of just copy propagation.
> 
> This patch caused a compile-time regression from 139s to 143s, resp.
> 192s to 197s (leafify) accounted by increases of operand scan / SSA incremental
> and tree CCP times for compiling tramp3d.  Also memory usage during compiling
> went up from 655494 kB to 660626kB (this may be due to the VRP patch, though).
> 
> Runtime of tramp3d did not improve but regress slightly (but that
> might be in the
> noise - we'll see).
> 
> For this simple cleanup pass can you try updating SSA form manually please?
I'm more than happy to look at it; however, be aware that if you're
seeing increased time in CCP then either you're seeing some truly
bizzarre secondary effect or your testing methodology is suspect.
The patch did not affect CCP.  In fact, the changes only affect
passes which run *after* CCP in the optimization pipeline.

Jeff


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2006-03-22 14:01 ` law at redhat dot com
@ 2006-03-22 15:06 ` richard dot guenther at gmail dot com
  2006-03-22 15:36 ` law at redhat dot com
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: richard dot guenther at gmail dot com @ 2006-03-22 15:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from richard dot guenther at gmail dot com  2006-03-22 15:06 -------
Subject: Re:  [4.1/4.2 Regression] missed jump threading after unroller

On 3/22/06, Jeffrey A Law <law@redhat.com> wrote:
> On Wed, 2006-03-22 at 12:14 +0100, Richard Guenther wrote:
> > On 3/21/06, Jeffrey A Law <law@redhat.com> wrote:
> > > It turns out this specialized PHI optimization pass is as effective
> > > as running copy-prop and CCP on PHI nodes after DOM.  Better yet, it
> > > is a teeny tiny slowdown compared to just running the stripped down
> > > copyprop.  ie, for an almost unmeasurable slowdown we can do both
> > > constant and copy propagation instead of just copy propagation.
> >
> > This patch caused a compile-time regression from 139s to 143s, resp.
> > 192s to 197s (leafify) accounted by increases of operand scan / SSA incremental
> > and tree CCP times for compiling tramp3d.  Also memory usage during compiling
> > went up from 655494 kB to 660626kB (this may be due to the VRP patch, though).
> >
> > Runtime of tramp3d did not improve but regress slightly (but that
> > might be in the
> > noise - we'll see).
> >
> > For this simple cleanup pass can you try updating SSA form manually please?
> I'm more than happy to look at it; however, be aware that if you're
> seeing increased time in CCP then either you're seeing some truly
> bizzarre secondary effect or your testing methodology is suspect.
> The patch did not affect CCP.  In fact, the changes only affect
> passes which run *after* CCP in the optimization pipeline.

struct tree_opt_pass pass_phi_only_cprop =
{
  "phicprop",                           /* name */
  gate_dominator,                       /* gate */
  eliminate_degenerate_phis,            /* execute */
  NULL,                                 /* sub */
  NULL,                                 /* next */
  0,                                    /* static_pass_number */
  TV_TREE_CCP,                          /* tv_id */
  PROP_cfg | PROP_ssa | PROP_alias,     /* properties_required */
  0,                                    /* properties_provided */
  PROP_smt_usage,                       /* properties_destroyed */
  0,                                    /* todo_flags_start */
  TODO_cleanup_cfg | TODO_dump_func
    | TODO_ggc_collect | TODO_verify_ssa
    | TODO_verify_stmts | TODO_update_smt_usage
    | TODO_update_ssa, /* todo_flags_finish */
  0                                     /* letter */
};

see tv_id - so I guess increased CCP times are expected.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2006-03-22 15:06 ` richard dot guenther at gmail dot com
@ 2006-03-22 15:36 ` law at redhat dot com
  2006-03-28 19:14 ` law at redhat dot com
  2006-03-30 17:15 ` law at redhat dot com
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-03-22 15:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from law at redhat dot com  2006-03-22 15:36 -------
Subject: Re:  [4.1/4.2 Regression] missed jump
        threading after unroller

On Wed, 2006-03-22 at 16:06 +0100, Richard Guenther wrote:
> ;
> 
> see tv_id - so I guess increased CCP times are expected.
Nuts.  I should have made a separate tv_id.  I'll fix that
shortly.  

I probably won't be able to look at the compile-time regression
for your testcase until Friday.

jeff


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2006-03-22 15:36 ` law at redhat dot com
@ 2006-03-28 19:14 ` law at redhat dot com
  2006-03-30 17:15 ` law at redhat dot com
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-03-28 19:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from law at redhat dot com  2006-03-28 19:13 -------
Subject: Re:  [4.1/4.2 Regression] missed jump
        threading after unroller

On Wed, 2006-03-22 at 16:06 +0100, Richard Guenther wrote:
> On 3/22/06, Jeffrey A Law <law@redhat.com> wrote:
> > On Wed, 2006-03-22 at 12:14 +0100, Richard Guenther wrote:
> > > On 3/21/06, Jeffrey A Law <law@redhat.com> wrote:
> > > > It turns out this specialized PHI optimization pass is as effective
> > > > as running copy-prop and CCP on PHI nodes after DOM.  Better yet, it
> > > > is a teeny tiny slowdown compared to just running the stripped down
> > > > copyprop.  ie, for an almost unmeasurable slowdown we can do both
> > > > constant and copy propagation instead of just copy propagation.
> > >
> > > This patch caused a compile-time regression from 139s to 143s, resp.
> > > 192s to 197s (leafify) accounted by increases of operand scan / SSA incremental
> > > and tree CCP times for compiling tramp3d.  Also memory usage during compiling
> > > went up from 655494 kB to 660626kB (this may be due to the VRP patch, though).
> > >
> > > Runtime of tramp3d did not improve but regress slightly (but that
> > > might be in the
> > > noise - we'll see).
> > >
> > > For this simple cleanup pass can you try updating SSA form manually please?
> > I'm more than happy to look at it; however, be aware that if you're
> > seeing increased time in CCP then either you're seeing some truly
> > bizzarre secondary effect or your testing methodology is suspect.
> > The patch did not affect CCP.  In fact, the changes only affect
> > passes which run *after* CCP in the optimization pipeline.
> 
> struct tree_opt_pass pass_phi_only_cprop =
> {
>   "phicprop",                           /* name */
>   gate_dominator,                       /* gate */
>   eliminate_degenerate_phis,            /* execute */
>   NULL,                                 /* sub */
>   NULL,                                 /* next */
>   0,                                    /* static_pass_number */
>   TV_TREE_CCP,                          /* tv_id */
>   PROP_cfg | PROP_ssa | PROP_alias,     /* properties_required */
>   0,                                    /* properties_provided */
>   PROP_smt_usage,                       /* properties_destroyed */
>   0,                                    /* todo_flags_start */
>   TODO_cleanup_cfg | TODO_dump_func
>     | TODO_ggc_collect | TODO_verify_ssa
>     | TODO_verify_stmts | TODO_update_smt_usage
>     | TODO_update_ssa, /* todo_flags_finish */
>   0                                     /* letter */
> };
> 
> see tv_id - so I guess increased CCP times are expected.
I've created a separate timevar for the phi-only cprop code so
at least we can see how much time's taking.  Note that while we
should be seeing a small amount of time in phi-cprop, we should
be seeing a reduction in TV_TREE_CCP and TV_TREE_COPY_PROP.

Can you send me a recent tramp3d.ii file so that I can poke
around and see if there's anything we can do about the compile
time regression?

Thanks,
jeff


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug tree-optimization/21829] [4.1/4.2 Regression] missed jump threading after unroller
       [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2006-03-28 19:14 ` law at redhat dot com
@ 2006-03-30 17:15 ` law at redhat dot com
  12 siblings, 0 replies; 13+ messages in thread
From: law at redhat dot com @ 2006-03-30 17:15 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #14 from law at redhat dot com  2006-03-30 17:15 -------
Just a note on the compile-time regressions for tramp3d...

After fixing the timevars it was pretty clear that it isn't the cprop code
itself that is slow, it is in fact very fast.  THe slowdowns for tramp3d are in
operand scanning and incremental SSA updates.

I built a little instrumentation code and was rewarded with some insight into
why tramp3d behaves differently for operand scanning.   When we propagate into
a statement, we (of course) fold and rescan the operands for the use statement.
 Clearly if we propagate several distinct copies into a single use statement,
then we end up wasting time rescanning the use statement.

My instrumentation recorded how often we perform more than one propagation into
a statement vs how often we only propagate into a statement one time.  In my
test bucket that ratio is a little less than 1:1.  I would have expected
significantly smaller, but it is what it is.

What's interesting is that for tramp3d that ratio is about 3:1 -- primarily
from  copy propagating into VUSE and V_MAY_DEF operands.

I'm currently experimenting with some code to queue folding/rescanning of use
statements in cases where there's a reasonable chance we're going to do more
than one propagation into the statement.  Initial experiments are showing a
measurable compile-time improvement for both tramp3d as well as my testbucket.

[ Note that Diego's memory tag rewrite work may make all this moot one day... ]

Jeff

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21829

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2006-03-30 17:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-21829-6528@http.gcc.gnu.org/bugzilla/>
2005-10-29 15:52 ` [Bug tree-optimization/21829] [4.1 Regression] missed jump threading after unroller pinskia at gcc dot gnu dot org
2005-10-30 23:32 ` pinskia at gcc dot gnu dot org
2006-02-09  3:20 ` [Bug tree-optimization/21829] [4.1/4.2 " law at redhat dot com
2006-02-11  0:59 ` pinskia at gcc dot gnu dot org
2006-02-28 20:39 ` mmitchel at gcc dot gnu dot org
2006-03-21  5:09 ` law at redhat dot com
2006-03-21  5:10 ` law at redhat dot com
2006-03-22 11:14 ` richard dot guenther at gmail dot com
2006-03-22 14:01 ` law at redhat dot com
2006-03-22 15:06 ` richard dot guenther at gmail dot com
2006-03-22 15:36 ` law at redhat dot com
2006-03-28 19:14 ` law at redhat dot com
2006-03-30 17:15 ` law at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).