From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-106501-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 11311 invoked by alias); 13 Dec 2004 17:57:43 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 11267 invoked from network); 13 Dec 2004 17:57:31 -0000
Received: from unknown (HELO mail.cs.umass.edu) (128.119.243.168)
  by sourceware.org with SMTP; 13 Dec 2004 17:57:31 -0000
Received: from localhost (IDENT:X7ej+KJ+TAl1exPunp4K+C7uDHldR3Wd@loki.cs.umass.edu [128.119.243.168])
	by mail.cs.umass.edu (8.12.11/8.12.5) with ESMTP id iBDHvUeo028379;
	Mon, 13 Dec 2004 12:57:30 -0500
Date: Mon, 13 Dec 2004 17:57:00 -0000
Message-Id: <20041213.125724.34765950.kazu@cs.umass.edu>
To: law@redhat.com
Cc: gcc@gcc.gnu.org, stevenb@suse.de, dvorakz@suse.cz
Subject: Re: Good news about increased jump threading from my PHI merge
 patch
From: Kazu Hirata <kazu@cs.umass.edu>
In-Reply-To: <1102955877.14666.94.camel@localhost.localdomain>
References: <20041211.022458.119878301.kazu@cs.umass.edu>
	<1102955877.14666.94.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2004-12/txt/msg00463.txt.bz2

Hi Jeff,

> Thanks.  It's still on my list of things to evaluate for 4.0, mostly
> because of its potential to help compile-times on some of our 
> problematical cases without introducing lots of regressions elsewhere.

If we are just merging PHI nodes and not constant-propagating PHI
nodes (like killing casts), I get about a 1% speedup on some test
cases.  (For a more accurate number, I would have to retest.)

What I am doing is really the same as Zdenek's thread_jumps rewrite,
which you recently approved.  The only difference is that I am trying
to remove forwarder blocks with PHI nodes.  Zdenek's version tries to
remove those without.

Do note, though, that my pass helps if it is run once or twice, but
if it's part of cleanup_tree_cfg, the compiler measurably slows down
because cleanup_tree_cfg is run so many times.  I think my pass
basically requires DCE.  We clean up the CFG before entering SSA, so
we shouldn't have any PHI merge opportunity unless some dead code
between two PHI nodes is removed.

Here is a quick advertisement. :-) Let me mention that I can quickly
determine if I can merge a forwarder block A (and its PHI nodes) into
basic block B most of the time.  The only condition I need is:

  if (!dominated_by_p (CDI_DOMINATORS, B, A))

because if A does not dominate B, then the only valid place where any
values produced in A can be used is PHI arguments on edge A->B.
Otherwise, we would have a use that is not dominated by its def.  In
other words, we don't need to compute immediate uses.
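
To make the dominance test above concrete, here is a minimal sketch
on a toy CFG.  It does not use GCC's data structures: the idom array,
the block numbering, and the toy_* names are all hypothetical and
only stand in for dominated_by_p (CDI_DOMINATORS, B, A).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_N_BLOCKS 5
#define TOY_ENTRY 0

/* Hypothetical toy CFG (not GCC's representation).  toy_idom[i] is
   the immediate dominator of block i; the entry block is its own
   immediate dominator.

         0
        / \
       1   2      block 1 plays the forwarder block A
        \ /
         3        block 3 plays B, a join point with PHI nodes
         |
         4                                                        */
const int toy_idom[TOY_N_BLOCKS] = { 0, 0, 0, 0, 3 };

/* Return true if block A dominates block B.  Walk up the dominator
   tree from B until we either reach A or run out at the entry.  */
bool toy_dominated_by (const int *idom, int n, int b, int a)
{
  assert (a >= 0 && a < n && b >= 0 && b < n);
  for (;;)
    {
      if (b == a)
        return true;
      if (b == TOY_ENTRY)
        return false;
      b = idom[b];
    }
}

/* The quick test from the text: if A does not dominate B, values
   defined in A can only be used as PHI arguments on edge A->B.  */
bool toy_can_merge_forwarder (const int *idom, int n, int a, int b)
{
  return !toy_dominated_by (idom, n, b, a);
}
```

In this toy graph, block 1 does not dominate block 3, so the check
passes for the pair (1, 3).  Block 3 does dominate block 4, so the
check fails for the pair (3, 4) and the quick argument does not apply.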

The reason I said "most of the time" above is that sometimes A
dominates B even if B has PHI nodes (and thus multiple incoming
edges).  This happens when B is a loop header.  I don't know how
aggressive we want to be in this case because we would effectively
be undoing create_preheader.

> You mentioned that we can't handle large PHI nodes.  Can you comment
> more on that -- that would be my largest individual concern right now.

The only thing that's technically blocking my PHI merge is PHI-OPT.
IIRC, PHI-OPT looks for a basic block with a single PHI node with
exactly two arguments like so:

  a_1 = PHI <0(here), 1(there)>
<L123>:;

and it tries to convert this to

<L123>:;
  a_1 = COND;
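
As a source-level analogue of that conversion (a sketch only, not
what PHI-OPT literally operates on; the function names are made up
for illustration), the two forms compute the same value:

```c
#include <assert.h>
#include <stdbool.h>

/* Branchy form: the join point receives 0 on one edge and 1 on the
   other, which is what the two-argument PHI node expresses.  */
int select_via_branches (bool cond)
{
  int a;
  if (cond)
    a = 1;
  else
    a = 0;
  return a;
}

/* Converted form: the value is just the condition itself.  */
int select_via_condition (bool cond)
{
  return cond;
}

/* Check that both forms agree for both possible inputs.  */
void check_equivalence (void)
{
  assert (select_via_branches (false) == select_via_condition (false));
  assert (select_via_branches (true) == select_via_condition (true));
}
```

Which edge carries 0 and which carries 1 is an assumption here; with
the arms swapped, the converted form would be the negated condition.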

The newer version of PHI-OPT on the tree cleanup branch looks at a
COND_EXPR instead of a PHI node and checks whether the two arms of the
COND_EXPR feed 0 and 1 into some PHI node, so we end up with

<L100>:;
  :
  :
  c_7 = COND;
  goto <L123>;

  a_1 = PHI <c_7(here), ... arbitrarily long ...>
<L123>:;

Of course, in the special case where the PHI node had only two
incoming edges, it becomes equivalent to a copy statement because we
would have

  a_1 = PHI <c_7(here)>
<L123>:;

Kazu Hirata