From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 27027 invoked by alias); 11 Dec 2004 07:25:15 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 27002 invoked from network); 11 Dec 2004 07:25:10 -0000
Received: from unknown (HELO mail.cs.umass.edu) (128.119.243.168) by sourceware.org with SMTP; 11 Dec 2004 07:25:10 -0000
Received: from localhost (IDENT:65cYLDDAClx3fsKA3ww9ZBZEsrT066Qm@loki.cs.umass.edu [128.119.243.168]) by mail.cs.umass.edu (8.12.11/8.12.5) with ESMTP id iBB7P5WM027366; Sat, 11 Dec 2004 02:25:07 -0500
Date: Sat, 11 Dec 2004 07:25:00 -0000
Message-Id: <20041211.022458.119878301.kazu@cs.umass.edu>
To: gcc@gcc.gnu.org
CC: stevenb@suse.de, law@redhat.com, dvorakz@suse.cz
Subject: Good news about increased jump threading from my PHI merge patch
From: Kazu Hirata
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2004-12/txt/msg00390.txt.bz2

Hi,

I was playing with my PHI merge patch, which merges two PHI nodes like

  # tem_6 = PHI <...>;
  <...>:;
  # tem_3 = PHI <...>;
  <...>:;

into one PHI node like so

  # tem_3 = PHI <...>;
  <...>:;

I've been saying my PHI merge patch would improve jump threading but
never said by how much, so here is that number.

Specifically, I compiled cc1-i files.  I grepped for "Threaded" in the
t??.dom[123] dumps to see how many times we do tree-level jump
threading.  Similarly, I grepped for "JUMP-BYPASS" in 07.bypass to see
how many times we do RTL-level threading.

         mainline   patched    diff%
  tree      18918     20524  +7.824%
  RTL        2648      2415  -9.648%
  -----------------------------------
  total     21566     22939  +5.985%

OK, so from the point of view of the final generated code, a net
increase of 6% is not bad.  I have yet to measure the speed of the
generated code.  It's also good from the point of view of doing more
work at the tree level than at the RTL level: we've taken away nearly
10% of the work from the RTL-level jump threader.
Compile time is reduced by 0.3% or so; more on this later in this post.

Actually, what I am doing is a little more than the PHI merge described
above.  Sometimes a PHI merge opportunity is "blocked" by an
intervening cast (or casts) like so

  # tem_6 = PHI <0(8), 1(7)>;
  <...>:;
  tem_7 = (_Bool) tem_6;
  # tem_3 = PHI <...>;
  <...>:;

To merge PHI nodes even in this kind of situation, my pass has a
PHI-specific constant propagator that eats the cast like so

  # tem_7 = PHI <0(8), 1(7)>;  <- notice the PHI result is tem_7, not tem_6
  <...>:;
  # tem_3 = PHI <...>;
  <...>:;

To be precise, my patch is actually a fixed-point iteration of two
subpasses, the PHI merge and this PHI constant propagator; they create
opportunities for each other.  Without the PHI constant propagator, my
patch brings about a 1% compile-time speed improvement.  (Yes, I need
to speed up the PHI-specific constant propagator, or use a smart
worklist shared between the two subpasses.)

I have not included any patch to fix PR 18649 during this benchmark.
I don't expect any big difference because that PR is basically a
rather rare corner case.

Aside from our being in Stage 3, let me note another reason for
holding this until 4.1: we create large PHI nodes.  The current
PHI-OPT cannot handle them, but the one on the tcb branch can, thanks
to Andrew Pinski.

Right now I am running this pass only once, immediately after the
first forwprop.  The key requirement for this pass is the absence of
dead code, so that I can find forwarder blocks with PHI nodes.  It may
be beneficial to run it one or two more times during tree
optimization.

In any event, I think this is a good candidate to consider for 4.1.
Any comments?

Kazu Hirata