From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 27027 invoked by alias); 11 Dec 2004 07:25:15 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 27002 invoked from network); 11 Dec 2004 07:25:10 -0000
Received: from unknown (HELO mail.cs.umass.edu) (128.119.243.168) by sourceware.org with SMTP; 11 Dec 2004 07:25:10 -0000
Received: from localhost (IDENT:65cYLDDAClx3fsKA3ww9ZBZEsrT066Qm@loki.cs.umass.edu [128.119.243.168]) by mail.cs.umass.edu (8.12.11/8.12.5) with ESMTP id iBB7P5WM027366; Sat, 11 Dec 2004 02:25:07 -0500
Date: Sat, 11 Dec 2004 07:25:00 -0000
Message-Id: <20041211.022458.119878301.kazu@cs.umass.edu>
To: gcc@gcc.gnu.org
CC: stevenb@suse.de, law@redhat.com, dvorakz@suse.cz
Subject: Good news about increased jump threading from my PHI merge patch
From: Kazu Hirata
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2004-12/txt/msg00390.txt.bz2

Hi,

I was playing with my PHI merge patch, which merges two PHI nodes like

  # tem_6 = PHI <...>;
  <...>:;
  # tem_3 = PHI <...>;
  <...>:;

into one PHI node like so

  # tem_3 = PHI <...>;
  <...>:;

I've been saying my PHI merge patch would improve jump threading but
never said by how much, so here is that number.

Specifically, I compiled cc1-i files.  I grepped for "Threaded" in the
t??.dom[123] dumps to see how many times we do tree-level jump
threading.  Similarly, I grepped for "JUMP-BYPASS" in 07.bypass to see
how many times we do RTL-level threading.

         mainline   patched    diff%
  tree      18918     20524  +7.824%
  RTL        2648      2415  -9.648%
  -----------------------------------
  total     21566     22939  +5.985%

OK, so from the point of view of the final generated code, a net
increase of 6% is not bad.  I have yet to measure the speed of the
generated code.  It's also good from the point of view of doing more
work at the tree level than at the RTL level: we've taken away nearly
10% of the work from the RTL-level jump threader.
Compile time is reduced by 0.3% or so; more on this later in this post.

Actually, what I am doing is a little more than the PHI merge described
above.  Sometimes a PHI merge opportunity is "blocked" by an
intervening cast (or casts) like so

  # tem_6 = PHI <0(8), 1(7)>;
  <...>:;
  tem_7 = (_Bool) tem_6;
  # tem_3 = PHI <...>;
  <...>:;

To merge PHI nodes even in this kind of situation, my pass has a
PHI-specific constant propagator that eats the cast like so

  # tem_7 = PHI <0(8), 1(7)>;  <- notice the PHI result is tem_7, not tem_6
  <...>:;
  # tem_3 = PHI <...>;
  <...>:;

To be precise, my patch is actually a fixed-point iteration of two
subpasses, the PHI merge and this PHI constant propagator; they create
opportunities for each other.  Without the PHI constant propagator, my
patch brings about a 1% compile-time speed improvement.  (Yes, I need
to speed up the PHI-specific constant propagator, or use a smart
worklist shared between the two subpasses.)

I have not included any patch to fix PR 18649 during this benchmark.
I don't expect any big difference because that PR is basically a
rather rare corner case.

Aside from our being in Stage 3, let me note another reason for
holding this until 4.1: we create large PHI nodes.  The current
PHI-OPT cannot handle them, but the one on the tcb branch can, thanks
to Andrew Pinski.

Right now I am running this pass only once, immediately after the
first forwprop.  The key requirement for this pass is the absence of
dead code, so that I can find forwarder blocks with PHI nodes.  It may
be beneficial to run it one or two more times during tree
optimization.

In any event, I think this is a good candidate to consider for 4.1.
Any comments?

Kazu Hirata