From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17087 invoked by alias); 18 Jun 2011 23:49:03 -0000 Received: (qmail 17078 invoked by uid 22791); 18 Jun 2011 23:49:02 -0000 X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_CF X-Spam-Check-By: sourceware.org Received: from mail-qw0-f47.google.com (HELO mail-qw0-f47.google.com) (209.85.216.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 18 Jun 2011 23:48:39 +0000 Received: by qwh5 with SMTP id 5so770346qwh.20 for ; Sat, 18 Jun 2011 16:48:38 -0700 (PDT) MIME-Version: 1.0 Received: by 10.229.212.8 with SMTP id gq8mr1588296qcb.73.1308440916572; Sat, 18 Jun 2011 16:48:36 -0700 (PDT) Received: by 10.229.47.78 with HTTP; Sat, 18 Jun 2011 16:48:36 -0700 (PDT) In-Reply-To: <4DF985ED.30406@redhat.com> References: <4DF985ED.30406@redhat.com> Date: Sun, 19 Jun 2011 07:30:00 -0000 Message-ID: Subject: Re: Improve jump threading #5 of N From: "H.J. Lu" To: Jeff Law Cc: gcc-patches Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-06/txt/msg01405.txt.bz2 On Wed, Jun 15, 2011 at 9:26 PM, Jeff Law wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > > So as I've mentioned previously, I've been working on a relatively small > change to the jump threading code which would allow it to duplicate a > join block when doing so allows us to thread through a successor of the > join block. =A0This is expected to be the last extension to the existing > jump threading code. > > This was mainly done to improve our ability to eliminate unexecutable > paths through the CFG which helps avoid false positives with certain > warnings. =A0It also has the nice property that it eliminates conditionals > and often results in further optimization of nearby code. > > To help evaluate the code generation improvements of this change I built > gcc-4.6 (checking enabled) using a compiler with and without this > improvement. =A0I then used the 4.6 cc1s to compile a bunch of .i files > under the watchful eye of valgrind. > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0without patch =A0 =A0with = patch > Total cbranches =A0 =A0 =A0 =A0 =A0231072754220 =A0 =A0 229626578262 > Total ibranches: =A0 =A0 =A0 =A0 =A0 7687404775 =A0 =A0 =A0 7686994201 > > > cbranches shows the number of dynamically executed conditional branches. > =A0As you can see, with the patch we eliminated about .625% of the runtime > conditional branches. =A0Not bad at all. =A0We eliminated a trivial number > of indirect branches. =A0In all we eliminated 1446595532 runtime branches. > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0without patch =A0 =A0with = patch > Total instructions: =A0 =A0 1254106133886 =A0 =A01247718004946 > > > I was expecting a reduction in the total number of instructions > executed, but was quite surprised at the actual data. =A0We end up > eliminating 6388128940 dynamic instructions --- which means that for > every dynamic branch eliminated, on average we were able to eliminate an > additional 3.4 dynamic instructions -- that's a huge secondary effect. > Clearly improving jump threading in this way is allowing the rest of the > optimizers to do a better job. > > Anyway, attached is the patch. =A0Again, the concept is pretty simple, > when we have a join block which can not be threaded, we peek at the > successors of the join block and see if one or more of them can be thread= ed. > > If so, we make a duplicate of the join block, wire the incoming edge we > were originally trying to thread to reach the duplicate rather than the > original join block. =A0We then wire the outgoing edge from the duplicate > to the final jump thread target. > > So if given a CFG like this (from =A0a routine in cfgexpand): > > =A0 =A0 =A0 =A0 =A0 A > =A0 =A0 =A0 =A0 =A0/ \ > =A0 =A0 =A0 =A0 B =A0 C > =A0 =A0 =A0 =A0 | =A0/ \ > =A0 =A0 =A0 =A0 | D =A0 E > =A0 =A0 =A0 =A0 | | =A0/ \ > =A0 =A0 =A0 =A0 | | F =A0 G > =A0 =A0 =A0 =A0 =A0\| | > =A0 =A0 =A0 =A0 =A0 =A0\| > =A0 =A0 =A0 =A0 =A0 =A0 H > =A0 =A0 =A0 =A0 =A0 =A0/ \ > =A0 =A0 =A0 =A0 =A0 I =A0 J > =A0 =A0 =A0 =A0 =A0/ \ > =A0 =A0 =A0 =A0 L =A0 M > =A0 =A0 =A0 =A0 | =A0/ \ > =A0 =A0 =A0 =A0 | N =A0 O > =A0 =A0 =A0 =A0 | | =A0/ \ > =A0 =A0 =A0 =A0 | | P =A0 Q > =A0 =A0 =A0 =A0 =A0\| | > =A0 =A0 =A0 =A0 =A0 =A0\| > =A0 =A0 =A0 =A0 =A0 =A0 R > > > As it turns out some blocks have the same condition (A,I), (C,M), (E,O). > But because of the merge block H, no threading is possible. =A0What we > want to do is make 3 copies of H, each reachable from one predecessor of > the original H. =A0That exposes the jump threading opportunities B->L, > D->N and F->P. =A0The final CFG looks something like this: > > =A0 =A0 =A0 =A0 =A0 A > =A0 =A0 =A0 =A0 =A0/ \ > =A0 =A0 =A0 =A0BH'L C > =A0 =A0 =A0 =A0 | =A0/ \ > =A0 =A0 =A0 =A0 |DH'N E > =A0 =A0 =A0 =A0 | | =A0/ \ > =A0 =A0 =A0 =A0 | |FH'P G > =A0 =A0 =A0 =A0 =A0\| | > =A0 =A0 =A0 =A0 =A0 =A0\| > =A0 =A0 =A0 =A0 =A0 =A0 R > > > > Where each H' also has an edge to J from the original CFG, but which is > hard to show here... Note that I, M, O & Q all disappear and each > dynamic path through the cfg is shortened, even though we had to > duplicate H multiple times. > > Bootstrapped and regression tested on x86_64-unknown-linux-gnu. > > OK for mainline? > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D49465 --=20 H.J.