From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 28887 invoked by alias); 23 Jul 2014 13:47:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 28877 invoked by uid 89); 23 Jul 2014 13:47:14 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,DATE_IN_PAST_06_12,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Wed, 23 Jul 2014 13:47:13 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s6NDlBj3008452 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 23 Jul 2014 09:47:11 -0400 Received: from stumpy.slc.redhat.com (ovpn-113-157.phx2.redhat.com [10.3.113.157]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s6NDlA3Q002137; Wed, 23 Jul 2014 09:47:11 -0400 Message-ID: <53CF1DFD.7080805@redhat.com> Date: Wed, 23 Jul 2014 13:47:00 -0000 From: Jeff Law User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Teresa Johnson , "gcc-patches@gcc.gnu.org" , Jan Hubicka CC: David Li Subject: Re: [PATCH] Redesign jump threading profile updates References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2014-07/txt/msg01525.txt.bz2 On 03/26/14 17:44, Teresa Johnson wrote: > Recently I discovered that the profile updates being performed by jump > threading were incorrect in many cases, particularly in the case where > the threading path contains a joiner. Some of the duplicated > blocks/edges were not getting any counts, leading to incorrect > function splitting and other downstream optimizations, and there were > other insanities as well. After making a few attempts to fix the > handling I ended up completely redesigning the profile update code, > removing a few places throughout the code where it was attempting to > do some updates. > > The biggest complication (see the large comment and example above the > new routine compute_path_counts) is that we duplicate a conditional > jump in the joiner case, possibly multiple times for multiple jump > thread paths through that joiner, and it isn't trivial to figure out > what probability to assign each of the duplicated successor edges (and > the original after threading). Each jump thread path may need to have > a different probability of staying on path through the joiner in order > to keep the counts going out of the threading path sane. > > The patch below was bootstrapped and tested on > x86_64-unknown-linux-gnu, and also tested with a profiledbootstrap. I > additionally tested with cpu2006, confirming that the amount of > resulting cycle samples in the split cold sections reduced, and > through manual inspection that many different cases were now correct. > I also measured performance with cpu2006, running each benchmark > multiple times on a Westmere and see some speedups (453.povray 1-2%, > 403.gcc 1-1.5%, and noisy but positive speedups in 471.omnetpp and > 483.xalancbmk). > > Looks like my mailer is corrupting the spacing, which makes it harder > to look at the CFG examples in the big header comment block I added. > So I have also included the patch as an attachment. > > Ok for stage 1? > > Thanks, > Teresa > > 2014-03-26 Teresa Johnson > > * tree-ssa-threadupdate.c (struct ssa_local_info_t): New > duplicate_blocks bitmap. > (remove_ctrl_stmt_and_useless_edges): Ditto. > (create_block_for_threading): Ditto. > (compute_path_counts): New function. > (update_profile): Ditto. > (deduce_freq): Ditto. > (recompute_probabilities): Ditto. > (update_joiner_offpath_counts): Ditto. > (ssa_fix_duplicate_block_edges): Update profile info. > (ssa_create_duplicates): Pass new parameter. > (ssa_redirect_edges): Remove old profile update. > (thread_block_1): New duplicate_blocks bitmap, > remove old profile update. > (thread_single_edge): Pass new parameter. First off, sorry this took so long to get reviewed. Most of what's going on in here is similar to something I sketched out, but never coded up a while back -- with the significant difference that you're handling joiner blocks as well. Everything looks to be well thought through and documented in the code at a level I wish existed throughout GCC. The only thing I see missing is regression tests. I don't think you need to do anything huge here, but it ought to be possible to set up relatively simple cases which show the probabilities/counts being updated properly. Otherwise it looks excellent. It's pre-approved once you've added some kind of testing and fixed the nits noted below. > + In the aboe example, after all jump threading is complete, we will s/aboe/above/ > + struct el *next, *el; > + bitmap in_edge_srcs = BITMAP_ALLOC (NULL); > + for (el = rd->incoming_edges; el; el = next) > + { > + next = el->next; > + bitmap_set_bit (in_edge_srcs, el->e->src->index); > + } Please add vertical whitespace after this loop, but before declaring variables for the next loop. Jeff