From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26309 invoked by alias); 19 Nov 2013 17:40:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 26299 invoked by uid 89); 19 Nov 2013 17:40:53 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.9 required=5.0 tests=AWL,BAYES_50,RDNS_NONE,SPF_HELO_PASS,SPF_PASS,URIBL_BLOCKED autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from Unknown (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 19 Nov 2013 17:40:51 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rAJHegxk014759 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 19 Nov 2013 12:40:43 -0500 Received: from stumpy.slc.redhat.com (ovpn-113-162.phx2.redhat.com [10.3.113.162]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id rAJHefI5014480; Tue, 19 Nov 2013 12:40:41 -0500 Message-ID: <528BA299.7040606@redhat.com> Date: Tue, 19 Nov 2013 19:32:00 -0000 From: Jeff Law User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Teresa Johnson , Jan Hubicka CC: =?windows-1252?Q?Martin_Li=9Aka?= , "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH i386] Enable -freorder-blocks-and-partition References: <20131119154407.GA1742@kam.mff.cuni.cz> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2013-11/txt/msg02386.txt.bz2 On 11/19/13 10:24, Teresa Johnson wrote: > On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka wrote: >> Martin, >> can you, please, generate the updated systemtap with >> -freorder-blocks-and-partition enabled? >> >> I am in favour of enabling this - it is usefull pass and it is pointless ot >> have passes that are not enabled by default. >> Is there reason why this would not work on other ELF target? Is it working >> with Darwin and Windows? > > I don't know how to test these (I don't see any machines listed in the > gcc compile farm of those types). For Windows, I assume you mean > MinGW, which should be enabled as it is under i386. Should I disable > it there and for Darwin? > >> >>> This patch enables -freorder-blocks-and-partition by default for x86 >>> at -O2 and up. It is showing some modest gains in cpu2006 performance >>> with profile feedback and -O2 on an Intel Westmere system. Specifically, >>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk >>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. >> >> This actually sounds very good ;) >> >> Lets see how the systemtap graphs goes. If we will end up with problem >> of too many accesses to cold section, I would suggest making cold section >> subdivided into .unlikely and .unlikely.part (we could have better name) >> with the second consisting only of unlikely parts of hot&normal functions. >> >> This should reduce the problems we are seeing with mistakely identifying >> code to be cold because of roundoff errors (and it probably makes sense >> in general, too). >> We will however need to update gold and ld for that. > > Note that I don't think this would help much unless the linker is > changed to move the cold split section close to the hot section. There > is probably some fine-tuning we could do eventually in the linker > under -ffunction-sections without putting the split portions in a > separate section. I.e. clump the split parts together within unlikely. > But hopefully this can all be done later on as follow-on work to boost > the performance further. > >>> >>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal >>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were >>> configured with --enable-languages=all,obj-c++ and tested for both >>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}". >>> >>> It would be good to enable this for additional targets as a follow on, >>> but it needs more testing for both correctness and performance on those >>> other targets (i.e for correctness because I see a number of places >>> in other config/*/*.c files that do some special handling under this >>> option for different targets or simply disable it, so I am not sure >>> how well-tested it is under different architectural constraints). >>> >>> Ok for trunk? >>> >>> Thanks, >>> Teresa >>> >>> 2013-11-19 Teresa Johnson >>> >>> * common/config/i386/i386-common.c: Enable >>> -freorder-blocks-and-partition at -O2 and up for x86. >>> * opts.c (finish_options): Only warn if -freorder-blocks-and- >>> partition was set on command line. >> >> You probably mis doc/invoke.texi update. >> Thank you for working on this! > > Yes, thanks. Here is the patch with the invoke.texi update. > > Teresa > > > 2013-11-19 Teresa Johnson > > * common/config/i386/i386-common.c: Enable > -freorder-blocks-and-partition at -O2 and up for x86. > * doc/invoke.texi: Update -freorder-blocks-and-partition default. > * opts.c (finish_options): Only warn if -freorder-blocks-and- > partition was set on command line. This is fine. Glad to see we're getting some good benefits from this code. FWIW, I would expect the PA to show benefits from this work -- HP showed good results for block positioning and procedure splitting eons ago. A secondary effect you would expect to see would be an increase in the number of long branch stubs (static count), but a decrease in the number of those stubs that actually execute. Measuring those effects would obviously be out-of-scope. I totally understand that the PA is irrelevant these days, but seeing good results on another architecture would go a long way to answering the question "should this be enabled by default across the board?" jeff