From: Martin Liška
To: Jan Hubicka
Cc: Teresa Johnson, Jeff Law, "gcc-patches@gcc.gnu.org"
Date: Sun, 15 Dec 2013 22:19:00 -0000
Subject: Re: [PATCH i386] Enable -freorder-blocks-and-partition
References: <528BA299.7040606@redhat.com> <20131128140655.GA20730@kam.mff.cuni.cz> <529CB252.4070806@redhat.com> <20131213011309.GA21107@kam.mff.cuni.cz>

On 15 December 2013 23:17, Martin Liška wrote:
> Dear Jan and Teresa,
> Jan was right that I've been using changes which were committed by
> Teresa and do live in trunk. So the graph with time profile presented
> in my previous post was really with enabled
> -freorder-blocks-and-partition.
> I removed the hack in varasm.c and I
> do use classic section layout. Please open the following dump
> (includes PDF graph+html report that shows functions with time profile
> located in cold section and all -fdump-ipa-all dumps):
>
> https://drive.google.com/file/d/0B0pisUJ80pO1YW1QWUFkZjdqME0/edit?usp=sharing
>
> Apart from that, I created also PDF graph (https://drive.google.com/file/d/0B0pisUJ80pO1aHhPWW56dXpLVTQ/edit?usp=sharing) that
> shows that time profile is almost perfect for GIMP. I miss just some
> examples that do not have profile in generate phase.
>
> I will merge current trunk and prepare final patch.
>
> Are there any other data that you want to be prepared?
>
> Martin
>
>
> On 13 December 2013 02:13, Jan Hubicka wrote:
>>> On Wed, Dec 11, 2013 at 1:21 AM, Martin Liška wrote:
>>> > Hello,
>>> > I prepared a collection of systemtap graphs for GIMP.
>>> >
>>> > 1) just my profile-based function reordering: 550 pages
>>> > 2) just -freorder-blocks-and-partition: 646 pages
>>> > 3) just -fno-reorder-blocks-and-partition: 638 pages
>>> >
>>> > Please see attached data.
>>>
>>> Thanks for the data. A few observations/questions:
>>>
>>> With both 1) (your (time-based?) reordering) and 2)
>>> (-freorder-blocks-and-partition) there are a fair amount of accesses
>>> out of the cold section. I'm not seeing so many accesses out of the
>>> cold section in the apps I am looking at with splitting enabled. In
>>
>> I see you already committed the patch, so perhaps Martin's measurements assume
>> the pass is off by default?
>>
>> I rebuilt GCC with profiledbootstrap and with the linkerscript unmapping
>> text.unlikely.
>> I get ICE in:
>> (gdb) bt
>> #0 diagnostic_set_caret_max_width(diagnostic_context*, int) () at ../../gcc/diagnostic.c:108
>> #1 0x0000000000f68457 in diagnostic_initialize (context=0x18ae000 , n_opts=n_opts@entry=1290) at ../../gcc/diagnostic.c:135
>> #2 0x000000000100050e in general_init (argv0=) at ../../gcc/toplev.c:1110
>> #3 toplev_main(int, char**) () at ../../gcc/toplev.c:1922
>> #4 0x00007ffff774cbe5 in __libc_start_main () from /lib64/libc.so.6
>> #5 0x0000000000f7898d in _start () at ../sysdeps/x86_64/start.S:122
>>
>> That is relatively early in startup process. The function seems inlined and
>> it fails only on second invocation, did not have time to investigate further,
>> yet while without -fprofile-use it starts...
>>
>> On our periodic testers I see off-noise improvement in crafty 2200->2300
>> and regression on Vortex, 2900->2800, plus code size increase.
>>
>> Honza