From: Martin Liška
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, Richard Sandiford
Subject: Re: [RFH] split {generic,gimple}-match.c files
Date: Mon, 06 May 2019 13:50:00 -0000
Message-ID: <0dd69228-fae2-01da-53a6-aaee0ca544c8@suse.cz>
References: <17f96e37-5e33-fb14-cbcb-d9caf8c309e8@suse.cz> <330e296b-2a16-fa56-8442-219c40594606@suse.cz> <373bd3aa-5bc5-21f1-817f-96be210d281e@suse.cz> <9b743403-c938-aa7d-1598-9454f6c76eab@suse.cz>

On 5/6/19 3:31 PM, Richard Biener wrote:
> On Mon, 6 May 2019, Martin Liška wrote:
>
>> On 5/2/19 3:18 PM, Richard Biener wrote:
>>> On Mon, 29 Apr 2019, Martin Liška wrote:
>>>
>>>> On 9/10/18 1:43 PM, Martin Liška wrote:
>>>>> On 09/04/2018 05:07 PM, Martin Liška wrote:
>>>>>> - in order to achieve real speed up we need to split also other generated (and also dwarf2out.c, i386.c, ..)
>>>>>> files:
>>>>>> here I'm most concerned about insn-recog.c, which can't be split the same way
>>>>>> without ending up with a single huge SCC component.
>>>>>
>>>>> About the insn-recog.c file: all functions are static and using SCC one ends
>>>>> up with all functions in one component. In order to split the callgraph one
>>>>> needs to promote some functions to be extern and then a split would be possible.
>>>>> In order to do that we'll probably need to teach the splitter how to do partitioning
>>>>> based on the minimal number of edges to be removed.
>>>>>
>>>>> Should I take inspiration from lto_balanced_map, or is there some simple algorithm I can start with?
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> I'm adding Richard Sandiford here as he wrote the majority of the gcc/genrecog.c file.
>>>> As mentioned, I'm looking for a way to split the generated file. Or how
>>>> to teach the generator to produce a reasonable split.
>>>
>>> Sometime earlier this year I've done the experiment with using
>>> a compile with -flto -fno-fat-lto-objects
>>
>> -fno-fat-lto-objects is default, isn't it?
>
> Where linker plugin support is detected, yes.
>
>>> and a link
>>> via -flto -r -flinker-output=rel into the object file. This cut
>>> compile-time more than in half with less maintenance overhead.

Ah, -flinker-output=nolto-rel is new in the GCC 9 release. That's why I was confused.

>>
>> Can you please provide the exact command line for that?
>
> gcc t.c -o t.o -flto=8 -r -flinker-output=nolto-rel
>
> there's an annoying warning:
>
> cc1plus: warning: command line option ‘-flinker-output=nolto-rel’ is valid
> for LTO but not for C++
>
> which can be avoided by splitting the above into a compile and
> a separate LTO "link" step. Using -Wl,-flinker-.... doesn't
> work unfortunately (ld doesn't understand it).
>
> Using installed GCC 9.1 compiling trunk gimple-match.c with -O2 -g
> takes 58.7s while with the LTO trick it takes 23.3s (combined
> CPU time is up to 96s).
> That was with -flto=8 on a CPU with
> 4 physical and 8 logical cores. As it includes -g it includes
> the debug copy dance as well.

That would be usable for the bootstrap on a massively parallel machine
where the combined CPU time overhead won't be an issue. I'll play with that a bit.

Martin

>
>> bloaty gimple-match.o -- gimple-match.o.nolto
>      VM SIZE                            FILE SIZE
>  ++++++++++++++ GROWING              ++++++++++++++
>   [ = ]        0 .rela.debug_info      +3.62Mi   +45%
>   [ = ]        0 .rela.debug_ranges     +161Ki  +1.8%
>   [ = ]        0 .debug_str            +95.8Ki   +19%
>   [ = ]        0 .rela.text            +77.6Ki   +10%
>   [ = ]        0 .debug_ranges         +58.9Ki  +1.7%
>   [ = ]        0 .symtab               +22.9Ki   +68%
>   [ = ]        0 .debug_abbrev         +21.1Ki  +394%
>   [ = ]        0 .strtab               +11.4Ki  +9.5%
>   +8.1%  +5.34Ki .eh_frame             +5.34Ki  +8.1%
>    +84%  +4.09Ki .rodata.str1.8        +4.09Ki   +84%
>   [ = ]        0 .rela.text.unlikely   +3.87Ki  +1.0%
>   [ = ]        0 .rela.debug_aranges   +3.68Ki  +872%
>   [ = ]        0 .debug_aranges        +3.02Ki +10e2%
>    +42%  +2.59Ki .rodata.str1.1        +2.59Ki   +42%
>   +0.2%  +2.41Ki [Other]               +2.45Ki  +0.2%
>   [ = ]        0 .rela.debug_line      +2.09Ki   +16%
>   [ = ]        0 .rela.eh_frame        +1.17Ki  +4.3%
>   [NEW]  +1.09Ki .rodata._Z7get_defPFP9tree_nodeS0_ES0_.str1.8  +1.09Ki  [NEW]
>   [ = ]        0 .shstrtab                +784   +44%
>   [ = ]        0 [ELF Headers]            +768   +16%
>   [ = ]        0 .comment                 +666 +37e2%
>
>  -------------- SHRINKING            --------------
>   [ = ]        0 .debug_line            -256Ki -17.3%
>   [ = ]        0 .rela.debug_loc       -73.6Ki  -0.6%
>   [ = ]        0 .debug_info           -63.4Ki  -1.6%
>   [ = ]        0 .debug_loc            -39.3Ki  -0.6%
>
>   +1.1%  +15.5Ki TOTAL                 +3.67Mi  +7.8%
>
> .debug_line probably shrinks because we drop columns with LTO.
>
> Richard.
>
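
For reference, the compile plus separate LTO "link" split that Richard
suggests (to avoid the cc1plus warning) might look roughly like the sketch
below. It uses only the flags quoted in this thread. The function name, the
CC/CFLAGS defaults and the intermediate ".lto.o" file name are assumptions
made for illustration; nothing here is taken from GCC's build system.

```shell
#!/bin/sh
# Sketch of the two-step "LTO trick". The function name, the defaults and
# the ".lto.o" suffix are hypothetical, not from the thread or GCC's Makefiles.
#
# lto_split_compile SRC [JOBS]
#   CC      compiler driver (assumed default: gcc; nolto-rel needs GCC 9+)
#   CFLAGS  optimization/debug flags (assumed default: -O2 -g, as in the timing)
lto_split_compile() {
    src=$1
    jobs=${2:-8}
    base=${src%.*}

    # Step 1: compile to a slim object that carries only LTO bytecode.
    ${CC:-gcc} ${CFLAGS:--O2 -g} -flto -fno-fat-lto-objects \
        -c "$src" -o "$base.lto.o" || return 1

    # Step 2: the LTO "link". Only the object file is passed in this step;
    # the parallel LTO stage runs and an ordinary (non-LTO) relocatable
    # object is written.
    ${CC:-gcc} ${CFLAGS:--O2 -g} -flto="$jobs" -r -flinker-output=nolto-rel \
        "$base.lto.o" -o "$base.o"
}
```

Called as, say, "lto_split_compile gimple-match.c 8", this should leave a
regular gimple-match.o behind. As the numbers above show, the trade-off is
shorter wall-clock time (58.7s down to 23.3s) for more combined CPU time
(up to 96s).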