Subject: Re: [RFH] split {generic,gimple}-match.c files
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, Richard Sandiford <richard.sandiford@arm.com>
References: <alpine.LSU.2.20.1804251330400.31014@zhemvz.fhfr.qr> <c4aa7740-f185-6004-4635-dc3c6c58ded1@suse.cz> <alpine.LSU.2.20.1809031435570.16707@zhemvz.fhfr.qr> <e1064cf2-f963-e780-28ea-09a0b6df35df@suse.cz> <17f96e37-5e33-fb14-cbcb-d9caf8c309e8@suse.cz> <alpine.LSU.2.20.1809031554530.16707@zhemvz.fhfr.qr> <330e296b-2a16-fa56-8442-219c40594606@suse.cz> <alpine.LSU.2.20.1809031640350.16707@zhemvz.fhfr.qr> <c9c0ee2c-79dc-8374-b55b-655fd2e161c6@suse.cz> <373bd3aa-5bc5-21f1-817f-96be210d281e@suse.cz> <dfe828e0-7394-535b-1212-96abc704e5fb@suse.cz> <alpine.LSU.2.20.1905021511590.10704@zhemvz.fhfr.qr> <9b743403-c938-aa7d-1598-9454f6c76eab@suse.cz> <alpine.LSU.2.20.1905061510570.10704@zhemvz.fhfr.qr>
From: Martin Liška <mliska@suse.cz>
Message-ID: <0dd69228-fae2-01da-53a6-aaee0ca544c8@suse.cz>
Date: Mon, 06 May 2019 13:50:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
MIME-Version: 1.0
In-Reply-To: <alpine.LSU.2.20.1905061510570.10704@zhemvz.fhfr.qr>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 5/6/19 3:31 PM, Richard Biener wrote:
> On Mon, 6 May 2019, Martin Liška wrote:
> 
>> On 5/2/19 3:18 PM, Richard Biener wrote:
>>> On Mon, 29 Apr 2019, Martin Liška wrote:
>>>
>>>> On 9/10/18 1:43 PM, Martin Liška wrote:
>>>>> On 09/04/2018 05:07 PM, Martin Liška wrote:
>>>>>> - to achieve a real speed-up, we also need to split other generated files (and non-generated ones such as dwarf2out.c, i386.c, ..):
>>>>>> here I'm most concerned about insn-recog.c, which can't be split the same way without ending up with a single huge SCC component.
>>>>>
>>>>> About the insn-recog.c file: all functions are static, and with the SCC approach
>>>>> one ends up with all functions in one component. To split the callgraph, one
>>>>> needs to promote some functions to extern; only then is a split possible.
>>>>> To do that, we'll probably need to teach the splitter to partition
>>>>> based on the minimal number of edges to be removed.
>>>>>
>>>>> Should I take inspiration from lto_balanced_map, or is there some simple algorithm I can start with?
>>>>>
>>>>> Martin
>>>>>
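To make the partitioning question above concrete, here is a minimal sketch on a hypothetical toy call graph (the function names are made up, not taken from insn-recog.c). It assumes every function is static, so each call edge forces caller and callee into the same translation unit; grouping by those edges yields a single component, which is why nothing can be split without promotion. It then brute-forces the balanced two-way split that cuts the fewest call edges, where each cut edge stands for a callee that would have to become extern. This is plain exhaustive search, not the lto_balanced_map algorithm, and it only scales to tiny graphs.

```python
from itertools import combinations

# Hypothetical toy call graph (caller -> callees). All functions are assumed
# to be static, so every call edge forces caller and callee into one file.
CALLS = {
    "recog": ["recog_1", "recog_2"],
    "recog_1": ["pattern0", "pattern1"],
    "recog_2": ["pattern2", "pattern3"],
    "pattern0": ["pattern1"],
    "pattern1": [],
    "pattern2": ["pattern3"],
    "pattern3": [],
}


def call_edges(calls):
    """Flatten the caller -> callees map into a list of (caller, callee) edges."""
    return [(a, b) for a, callees in calls.items() for b in callees]


def components(calls):
    """Union-find over call edges: groups of functions that must share a file."""
    parent = {f: f for f in calls}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in call_edges(calls):
        parent[find(a)] = find(b)
    groups = {}
    for f in calls:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


def min_cut_bisection(calls):
    """Exhaustively find a balanced 2-way split that cuts the fewest call edges.

    Each cut edge is a call whose callee would have to be promoted from
    static to extern. Exponential in the number of functions, so only
    usable on tiny illustrative graphs.
    """
    funcs = sorted(calls)
    edges = call_edges(calls)
    best = None
    for part in combinations(funcs, len(funcs) // 2):
        side = set(part)
        cut = [(a, b) for a, b in edges if (a in side) != (b in side)]
        if best is None or len(cut) < len(best[1]):
            best = (side, cut)
    return best


if __name__ == "__main__":
    print(len(components(CALLS)))  # 1: everything lands in one component
    side, cut = min_cut_bisection(CALLS)
    print(sorted(side), cut)  # ['pattern0', 'pattern1', 'recog_1'] [('recog', 'recog_1')]
```

Here a single promotion (making recog_1 extern) is enough to split the toy graph; a real splitter would need a heuristic such as Kernighan-Lin or a greedy balanced partitioner instead of exhaustive search.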
>>>>
>>>> I'm adding Richard Sandiford here, as he wrote the majority of gcc/genrecog.c.
>>>> As mentioned, I'm looking for a way to split the generated file, or
>>>> to teach the generator to produce a reasonable split.
>>>
>>> Sometime earlier this year I did an experiment using
>>> a compile with -flto -fno-fat-lto-objects
>>
>> -fno-fat-lto-objects is the default, isn't it?
> 
> Where linker plugin support is detected, yes.
> 
>>> and a link
>>> via -flto -r -flinker-output=rel into the object file.  This cut
>>> compile time by more than half, with less maintenance overhead.

Ah, -flinker-output=nolto-rel is new in the GCC 9 release. That's why I was confused.

>>
>> Can you please provide the exact command line for doing that?
> 
> gcc t.c -o t.o -flto=8 -r -flinker-output=nolto-rel
> 
> there's an annoying warning:
> 
> cc1plus: warning: command line option ‘-flinker-output=nolto-rel’ is valid for LTO but not for C++
> 
> which can be avoided by splitting the above into a compile and
> a separate LTO "link" step.  Using -Wl,-flinker-.... doesn't
> work unfortunately (ld doesn't understand it).
> 
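Richard's two-step variant (a plain LTO compile, then a separate LTO "link" that emits a non-LTO relocatable object) could be wrapped in a build helper along these lines. This is only a sketch, assuming GCC 9 or later (for -flinker-output=nolto-rel), a linker with LTO plugin support, and g++ as the driver (the quoted warning comes from cc1plus); the helper names and the .lto.o intermediate file name are made up for illustration.

```python
import shlex
import subprocess


def lto_split_commands(src, obj, jobs=8, opts=("-O2", "-g"), compiler="g++"):
    """Return the (compile, link) argv lists for the two-step LTO trick.

    Step 1 produces a slim LTO object; step 2 runs the LTO "link" with
    -r -flinker-output=nolto-rel (GCC 9+), yielding an ordinary relocatable
    object. Only step 2 carries -flinker-output, so cc1plus never sees it.
    """
    base = obj[:-2] if obj.endswith(".o") else obj
    lto_obj = base + ".lto.o"  # hypothetical name for the intermediate object
    compile_cmd = [compiler, *opts, "-flto", "-fno-fat-lto-objects",
                   "-c", src, "-o", lto_obj]
    link_cmd = [compiler, *opts, f"-flto={jobs}", "-r",
                "-flinker-output=nolto-rel", lto_obj, "-o", obj]
    return compile_cmd, link_cmd


def run(cmds):
    """Echo and execute each command, stopping at the first failure."""
    for cmd in cmds:
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

A call such as `run(lto_split_commands("gimple-match.c", "gimple-match.o"))` would echo and run both steps; the default jobs=8 matches the -flto=8 setting used in the timing below.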
> Using an installed GCC 9.1, compiling trunk gimple-match.c with -O2 -g
> takes 58.7s, while with the LTO trick it takes 23.3s (combined
> CPU time is up to 96s).  That was with -flto=8 on a CPU with
> 4 physical and 8 logical cores.  Since -g is used, this includes
> the debug copy dance as well.

That would be usable for the bootstrap on a massively parallel machine,
where the combined CPU time overhead won't be an issue. I'll play with that a bit.
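For scale, the numbers quoted above work out roughly as follows, assuming the plain -O2 -g compile is single-threaded so that its CPU time is about equal to its 58.7s wall time:

```latex
\[
  \text{wall-clock speed-up} = \frac{58.7\,\mathrm{s}}{23.3\,\mathrm{s}} \approx 2.5,
  \qquad
  \text{CPU-time overhead} \le \frac{96\,\mathrm{s}}{58.7\,\mathrm{s}} \approx 1.64
\]
```

So with idle cores available, the wall-clock win is about 2.5x, paid for with up to roughly 64% more total CPU time.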

Martin

> 
>> bloaty gimple-match.o -- gimple-match.o.nolto
>      VM SIZE                                                     FILE SIZE
>  ++++++++++++++ GROWING                                       ++++++++++++++
>   [ = ]       0 .rela.debug_info                              +3.62Mi   +45%
>   [ = ]       0 .rela.debug_ranges                             +161Ki  +1.8%
>   [ = ]       0 .debug_str                                    +95.8Ki   +19%
>   [ = ]       0 .rela.text                                    +77.6Ki   +10%
>   [ = ]       0 .debug_ranges                                 +58.9Ki  +1.7%
>   [ = ]       0 .symtab                                       +22.9Ki   +68%
>   [ = ]       0 .debug_abbrev                                 +21.1Ki  +394%
>   [ = ]       0 .strtab                                       +11.4Ki  +9.5%
>   +8.1% +5.34Ki .eh_frame                                     +5.34Ki  +8.1%
>    +84% +4.09Ki .rodata.str1.8                                +4.09Ki   +84%
>   [ = ]       0 .rela.text.unlikely                           +3.87Ki  +1.0%
>   [ = ]       0 .rela.debug_aranges                           +3.68Ki  +872%
>   [ = ]       0 .debug_aranges                                +3.02Ki +10e2%
>    +42% +2.59Ki .rodata.str1.1                                +2.59Ki   +42%
>   +0.2% +2.41Ki [Other]                                       +2.45Ki  +0.2%
>   [ = ]       0 .rela.debug_line                              +2.09Ki   +16%
>   [ = ]       0 .rela.eh_frame                                +1.17Ki  +4.3%
>   [NEW] +1.09Ki .rodata._Z7get_defPFP9tree_nodeS0_ES0_.str1.8 +1.09Ki  [NEW]
>   [ = ]       0 .shstrtab                                        +784   +44%
>   [ = ]       0 [ELF Headers]                                    +768   +16%
>   [ = ]       0 .comment                                         +666 +37e2%
>
>  -------------- SHRINKING                                     --------------
>   [ = ]       0 .debug_line                                    -256Ki -17.3%
>   [ = ]       0 .rela.debug_loc                               -73.6Ki  -0.6%
>   [ = ]       0 .debug_info                                   -63.4Ki  -1.6%
>   [ = ]       0 .debug_loc                                    -39.3Ki  -0.6%
>
>   +1.1% +15.5Ki TOTAL                                         +3.67Mi  +7.8%
> 
> .debug_line probably shrinks because we drop columns with LTO.
> 
> Richard.
>