Date: Mon, 3 Jan 2022 23:06:25 +0100
From: Mark Wielaard
To: Martin Liška
Cc: Tom de Vries, dwz@sourceware.org, Jakub Jelinek, Michael Matz
Subject: Re: [Highlight] Performance improvements

Hi Martin,

I noticed that this is a reply to a thread from 2 years ago. Is it related to the work mentioned by Tom in that thread?

On Thu, Dec 23, 2021 at 12:57:48PM +0100, Martin Liška wrote:
> I've made a couple of experiments with dwz speed. I took the following packages:
> gcc, krita, libetonyek, rtags, sysdig and ran dwz -m x ... on them.
>
> Here are the numbers I collected for the following configurations:
> dwz (system package, built with LTO and -O2), dwz-O2_lto is supposed
> to be the same (built from source), then I experimented with -O3 and PGO
> (trained on 4 copies of tramp3d). The final run uses an experimental patch
> I have that replaces the iterative_hash with xxhash:
> https://github.com/Cyan4973/xxHash
>
> # 1/5: sysdig (60M)
> dwz                   :  10.0
> dwz                   :   9.8  (98.7%)
> dwz-O2_lto            :   9.5  (95.6%)
> dwz-O3_lto            :   9.2  (91.9%)
> dwz-O3_lto_pgo        :   8.1  (81.3%)
> dwz-O3_lto_pgo_xxhash :   7.3  (72.9%)
> # 2/5: rtags (148M)
> dwz                   :  19.6
> dwz                   :  19.6  (99.9%)
> dwz-O2_lto            :  17.4  (89.0%)
> dwz-O3_lto            :  16.7  (85.4%)
> dwz-O3_lto_pgo        :  14.4  (73.6%)
> dwz-O3_lto_pgo_xxhash :  13.2  (67.6%)
> # 3/5: libetonyek (112M)
> dwz                   :  10.5
> dwz                   :  10.5 (100.6%)
> dwz-O2_lto            :  10.8 (102.8%)
> dwz-O3_lto            :  10.1  (96.7%)
> dwz-O3_lto_pgo        :   9.1  (87.4%)
> dwz-O3_lto_pgo_xxhash :   8.1  (77.1%)
> # 4/5: krita (685M)
> dwz                   : 133.7
> dwz                   : 134.3 (100.5%)
> dwz-O2_lto            :  95.3  (71.3%)
> dwz-O3_lto            :  91.2  (68.2%)
> dwz-O3_lto_pgo        :  78.9  (59.0%)
> dwz-O3_lto_pgo_xxhash :  71.6  (53.5%)
> # 5/5: gcc (1.2G)
> dwz                   :  61.9
> dwz                   :  61.9  (99.9%)
> dwz-O2_lto            :  58.5  (94.5%)
> dwz-O3_lto            :  56.6  (91.3%)
> dwz-O3_lto_pgo        :  54.1  (87.4%)
> dwz-O3_lto_pgo_xxhash :  51.7  (83.4%)
>
> As you can see, using -O3 really helps. One gets a bigger binary, but since
> dwz is small the growth is negligible:
>
> bloaty dwz-O3_lto -- dwz-O2_lto
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   +28% +50.3Ki  [ = ]       0    .debug_loclists
>   +18% +25.3Ki   +18% +25.3Ki    .text
>   +12% +24.6Ki  [ = ]       0    .debug_info
>   +16% +17.3Ki  [ = ]       0    .debug_line
>   +31% +6.19Ki  [ = ]       0    .debug_rnglists
>   +11%    +689  [ = ]       0    .debug_abbrev
>  +7.1%    +633  [ = ]       0    .strtab
>  +5.5%    +504  +5.5%    +504    .eh_frame
>  +1.3%    +453  [ = ]       0    .debug_str
>  +0.8%    +375  +0.8%    +375    .rodata
>  +2.8%    +336  [ = ]       0    .symtab
>   +11%     +64  [ = ]       0    .debug_aranges
>  +4.2%     +64  +4.4%     +64    .eh_frame_hdr
>  [ = ]       0  +1.8%     +32    .bss
>  -3.1%     -21  -3.1%     -21    [LOAD #2 [RX]]
> -61.0% -2.20Ki  [ = ]       0    [Unmapped]
>   +16%  +124Ki   +13% +26.2Ki    TOTAL
>
> PGO also helps significantly. Finally, using xxhash gives a further 5-10%
> improvement.
>
> For now I'm suggesting using -O3 and PGO for our openSUSE package:
> https://build.opensuse.org/request/show/942235
>
> Upstream questions I have:
> - What about replacing -O2 with -O3 by default?

Did you test that without -flto? If it still gets a ~5% speedup then I
like that idea. Or maybe we should also include -flto by default?

> - Are you interested in the xxhash patch? Do you want it as a conditional build
>   or may I replace the currently existing hash function?

I think it is best to simply replace the existing hash function rather
than make it conditional. Does it rely on having the libxxhash dynamic
library available, or would we simply embed a copy (replacing the
hashtab.[ch] files)?

Cheers,

Mark
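To make the embed-versus-link question concrete: the core of xxHash is small enough to carry in-tree. The C sketch below implements the 32-bit variant, XXH32, from the published algorithm. It is an illustration only, not Martin's patch (which may well use XXH64 or XXH3), and `xxh32_sketch` is a hypothetical name, not the upstream xxhash.h API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five 32-bit primes defined by the XXH32 algorithm. */
#define P1 0x9E3779B1u
#define P2 0x85EBCA77u
#define P3 0xC2B2AE3Du
#define P4 0x27D4EB2Fu
#define P5 0x165667B1u

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Little-endian load, independent of host byte order and alignment. */
static uint32_t read32le(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Mix one 4-byte lane into an accumulator. */
static uint32_t xxh32_round(uint32_t acc, uint32_t input)
{
  acc += input * P2;
  acc = rotl32(acc, 13);
  return acc * P1;
}

/* Hypothetical name: a sketch of XXH32, not the upstream library API. */
uint32_t xxh32_sketch(const void *data, size_t len, uint32_t seed)
{
  assert(data != NULL || len == 0);
  const unsigned char *p = data;
  const unsigned char *end = p + len;
  uint32_t h;

  if (len >= 16)
    {
      /* Four independent accumulators consume 16-byte stripes. */
      uint32_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
      const unsigned char *limit = end - 16;
      do
        {
          v1 = xxh32_round(v1, read32le(p));
          v2 = xxh32_round(v2, read32le(p + 4));
          v3 = xxh32_round(v3, read32le(p + 8));
          v4 = xxh32_round(v4, read32le(p + 12));
          p += 16;
        }
      while (p <= limit);
      h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    }
  else
    h = seed + P5;

  h += (uint32_t) len;

  /* Remaining whole 4-byte words, then any trailing single bytes. */
  for (; (size_t) (end - p) >= 4; p += 4)
    h = rotl32(h + read32le(p) * P3, 17) * P4;
  for (; p < end; p++)
    h = rotl32(h + *p * P5, 11) * P1;

  /* Final avalanche so every input bit influences every output bit. */
  h ^= h >> 15;
  h *= P2;
  h ^= h >> 13;
  h *= P3;
  h ^= h >> 16;
  return h;
}
```

For example, `xxh32_sketch("", 0, 0)` returns 0x02CC5D05, the standard XXH32 value for empty input. Because the sketch reads input byte by byte, it gives the same result on big- and little-endian hosts, at some cost in speed compared with the real library. If the upstream code is embedded instead, xxHash can also be used header-only (by defining XXH_INLINE_ALL before including xxhash.h), which avoids a runtime dependency on libxxhash.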