From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id 594863858D28 for ; Wed, 18 Jan 2023 17:31:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 594863858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 95BB45C1AD for ; Wed, 18 Jan 2023 17:31:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1674063089; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type; bh=5Nxjte2l/OosmW72cT2cRawR/KT5E91cViZi3ZP67Us=; b=rF6X48zNm8h6RAQ8HJwft56y1JlPXNI4N3Azyx1VnI2FsBt3/SnlGK8Xeb5LzQA7l28NGl ZMNA5AlD6IYO5eawuCQ3fZxdCVY9vEi8szYw56iaT0TncbBTadY5efpSjmXHB75p3vXCPF Z9wG1RSKtibardvPEI3ES+iN0dquWYA= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1674063089; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type; bh=5Nxjte2l/OosmW72cT2cRawR/KT5E91cViZi3ZP67Us=; b=V8UFxdd8kpPFWrMDUZHMsw7S4lTP3fUPhGAyvAZ3eozFVWK7+5/wwJhoT3vL+p5DRWaftK Ten739dVXngFehBA== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 91B5A2C141 for ; Wed, 18 Jan 2023 17:31:29 +0000 (UTC) Received: by wotan.suse.de (Postfix, from userid 10510) id 891556397; Wed, 18 Jan 2023 17:31:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by wotan.suse.de (Postfix) with ESMTP id 87DEB62E5 for ; Wed, 18 Jan 2023 17:31:29 +0000 (UTC) Date: Wed, 18 Jan 2023 17:31:29 +0000 (UTC) From: Michael Matz To: binutils@sourceware.org Subject: [PATCH 0/3] Speed up section merging Message-ID: User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Heyho, TLDR: 33% overall linking speed increase by better section merging. Ok for master? :-) when linking reasonably-sized inputs with enabled debug information the slowest single piece of linking time is section merging. We can speed it up quite a bit without resorting to things like multi threading. This mini-series does that. My testcase remains the same as in https://sourceware.org/pipermail/binutils/2022-November/124670.html . The important characteristics related to section merging are that .debug_str sections dominate (of course), and that it's merging 664 of these input sections of overall size 177MB into one of 10.7MB. (The other mergable sections, string or non-string aren't interesting for this measurement) I explain the strategy a bit in the overall comment. The speedup comes from a better hash function, using a power-of-two hash table and cacher-friendlier data structures. (Also we currently decompress the same input section potentially twice). The overall approach can also be used for the generic BFD hash table for symbols and I do have patches for that, but as it causes the symbol order to change and hence the objdump output it requires quite some testsuite adjustments so must remain for a different time. This is no problem for section merging as it maintains an insertion-order of all entries (and for strings it does a sorting step anyway). A little use of the bfd_hash_table code remains (the allocation and freeing), the result wasn't really more appealing when getting rid of that (and I hope to integrate this all back again when the above time for bfd_hash_table rewrite comes). The hash function itself is modelled after the xxh3 family of xxhash narrowed to 32bit and without possibility of seeding. I have tested its statistic properties generally (via the nice approach in http://web.archive.org/web/20060116064140/http://bretm.home.comcast.net/hash/ ) as well as on the concrete data on my cc1 testcase and it's satisfying for the problem at hand. It's actually much better than many other fast hash functions I tried (and only very little worse than xxh3 itself). I've also experimented with many lookup structures over the past months and this is what's fastest. I've regtested this on x86_64-linux for all of Alans targets. The single (harmless) regression is fixed by patch 2. This is in large part a rewrite of bfd/merge.c so the patch might be better to read after applying it. At this time the configure.ac change (i.e. using WORDS_BIGENDIAN) isn't strictly required, but it will be when the hash function is used for the symbol hash table (so that cross binutils will produce the same symbol table order) and it doesn't hurt now, so I've included it. The result of the patches are this: without patch: 3.7 seconds overall linking time and this profile (only stuff related to section merging, i.e. decompressing, memcmp, string compares, the core merging routines): 24.64% 3429 ld-new [.] sec_merge_hash_lookup 6.59% 920 libc.so.6 [.] __memcmp_avx2_movbe 3.52% 504 libz.so.1.2.11 [.] adler32_z 2.66% 378 ld-new [.] _bfd_merged_section_offset 2.41% 327 ld-new [.] strrevcmp 2.05% 295 libz.so.1.2.11 [.] 0x000000000000a2f0 2.04% 294 libz.so.1.2.11 [.] 0x000000000000a36a 1.80% 260 libz.so.1.2.11 [.] 0x000000000000aaa4 1.77% 308 ld-new [.] bfd_hash_lookup 1.57% 226 libz.so.1.2.11 [.] 0x000000000000a30d 1.22% 176 libz.so.1.2.11 [.] 0x000000000000a2e4 1.16% 157 ld-new [.] strrevcmp 1.10% 157 libz.so.1.2.11 [.] 0x000000000000a370 1.07% 154 libz.so.1.2.11 [.] 0x000000000000a44f with patch: 2.4 seconds overall linking time and this profile: 6.92% 611 ld-new [.] _bfd_merge_sections 5.59% 517 ld-new [.] _bfd_merged_section_offset 3.62% 327 libz.so.1.2.11 [.] adler32_z 3.47% 308 libc.so.6 [.] __memcmp_avx2_movbe 3.18% 344 ld-new [.] bfd_hash_lookup 2.43% 207 ld-new [.] strrevcmp 2.25% 203 libz.so.1.2.11 [.] 0x000000000000a2f0 1.90% 172 libz.so.1.2.11 [.] 0x000000000000a36a 1.83% 165 libz.so.1.2.11 [.] 0x000000000000a30d 1.82% 154 ld-new [.] strrevcmp 1.71% 155 libz.so.1.2.11 [.] 0x000000000000aaa4 1.18% 108 libz.so.1.2.11 [.] 0x000000000000a2e4 Michael Matz (3): Faster string merging arm32: Fix rodata-merge-map Add testcase ld-elf/merge4 bfd/config.in | 15 + bfd/configure | 226 +++++++ bfd/configure.ac | 2 + bfd/elflink.c | 7 + bfd/merge.c | 793 ++++++++++++++--------- ld/testsuite/ld-arm/rodata-merge-map.sym | 3 +- ld/testsuite/ld-arm/rodata-merge-map3.s | 5 +- ld/testsuite/ld-elf/elf.exp | 6 + ld/testsuite/ld-elf/merge4.out | 3 + ld/testsuite/ld-elf/merge4a.c | 23 + ld/testsuite/ld-elf/merge4b.s | 23 + 11 files changed, 798 insertions(+), 308 deletions(-) create mode 100644 ld/testsuite/ld-elf/merge4.out create mode 100644 ld/testsuite/ld-elf/merge4a.c create mode 100644 ld/testsuite/ld-elf/merge4b.s -- 2.36.1