* [PATCH 0/3] Speed up section merging
@ 2023-01-18 17:31 Michael Matz
2023-01-20 7:12 ` Alan Modra
0 siblings, 1 reply; 2+ messages in thread
From: Michael Matz @ 2023-01-18 17:31 UTC (permalink / raw)
To: binutils
Heyho,
TLDR: 33% overall linking speed increase by better section merging.
Ok for master? :-)
when linking reasonably-sized inputs with enabled debug information
the slowest single piece of linking time is section merging. We can
speed it up quite a bit without resorting to things like multi
threading. This mini-series does that. My testcase remains the same
as in
https://sourceware.org/pipermail/binutils/2022-November/124670.html .
The important characteristics related to section merging are that
.debug_str sections dominate (of course), and that it's merging 664
of these input sections of overall size 177MB into one of 10.7MB.
(The other mergable sections, string or non-string aren't interesting
for this measurement)
I explain the strategy a bit in the overall comment. The speedup comes
from a better hash function, using a power-of-two hash table and
cacher-friendlier data structures. (Also we currently decompress the
same input section potentially twice). The overall approach can also be
used for the generic BFD hash table for symbols and I do have patches
for that, but as it causes the symbol order to change and hence the
objdump output it requires quite some testsuite adjustments so must
remain for a different time. This is no problem for section merging as
it maintains an insertion-order of all entries (and for strings it does
a sorting step anyway).
A little use of the bfd_hash_table code remains (the allocation and
freeing), the result wasn't really more appealing when getting rid of
that (and I hope to integrate this all back again when the above time
for bfd_hash_table rewrite comes).
The hash function itself is modelled after the xxh3 family of xxhash
narrowed to 32bit and without possibility of seeding. I have tested its
statistic properties generally (via the nice approach in
http://web.archive.org/web/20060116064140/http://bretm.home.comcast.net/hash/
) as well as on the concrete data on my cc1 testcase and it's satisfying
for the problem at hand. It's actually much better than many
other fast hash functions I tried (and only very little worse than xxh3
itself). I've also experimented with many lookup structures over the
past months and this is what's fastest.
I've regtested this on x86_64-linux for all of Alans targets. The
single (harmless) regression is fixed by patch 2.
This is in large part a rewrite of bfd/merge.c so the patch might be
better to read after applying it. At this time the configure.ac change
(i.e. using WORDS_BIGENDIAN) isn't strictly required, but it will be
when the hash function is used for the symbol hash table (so that cross
binutils will produce the same symbol table order) and it doesn't hurt
now, so I've included it.
The result of the patches are this:
without patch: 3.7 seconds overall linking time and this profile (only
stuff related to section merging, i.e. decompressing, memcmp, string
compares, the core merging routines):
24.64% 3429 ld-new [.] sec_merge_hash_lookup
6.59% 920 libc.so.6 [.] __memcmp_avx2_movbe
3.52% 504 libz.so.1.2.11 [.] adler32_z
2.66% 378 ld-new [.] _bfd_merged_section_offset
2.41% 327 ld-new [.] strrevcmp
2.05% 295 libz.so.1.2.11 [.] 0x000000000000a2f0
2.04% 294 libz.so.1.2.11 [.] 0x000000000000a36a
1.80% 260 libz.so.1.2.11 [.] 0x000000000000aaa4
1.77% 308 ld-new [.] bfd_hash_lookup
1.57% 226 libz.so.1.2.11 [.] 0x000000000000a30d
1.22% 176 libz.so.1.2.11 [.] 0x000000000000a2e4
1.16% 157 ld-new [.] strrevcmp
1.10% 157 libz.so.1.2.11 [.] 0x000000000000a370
1.07% 154 libz.so.1.2.11 [.] 0x000000000000a44f
with patch: 2.4 seconds overall linking time and this profile:
6.92% 611 ld-new [.] _bfd_merge_sections
5.59% 517 ld-new [.] _bfd_merged_section_offset
3.62% 327 libz.so.1.2.11 [.] adler32_z
3.47% 308 libc.so.6 [.] __memcmp_avx2_movbe
3.18% 344 ld-new [.] bfd_hash_lookup
2.43% 207 ld-new [.] strrevcmp
2.25% 203 libz.so.1.2.11 [.] 0x000000000000a2f0
1.90% 172 libz.so.1.2.11 [.] 0x000000000000a36a
1.83% 165 libz.so.1.2.11 [.] 0x000000000000a30d
1.82% 154 ld-new [.] strrevcmp
1.71% 155 libz.so.1.2.11 [.] 0x000000000000aaa4
1.18% 108 libz.so.1.2.11 [.] 0x000000000000a2e4
Michael Matz (3):
Faster string merging
arm32: Fix rodata-merge-map
Add testcase ld-elf/merge4
bfd/config.in | 15 +
bfd/configure | 226 +++++++
bfd/configure.ac | 2 +
bfd/elflink.c | 7 +
bfd/merge.c | 793 ++++++++++++++---------
ld/testsuite/ld-arm/rodata-merge-map.sym | 3 +-
ld/testsuite/ld-arm/rodata-merge-map3.s | 5 +-
ld/testsuite/ld-elf/elf.exp | 6 +
ld/testsuite/ld-elf/merge4.out | 3 +
ld/testsuite/ld-elf/merge4a.c | 23 +
ld/testsuite/ld-elf/merge4b.s | 23 +
11 files changed, 798 insertions(+), 308 deletions(-)
create mode 100644 ld/testsuite/ld-elf/merge4.out
create mode 100644 ld/testsuite/ld-elf/merge4a.c
create mode 100644 ld/testsuite/ld-elf/merge4b.s
--
2.36.1
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH 0/3] Speed up section merging
2023-01-18 17:31 [PATCH 0/3] Speed up section merging Michael Matz
@ 2023-01-20 7:12 ` Alan Modra
0 siblings, 0 replies; 2+ messages in thread
From: Alan Modra @ 2023-01-20 7:12 UTC (permalink / raw)
To: Michael Matz; +Cc: binutils
On Wed, Jan 18, 2023 at 05:31:29PM +0000, Michael Matz via Binutils wrote:
> Faster string merging
> arm32: Fix rodata-merge-map
> Add testcase ld-elf/merge4
OK.
--
Alan Modra
Australia Development Lab, IBM
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2023-01-20 7:12 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18 17:31 [PATCH 0/3] Speed up section merging Michael Matz
2023-01-20 7:12 ` Alan Modra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).