[PATCH 0/3] Speed up section merging

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/3] Speed up section merging
@ 2023-01-18 17:31 Michael Matz
  2023-01-20  7:12 ` Alan Modra
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Matz @ 2023-01-18 17:31 UTC (permalink / raw)
  To: binutils

Heyho,

TLDR: 33% overall linking speed increase by better section merging.
Ok for master? :-)

when linking reasonably-sized inputs with enabled debug information
the slowest single piece of linking time is section merging.  We can
speed it up quite a bit without resorting to things like multi
threading.  This mini-series does that.  My testcase remains the same
as in
  https://sourceware.org/pipermail/binutils/2022-November/124670.html .

The important characteristics related to section merging are that
.debug_str sections dominate (of course), and that it's merging 664
of these input sections of overall size 177MB into one of 10.7MB.
(The other mergable sections, string or non-string aren't interesting
for this measurement)

I explain the strategy a bit in the overall comment.  The speedup comes
from a better hash function, using a power-of-two hash table and
cacher-friendlier data structures.  (Also we currently decompress the
same input section potentially twice).  The overall approach can also be
used for the generic BFD hash table for symbols and I do have patches
for that, but as it causes the symbol order to change and hence the
objdump output it requires quite some testsuite adjustments so must
remain for a different time.  This is no problem for section merging as
it maintains an insertion-order of all entries (and for strings it does
a sorting step anyway).

A little use of the bfd_hash_table code remains (the allocation and
freeing), the result wasn't really more appealing when getting rid of
that (and I hope to integrate this all back again when the above time
for bfd_hash_table rewrite comes).

The hash function itself is modelled after the xxh3 family of xxhash
narrowed to 32bit and without possibility of seeding.  I have tested its
statistic properties generally (via the nice approach in
  http://web.archive.org/web/20060116064140/http://bretm.home.comcast.net/hash/
) as well as on the concrete data on my cc1 testcase and it's satisfying
for the problem at hand.  It's actually much better than many
other fast hash functions I tried (and only very little worse than xxh3
itself).  I've also experimented with many lookup structures over the
past months and this is what's fastest.

I've regtested this on x86_64-linux for all of Alans targets.  The
single (harmless) regression is fixed by patch 2.

This is in large part a rewrite of bfd/merge.c so the patch might be
better to read after applying it.  At this time the configure.ac change
(i.e. using WORDS_BIGENDIAN) isn't strictly required, but it will be
when the hash function is used for the symbol hash table (so that cross
binutils will produce the same symbol table order) and it doesn't hurt
now, so I've included it.

The result of the patches are this:
without patch: 3.7 seconds overall linking time and this profile (only
stuff related to section merging, i.e. decompressing, memcmp, string
compares, the core merging routines):

    24.64%   3429  ld-new                [.] sec_merge_hash_lookup
     6.59%    920  libc.so.6             [.] __memcmp_avx2_movbe
     3.52%    504  libz.so.1.2.11        [.] adler32_z
     2.66%    378  ld-new                [.] _bfd_merged_section_offset
     2.41%    327  ld-new                [.] strrevcmp
     2.05%    295  libz.so.1.2.11        [.] 0x000000000000a2f0
     2.04%    294  libz.so.1.2.11        [.] 0x000000000000a36a
     1.80%    260  libz.so.1.2.11        [.] 0x000000000000aaa4
     1.77%    308  ld-new                [.] bfd_hash_lookup
     1.57%    226  libz.so.1.2.11        [.] 0x000000000000a30d
     1.22%    176  libz.so.1.2.11        [.] 0x000000000000a2e4
     1.16%    157  ld-new                [.] strrevcmp
     1.10%    157  libz.so.1.2.11        [.] 0x000000000000a370
     1.07%    154  libz.so.1.2.11        [.] 0x000000000000a44f

with patch: 2.4 seconds overall linking time and this profile:

     6.92%    611  ld-new                [.] _bfd_merge_sections
     5.59%    517  ld-new                [.] _bfd_merged_section_offset
     3.62%    327  libz.so.1.2.11        [.] adler32_z
     3.47%    308  libc.so.6             [.] __memcmp_avx2_movbe
     3.18%    344  ld-new                [.] bfd_hash_lookup
     2.43%    207  ld-new                [.] strrevcmp
     2.25%    203  libz.so.1.2.11        [.] 0x000000000000a2f0
     1.90%    172  libz.so.1.2.11        [.] 0x000000000000a36a
     1.83%    165  libz.so.1.2.11        [.] 0x000000000000a30d
     1.82%    154  ld-new                [.] strrevcmp
     1.71%    155  libz.so.1.2.11        [.] 0x000000000000aaa4
     1.18%    108  libz.so.1.2.11        [.] 0x000000000000a2e4

Michael Matz (3):
  Faster string merging
  arm32: Fix rodata-merge-map
  Add testcase ld-elf/merge4

 bfd/config.in                            |  15 +
 bfd/configure                            | 226 +++++++
 bfd/configure.ac                         |   2 +
 bfd/elflink.c                            |   7 +
 bfd/merge.c                              | 793 ++++++++++++++---------
 ld/testsuite/ld-arm/rodata-merge-map.sym |   3 +-
 ld/testsuite/ld-arm/rodata-merge-map3.s  |   5 +-
 ld/testsuite/ld-elf/elf.exp              |   6 +
 ld/testsuite/ld-elf/merge4.out           |   3 +
 ld/testsuite/ld-elf/merge4a.c            |  23 +
 ld/testsuite/ld-elf/merge4b.s            |  23 +
 11 files changed, 798 insertions(+), 308 deletions(-)
 create mode 100644 ld/testsuite/ld-elf/merge4.out
 create mode 100644 ld/testsuite/ld-elf/merge4a.c
 create mode 100644 ld/testsuite/ld-elf/merge4b.s

-- 
2.36.1

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH 0/3] Speed up section merging
  2023-01-18 17:31 [PATCH 0/3] Speed up section merging Michael Matz
@ 2023-01-20  7:12 ` Alan Modra
  0 siblings, 0 replies; 2+ messages in thread
From: Alan Modra @ 2023-01-20  7:12 UTC (permalink / raw)
  To: Michael Matz; +Cc: binutils

On Wed, Jan 18, 2023 at 05:31:29PM +0000, Michael Matz via Binutils wrote:
>   Faster string merging
>   arm32: Fix rodata-merge-map
>   Add testcase ld-elf/merge4

OK.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-01-20  7:12 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18 17:31 [PATCH 0/3] Speed up section merging Michael Matz
2023-01-20  7:12 ` Alan Modra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).