public inbox for dwz@sourceware.org
 help / color / mirror / Atom feed
From: Tom de Vries <tdevries@suse.de>
To: dwz@sourceware.org, jakub@redhat.com
Subject: [RFC] Try to ensure 62.5% occupancy in off_htab
Date: Tue, 01 Jan 2019 00:00:00 -0000	[thread overview]
Message-ID: <20191122150201.GA7055@delia> (raw)

Hi,

A noticible fact about the vmlinux test-case is the low final occupancy of
26% of off_htab:
...
searches: 83198411, collisions: 0.044957, occupancy: 0.260028, line: 11088
...

This is caused by the following.

The hashtable.c in the gcc repo started out with a growing strategy that
grows to twice the number of elements:
...
  new_htab = create_hash_table (htab->number_of_elements * 2,
                                htab->hash_function, htab->eq_function);
...

However, in 2000 commmit 07c967f908b changed that to twice the size:
...
  htab->size = higher_prime_number (htab->size * 2);
...

So, before this commit, in a hashtab with size 100 and 75 elements, we
would grow (ignoring the higher_prime_number complication) to size 150
(with occupancy 50%), but after this commit we would grow to size 200
(with occupancy 37.5%).

This change was reverted again in 2003 in commit b9af005f330:
...
    nsize = higher_prime_number ((htab->n_elements - htab->n_deleted) * 2);
...
But the hashtable.c in the dwz repo has the 'twice the size' variant.

If we change the hashtable implementation to use the 'twice the elements'
variant:
...
-  htab->size = higher_prime_number (htab->size * 2);
+  htab->size = higher_prime_number (htab_elements (htab) * 2);
...
on top of the commit before commit b7935b7 "Avoid htab_expand for off_htab",
we get an occupancy of 52% instead of 26:
...
 $ dwz -lnone -Lnone ./vmlinux-4.20.13-0-vanilla.debug -o 1 2>&1 | sort -V | tail -n 1
-searches: 83198411, collisions: 0.672008, occupancy: 0.260028, line: 11073
+searches: 83198411, collisions: 0.989007, occupancy: 0.520056, line: 11073
...

This patch doesn't change hashtab.c growth strategy, nor the emulation thereof
in emulate_htab.

But it does attempt to ensure at least a 62.5% occupancy (chosen as inbetween
the 75% growth trigger and the 50% after-growth target) for the estimated
final occupancy.

The resulting change is 52% occupancy (which is somewhat lower than the
targeted 62.5% due to the higher_prime_number usage):
...
-searches: 83198411, collisions: 0.044957, occupancy: 0.260028, line: 11088
+searches: 83198411, collisions: 0.060844, occupancy: 0.520056, line: 11094
...
which gives a ~10% reduction in memory usage:
...
$ time.sh dwz.1 -lnone -Lnone vmlinux -o 1 \
    | grep maxmem
maxmem: 2123368
$ time.sh dwz.2 -lnone -Lnone vmlinux -o 1 \
    | grep maxmem
maxmem: 1903212
...
without any adverse performance effects:
...
real:  mean:  27212.20  100.00%  stddev:  28.20
       mean:  27189.10   99.92%  stddev:  62.44
user:  mean:  26667.60  100.00%  stddev:  36.77
       mean:  26704.80  100.14%  stddev:  61.91
sys:   mean:    542.80  100.00%  stddev:  23.25
       mean:    482.40   88.87%  stddev:  36.33
...

For now, this is an RFC due having been validated on only one test-case.

---
 dwz.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/dwz.c b/dwz.c
index 945eae9..04d38cd 100644
--- a/dwz.c
+++ b/dwz.c
@@ -1292,6 +1292,13 @@ off_htab_add_die (dw_cu_ref cu, dw_die_ref die)
 	  size_t estimated_final_hashtab_size
 	    = emulate_htab (default_initial_size, estimated_nr_dies);
 	  initial_size = estimated_final_hashtab_size;
+	  if (8 * estimated_nr_dies < 5 * initial_size)
+	    {
+	      /* If occupancy is less 62.5% (5/8), try to make it at least
+		 that.  */
+	      size_t alt_size = higher_prime_number (8 * estimated_nr_dies / 5);
+	      initial_size = alt_size;
+	    }
 	}
       off_htab = htab_try_create (initial_size, off_hash, off_eq, NULL);
       if (off_htab == NULL)

Any comments?

Thanks,
- Tom

Try to ensure 62.5% occupancy in off_htab

                 reply	other threads:[~2019-11-22 15:02 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191122150201.GA7055@delia \
    --to=tdevries@suse.de \
    --cc=dwz@sourceware.org \
    --cc=jakub@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).