public inbox for dwz@sourceware.org
 help / color / mirror / Atom feed
* [committed] Improve off_hash
@ 2019-01-01  0:00 Tom de Vries
  0 siblings, 0 replies; only message in thread
From: Tom de Vries @ 2019-01-01  0:00 UTC (permalink / raw)
  To: dwz, jakub; +Cc: Michael Matz

Hi,

When profiling dwz (compiled at Makefile default -O2) with cc1 as argument:
...
$ perf record dwz -lnone cc1 -o 1
...
we get a top of profile like this:
...
  15,97%  dwz    dwz              [.] iterative_hash
  15,48%  dwz    dwz              [.] htab_find_with_hash
   8,71%  dwz    dwz              [.] off_eq
...

The high off_eq position hints at possible collision problems.

Using a debug patch that prints hashtab statistics upon deletion:
...
$ dwz -lnone cc1 -o 1 2>&1 | sort -V | tail -n 1
searches: 34256212, collisions: 0.551075, occupancy: 0.607308, line: 11087
...
we find the number of collisions for off_htab somewhat high given the
occupancy of 60%.

The corresponding hashing function is:
...
static hashval_t
off_hash (const void *p)
{
  dw_die_ref die = (dw_die_ref) p;
  return die->die_offset;
}

Then taking into account the start of cc1:
...
$ readelf -wi cc1 | grep ': Abbrev Number.*(DW_TAG'
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
 <0><39>: Abbrev Number: 1 (DW_TAG_compile_unit)
 <1><4b>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><52>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><59>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><60>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><67>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><6e>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><75>: Abbrev Number: 3 (DW_TAG_base_type)
 <1><7c>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><83>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><8a>: Abbrev Number: 2 (DW_TAG_base_type)
 <1><91>: Abbrev Number: 4 (DW_TAG_variable)
 <1><a6>: Abbrev Number: 5 (DW_TAG_const_type)
...
and the way the hash is used to access the entries in htab_find_with_hash:
...
  size = htab->size;
  index = hash % size;
  entry = htab->entries[index];
...
we can see that this uses the entries array sparsely.  After adding the DIE at
0xa6, we've used 14 entries out of 166, so only 8%.  This way we'll quickly
wrap around, after which the chance of collisions increases.

Using a less sparse hashing function can improve this.  I've tried a hashing
function die->die_offset / x, with varying x, for both cc1:
...
1: searches: 34256212, collisions: 0.551075, occupancy: 0.607308, line: 11087
2: searches: 34256212, collisions: 0.492540, occupancy: 0.607308, line: 11087
3: searches: 34256212, collisions: 0.442651, occupancy: 0.607308, line: 11087
4: searches: 34256212, collisions: 0.370392, occupancy: 0.607308, line: 11087
5: searches: 34256212, collisions: 0.304453, occupancy: 0.607308, line: 11087
6: searches: 34256212, collisions: 0.206357, occupancy: 0.607308, line: 11087
7: searches: 34256212, collisions: 0.194195, occupancy: 0.607308, line: 11087
8: searches: 34256212, collisions: 0.324787, occupancy: 0.607308, line: 11087
9: searches: 34256212, collisions: 0.411821, occupancy: 0.607308, line: 11087
10: searches: 34256212, collisions: 0.473084, occupancy: 0.607308, line: 11087
11: searches: 34256212, collisions: 0.529094, occupancy: 0.607308, line: 11087
12: searches: 34256212, collisions: 0.582236, occupancy: 0.607308, line: 11087
...
as well as vmlinux:
...
1: searches: 83198411, collisions: 0.100454, occupancy: 0.260028, line: 11088
2: searches: 83198411, collisions: 0.070154, occupancy: 0.260028, line: 11088
3: searches: 83198411, collisions: 0.002834, occupancy: 0.260028, line: 11088
4: searches: 83198411, collisions: 0.003510, occupancy: 0.260028, line: 11088
5: searches: 83198411, collisions: 0.004286, occupancy: 0.260028, line: 11088
6: searches: 83198411, collisions: 0.044957, occupancy: 0.260028, line: 11088
7: searches: 83198411, collisions: 0.105956, occupancy: 0.260028, line: 11088
8: searches: 83198411, collisions: 0.172427, occupancy: 0.260028, line: 11088

There seems to be a sweet spot for cc1 around x == 6/7, at ~20% collision
rate.

For vmlinux, likewise around x == 3/4/5 at ~0.4% collision rate.

Weighing the cc1 sweet spot heavier because of the higher occupancy, we choose
x == 6 for further investigation.

Using x == 6, for cc1, we get ~18.5% performance improvement:
...
real:  mean:  6476.10  100.00%  stddev:  29.13
       mean:  5277.50   81.49%  stddev:  22.12
user:  mean:  6315.20  100.00%  stddev:  38.27
       mean:  5121.00   81.09%  stddev:  19.57
sys:   mean:   160.00  100.00%  stddev:  19.69
       mean:   155.60   97.25%  stddev:  13.39
...

And for vmlinux, around 2%:
...
real:  mean:  27038.20  100.00%  stddev:  59.43
       mean:  26410.80   97.68%  stddev:  42.40
user:  mean:  26741.10  100.00%  stddev:  74.41
       mean:  26110.30   97.64%  stddev:  40.49
sys:   mean:    294.80  100.00%  stddev:  27.78
       mean:    297.60  100.95%  stddev:  22.72
...

To confirm, I've tried a yet larger example, clang (with cc1 at 10.2 million
DIEs, vmlinux at 17.5 million DIES, and clang at 111.4 million DIEs):
...
real:  mean:  112530.67  100.00%  stddev:  3107.09
       mean:   89756.33   79.76%  stddev:  584.42
user:  mean:  102115.33  100.00%  stddev:  1349.92
       mean:   81865.00   80.17%  stddev:  25.71
sys:   mean:    5130.00  100.00%  stddev:  146.37
       mean:    4625.00   90.16%  stddev:  350.46
...
Also here the performance improvement is reproducible, at ~19%.

Also, I've tried an example with higher occupancy,
CheckerOptionHandlingAnalyzerPlugin.so at 48.6 million DIEs and occupancy 72%:
...
real:  mean:  53073.00  100.00% stddev:   28.84
       mean:  42556.00   80.18% stddev:  176.40
user:  mean:  51343.33  100.00% stddev:   84.22
       mean:  40924.00   79.71% stddev:  125.03
sys:   mean:   1726.67  100.00% stddev:   89.20
       mean:   1629.33   94.36% stddev:  114.01
...
Again here the performance improvement is reproducible, at ~19%.

Update off_hash to die->die_offset / 6.

Committed to trunk.

Thanks,
- Tom

Improve off_hash

2019-11-21  Tom de Vries  <tdevries@suse.de>

	* dwz.c (off_hash): Update to die->die_offset / 6.

---
 dwz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dwz.c b/dwz.c
index 84a7080..945eae9 100644
--- a/dwz.c
+++ b/dwz.c
@@ -1233,7 +1233,7 @@ off_hash (const void *p)
 {
   dw_die_ref die = (dw_die_ref) p;
 
-  return die->die_offset;
+  return die->die_offset / 6;
 }
 
 /* Equality function for off_htab hash table.  */

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2019-11-22 13:19 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-01  0:00 [committed] Improve off_hash Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).