* [committed] Improve off_hash
@ 2019-01-01 0:00 Tom de Vries
0 siblings, 0 replies; only message in thread
From: Tom de Vries @ 2019-01-01 0:00 UTC (permalink / raw)
To: dwz, jakub; +Cc: Michael Matz
Hi,
When profiling dwz (compiled at Makefile default -O2) with cc1 as argument:
...
$ perf record dwz -lnone cc1 -o 1
...
we get a top of profile like this:
...
15,97% dwz dwz [.] iterative_hash
15,48% dwz dwz [.] htab_find_with_hash
8,71% dwz dwz [.] off_eq
...
The high off_eq position hints at possible collision problems.
Using a debug patch that prints hashtab statistics upon deletion:
...
$ dwz -lnone cc1 -o 1 2>&1 | sort -V | tail -n 1
searches: 34256212, collisions: 0.551075, occupancy: 0.607308, line: 11087
...
we find the number of collisions for off_htab somewhat high given the
occupancy of 60%.
The corresponding hashing function is:
...
static hashval_t
off_hash (const void *p)
{
dw_die_ref die = (dw_die_ref) p;
return die->die_offset;
}
Then taking into account the start of cc1:
...
$ readelf -wi cc1 | grep ': Abbrev Number.*(DW_TAG'
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<0><39>: Abbrev Number: 1 (DW_TAG_compile_unit)
<1><4b>: Abbrev Number: 2 (DW_TAG_base_type)
<1><52>: Abbrev Number: 2 (DW_TAG_base_type)
<1><59>: Abbrev Number: 2 (DW_TAG_base_type)
<1><60>: Abbrev Number: 2 (DW_TAG_base_type)
<1><67>: Abbrev Number: 2 (DW_TAG_base_type)
<1><6e>: Abbrev Number: 2 (DW_TAG_base_type)
<1><75>: Abbrev Number: 3 (DW_TAG_base_type)
<1><7c>: Abbrev Number: 2 (DW_TAG_base_type)
<1><83>: Abbrev Number: 2 (DW_TAG_base_type)
<1><8a>: Abbrev Number: 2 (DW_TAG_base_type)
<1><91>: Abbrev Number: 4 (DW_TAG_variable)
<1><a6>: Abbrev Number: 5 (DW_TAG_const_type)
...
and the way the hash is used to access the entries in htab_find_with_hash:
...
size = htab->size;
index = hash % size;
entry = htab->entries[index];
...
we can see that this uses the entries array sparsely. After adding the DIE at
0xa6, we've used 14 entries out of 166, so only 8%. This way we'll quickly
wrap around, after which the chance of collisions increases.
Using a less sparse hashing function can improve this. I've tried a hashing
function die->die_offset / x, with varying x, for both cc1:
...
1: searches: 34256212, collisions: 0.551075, occupancy: 0.607308, line: 11087
2: searches: 34256212, collisions: 0.492540, occupancy: 0.607308, line: 11087
3: searches: 34256212, collisions: 0.442651, occupancy: 0.607308, line: 11087
4: searches: 34256212, collisions: 0.370392, occupancy: 0.607308, line: 11087
5: searches: 34256212, collisions: 0.304453, occupancy: 0.607308, line: 11087
6: searches: 34256212, collisions: 0.206357, occupancy: 0.607308, line: 11087
7: searches: 34256212, collisions: 0.194195, occupancy: 0.607308, line: 11087
8: searches: 34256212, collisions: 0.324787, occupancy: 0.607308, line: 11087
9: searches: 34256212, collisions: 0.411821, occupancy: 0.607308, line: 11087
10: searches: 34256212, collisions: 0.473084, occupancy: 0.607308, line: 11087
11: searches: 34256212, collisions: 0.529094, occupancy: 0.607308, line: 11087
12: searches: 34256212, collisions: 0.582236, occupancy: 0.607308, line: 11087
...
as well as vmlinux:
...
1: searches: 83198411, collisions: 0.100454, occupancy: 0.260028, line: 11088
2: searches: 83198411, collisions: 0.070154, occupancy: 0.260028, line: 11088
3: searches: 83198411, collisions: 0.002834, occupancy: 0.260028, line: 11088
4: searches: 83198411, collisions: 0.003510, occupancy: 0.260028, line: 11088
5: searches: 83198411, collisions: 0.004286, occupancy: 0.260028, line: 11088
6: searches: 83198411, collisions: 0.044957, occupancy: 0.260028, line: 11088
7: searches: 83198411, collisions: 0.105956, occupancy: 0.260028, line: 11088
8: searches: 83198411, collisions: 0.172427, occupancy: 0.260028, line: 11088
There seems to be a sweet spot for cc1 around x == 6/7, at ~20% collision
rate.
For vmlinux, likewise around x == 3/4/5 at ~0.4% collision rate.
Weighing the cc1 sweet spot heavier because of the higher occupancy, we choose
x == 6 for further investigation.
Using x == 6, for cc1, we get ~18.5% performance improvement:
...
real: mean: 6476.10 100.00% stddev: 29.13
mean: 5277.50 81.49% stddev: 22.12
user: mean: 6315.20 100.00% stddev: 38.27
mean: 5121.00 81.09% stddev: 19.57
sys: mean: 160.00 100.00% stddev: 19.69
mean: 155.60 97.25% stddev: 13.39
...
And for vmlinux, around 2%:
...
real: mean: 27038.20 100.00% stddev: 59.43
mean: 26410.80 97.68% stddev: 42.40
user: mean: 26741.10 100.00% stddev: 74.41
mean: 26110.30 97.64% stddev: 40.49
sys: mean: 294.80 100.00% stddev: 27.78
mean: 297.60 100.95% stddev: 22.72
...
To confirm, I've tried a yet larger example, clang (with cc1 at 10.2 million
DIEs, vmlinux at 17.5 million DIES, and clang at 111.4 million DIEs):
...
real: mean: 112530.67 100.00% stddev: 3107.09
mean: 89756.33 79.76% stddev: 584.42
user: mean: 102115.33 100.00% stddev: 1349.92
mean: 81865.00 80.17% stddev: 25.71
sys: mean: 5130.00 100.00% stddev: 146.37
mean: 4625.00 90.16% stddev: 350.46
...
Also here the performance improvement is reproducible, at ~19%.
Also, I've tried an example with higher occupancy,
CheckerOptionHandlingAnalyzerPlugin.so at 48.6 million DIEs and occupancy 72%:
...
real: mean: 53073.00 100.00% stddev: 28.84
mean: 42556.00 80.18% stddev: 176.40
user: mean: 51343.33 100.00% stddev: 84.22
mean: 40924.00 79.71% stddev: 125.03
sys: mean: 1726.67 100.00% stddev: 89.20
mean: 1629.33 94.36% stddev: 114.01
...
Again here the performance improvement is reproducible, at ~19%.
Update off_hash to die->die_offset / 6.
Committed to trunk.
Thanks,
- Tom
Improve off_hash
2019-11-21 Tom de Vries <tdevries@suse.de>
* dwz.c (off_hash): Update to die->die_offset / 6.
---
dwz.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dwz.c b/dwz.c
index 84a7080..945eae9 100644
--- a/dwz.c
+++ b/dwz.c
@@ -1233,7 +1233,7 @@ off_hash (const void *p)
{
dw_die_ref die = (dw_die_ref) p;
- return die->die_offset;
+ return die->die_offset / 6;
}
/* Equality function for off_htab hash table. */
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2019-11-22 13:19 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-01 0:00 [committed] Improve off_hash Tom de Vries
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).