From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 107338 invoked by alias); 22 Nov 2019 15:02:08 -0000 Mailing-List: contact dwz-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Post: List-Help: List-Subscribe: Sender: dwz-owner@sourceware.org Received: (qmail 107321 invoked by uid 89); 22 Nov 2019 15:02:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.100.3 on sourceware.org X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.1 spammy=adverse, strategy, 2820, sk:number_ X-Spam-Status: No, score=-25.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on sourceware.org X-Spam-Level: X-HELO: mx1.suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Date: Tue, 01 Jan 2019 00:00:00 -0000 From: Tom de Vries To: dwz@sourceware.org, jakub@redhat.com Subject: [RFC] Try to ensure 62.5% occupancy in off_htab Message-ID: <20191122150201.GA7055@delia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-SW-Source: 2019-q4/txt/msg00065.txt.bz2 Hi, A noticible fact about the vmlinux test-case is the low final occupancy of 26% of off_htab: ... searches: 83198411, collisions: 0.044957, occupancy: 0.260028, line: 11088 ... This is caused by the following. The hashtable.c in the gcc repo started out with a growing strategy that grows to twice the number of elements: ... new_htab = create_hash_table (htab->number_of_elements * 2, htab->hash_function, htab->eq_function); ... However, in 2000 commmit 07c967f908b changed that to twice the size: ... htab->size = higher_prime_number (htab->size * 2); ... So, before this commit, in a hashtab with size 100 and 75 elements, we would grow (ignoring the higher_prime_number complication) to size 150 (with occupancy 50%), but after this commit we would grow to size 200 (with occupancy 37.5%). This change was reverted again in 2003 in commit b9af005f330: ... nsize = higher_prime_number ((htab->n_elements - htab->n_deleted) * 2); ... But the hashtable.c in the dwz repo has the 'twice the size' variant. If we change the hashtable implementation to use the 'twice the elements' variant: ... - htab->size = higher_prime_number (htab->size * 2); + htab->size = higher_prime_number (htab_elements (htab) * 2); ... on top of the commit before commit b7935b7 "Avoid htab_expand for off_htab", we get an occupancy of 52% instead of 26: ... $ dwz -lnone -Lnone ./vmlinux-4.20.13-0-vanilla.debug -o 1 2>&1 | sort -V | tail -n 1 -searches: 83198411, collisions: 0.672008, occupancy: 0.260028, line: 11073 +searches: 83198411, collisions: 0.989007, occupancy: 0.520056, line: 11073 ... This patch doesn't change hashtab.c growth strategy, nor the emulation thereof in emulate_htab. But it does attempt to ensure at least a 62.5% occupancy (chosen as inbetween the 75% growth trigger and the 50% after-growth target) for the estimated final occupancy. The resulting change is 52% occupancy (which is somewhat lower than the targeted 62.5% due to the higher_prime_number usage): ... -searches: 83198411, collisions: 0.044957, occupancy: 0.260028, line: 11088 +searches: 83198411, collisions: 0.060844, occupancy: 0.520056, line: 11094 ... which gives a ~10% reduction in memory usage: ... $ time.sh dwz.1 -lnone -Lnone vmlinux -o 1 \ | grep maxmem maxmem: 2123368 $ time.sh dwz.2 -lnone -Lnone vmlinux -o 1 \ | grep maxmem maxmem: 1903212 ... without any adverse performance effects: ... real: mean: 27212.20 100.00% stddev: 28.20 mean: 27189.10 99.92% stddev: 62.44 user: mean: 26667.60 100.00% stddev: 36.77 mean: 26704.80 100.14% stddev: 61.91 sys: mean: 542.80 100.00% stddev: 23.25 mean: 482.40 88.87% stddev: 36.33 ... For now, this is an RFC due having been validated on only one test-case. --- dwz.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/dwz.c b/dwz.c index 945eae9..04d38cd 100644 --- a/dwz.c +++ b/dwz.c @@ -1292,6 +1292,13 @@ off_htab_add_die (dw_cu_ref cu, dw_die_ref die) size_t estimated_final_hashtab_size = emulate_htab (default_initial_size, estimated_nr_dies); initial_size = estimated_final_hashtab_size; + if (8 * estimated_nr_dies < 5 * initial_size) + { + /* If occupancy is less 62.5% (5/8), try to make it at least + that. */ + size_t alt_size = higher_prime_number (8 * estimated_nr_dies / 5); + initial_size = alt_size; + } } off_htab = htab_try_create (initial_size, off_hash, off_eq, NULL); if (off_htab == NULL) Any comments? Thanks, - Tom Try to ensure 62.5% occupancy in off_htab