From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's
To: Tom Tromey
Cc: gdb-patches@sourceware.org
References: <20181109022847.32049-1-keiths@redhat.com> <87lg4p4d92.fsf@tromey.com>
From: Keith Seitz
Message-ID: <36e34a33-55bb-07aa-13f4-70b83f5026bf@redhat.com>
Date: Tue, 18 Dec 2018 17:28:00 -0000
In-Reply-To: <87lg4p4d92.fsf@tromey.com>
X-SW-Source: 2018-12/txt/msg00206.txt.bz2

Hi, Tom,

On 12/16/18 9:01 AM, Tom Tromey wrote:
>
> I wonder a bit more about the memory overhead, though.
I see that I didn't include this information in my original submission.
I apologize for that because, of course, our own gdb.perf suite collects
this information. For completeness, I've copied the vmsize results that
I collected during my original testing.

Difference from unpatched master (at the time of submission -- multiply
by 100 for a percentage), vmsize:

gmonster1:gmonster-null-lookup        10000-cus   0.00132458225987
gmonster1:gmonster-pervasive-typedef  10000-cus  -0.000998352718015
gmonster1:gmonster-print-cerr         10000-cus  -0.018685850806828
gmonster1:gmonster-ptype-string       10000-cus   0.005297872904029
gmonster1:gmonster-runto-main         10000-cus   0.018689093578962
gmonster1:gmonster-select-file        10000-cus   0.007689710164794
gmonster2:gmonster-null-lookup        1000-sos    0.009959564169472
gmonster2:gmonster-pervasive-typedef  1000-sos    0.000
gmonster2:gmonster-print-cerr         1000-sos    0.002044195506858
gmonster2:gmonster-ptype-string       1000-sos   -0.021910168309929
gmonster2:gmonster-runto-main         1000-sos    0.006132085113341
gmonster2:gmonster-select-file        1000-sos    0.038964767646938

Using various methods to check memory consumption (massif, smem), the
average additional memory consumption this patch adds when debugging
Firefox is approximately 1.3%. That doesn't seem altogether out of line
with the gdb.perf results. We can probably shave a bit off this by using
a bitfield or a char instead of an unsigned short, too.

> Keith> +struct multidictionary
> Keith> +{
> Keith> +  /* An array of dictionaries, one per language.  All dictionaries
> Keith> +     must be of the same type.  This should be free'd for expandable
> Keith> +     dictionary types.  */
> Keith> +  struct dictionary **dictionaries;
> Keith> +
> Keith> +  /* The number of language dictionaries currently allocated.
> Keith> +     Only used for expandable dictionaries.  */
> Keith> +  unsigned short n_allocated_dictionaries;
> Keith> +};
>
> I think this is the main possibly contentious bit of this series.
Is it just the increase in memory usage introduced by this struct that
causes you concern?

> I wondered when reading this series why it isn't possible to just put
> symbols of multiple languages into a single hash table.  That would seem
> to make the series as a whole much simpler: no mass renaming, no need to
> store multiple dictionaries in a block, etc.
>
> Maybe one problem is that when searching you may only have the
> searched-for symbol name; so you'd have to maybe keep a bitset of all
> the languages in a dictionary and then do multiple hash lookups?  But
> still that seems better.

Yeah, this is the main implementation problem, exemplified by
iter_match_first_hashed:

  const language_defn *lang = DICT_LANGUAGE (dict);
  unsigned int hash_index = (name.search_name_hash (lang->la_language)
                             % DICT_HASHED_NBUCKETS (dict));
  symbol_name_matcher_ftype *matches_name
    = get_symbol_name_matcher (lang, name);

  /* Loop through the symbols in the given bucket, breaking when SYM
     first matches.  If SYM never matches, it will be set to NULL;
     either way, we have the right return value.  */
  for (sym = DICT_HASHED_BUCKET (dict, hash_index);
       sym != NULL;
       sym = sym->hash_next)
    {
      /* Warning: the order of arguments to compare matters!  */
      if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
        break;
    }

Some of this is fairly trivial to fix up to permit multiple languages in
a dictionary, e.g., moving the get_symbol_name_matcher call into the
loop. That adds cost to the loop, but I have not attempted to quantify
it.

The bigger problem is computing the hash_index for the bucket we want to
loop over. With no language definition associated with the dictionary,
we would have to loop over /all/ buckets, computing a hash for the first
symbol in each bucket until we find a matching bucket, if any. An
alternative is storing the hash in each bucket to facilitate this, but
that would consume even more memory. This sounded incredibly wasteful to
me when I wrote this patch.

Keith