From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's
To: Tom Tromey
Cc: gdb-patches@sourceware.org
References: <20181109022847.32049-1-keiths@redhat.com> <87lg4p4d92.fsf@tromey.com>
From: Keith Seitz
Message-ID: <36e34a33-55bb-07aa-13f4-70b83f5026bf@redhat.com>
Date: Tue, 18 Dec 2018 17:28:00 -0000
In-Reply-To: <87lg4p4d92.fsf@tromey.com>
X-SW-Source: 2018-12/txt/msg00206.txt.bz2

Hi, Tom,

On 12/16/18 9:01 AM, Tom Tromey wrote:
>
> I wonder a bit more about the memory overhead, though.
I see that I didn't include this information in my original submission.
I apologize for that because, of course, our own gdb.perf suite collects
this information. For completeness, I've copied the vmsize results that
I collected during my original testing.

Difference from unpatched master (at the time of submission -- multiply
by 100 for a percentage), vmsize:

gmonster1:gmonster-null-lookup        10000-cus   0.00132458225987
gmonster1:gmonster-pervasive-typedef  10000-cus  -0.000998352718015
gmonster1:gmonster-print-cerr         10000-cus  -0.018685850806828
gmonster1:gmonster-ptype-string       10000-cus   0.005297872904029
gmonster1:gmonster-runto-main         10000-cus   0.018689093578962
gmonster1:gmonster-select-file        10000-cus   0.007689710164794
gmonster2:gmonster-null-lookup        1000-sos    0.009959564169472
gmonster2:gmonster-pervasive-typedef  1000-sos    0.000
gmonster2:gmonster-print-cerr         1000-sos    0.002044195506858
gmonster2:gmonster-ptype-string       1000-sos   -0.021910168309929
gmonster2:gmonster-runto-main         1000-sos    0.006132085113341
gmonster2:gmonster-select-file        1000-sos    0.038964767646938

Using various methods to check memory consumption (massif, smem), the
average additional memory consumption this patch adds when debugging
Firefox is approximately 1.3%. That doesn't seem altogether out of line
with the gdb.perf results. We can probably shave a bit off this by using
a bitfield or a char instead of an unsigned short, too.

> Keith> +struct multidictionary
> Keith> +{
> Keith> +  /* An array of dictionaries, one per language.  All dictionaries
> Keith> +     must be of the same type.  This should be free'd for expandable
> Keith> +     dictionary types.  */
> Keith> +  struct dictionary **dictionaries;
> Keith> +
> Keith> +  /* The number of language dictionaries currently allocated.
> Keith> +     Only used for expandable dictionaries.  */
> Keith> +  unsigned short n_allocated_dictionaries;
> Keith> +};
>
> I think this is the main possibly contentious bit of this series.
Is it just the increase in memory usage introduced by this struct that
causes you concern?

> I wondered when reading this series why it isn't possible to just put
> symbols of multiple languages into a single hash table.  That would seem
> to make the series as a whole much simpler: no mass renaming, no need to
> store multiple dictionaries in a block, etc.
>
> Maybe one problem is that when searching you may only have the
> searched-for symbol name; so you'd have to maybe keep a bitset of all
> the languages in a dictionary and then do multiple hash lookups?  But
> still that seems better.

Yeah, this is the main implementation problem, exemplified by
iter_match_first_hashed:

  const language_defn *lang = DICT_LANGUAGE (dict);
  unsigned int hash_index = (name.search_name_hash (lang->la_language)
                             % DICT_HASHED_NBUCKETS (dict));
  symbol_name_matcher_ftype *matches_name
    = get_symbol_name_matcher (lang, name);

  /* Loop through the symbols in the given bucket, breaking when SYM
     first matches.  If SYM never matches, it will be set to NULL;
     either way, we have the right return value.  */
  for (sym = DICT_HASHED_BUCKET (dict, hash_index);
       sym != NULL;
       sym = sym->hash_next)
    {
      /* Warning: the order of arguments to compare matters!  */
      if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
        break;
    }

Some of this is fairly trivial to fix up to permit multiple languages in
a dictionary, e.g., moving the get_symbol_name_matcher call into the
loop. That adds cost to the loop, but I have not attempted to quantify
it.

The bigger problem is computing the hash_index for the bucket we want to
loop over. With no language definition associated with the dictionary,
we would have to loop over /all/ buckets, computing a hash for the first
symbol in each bucket until we find a matching bucket, if any. An
alternative is storing the hash in each bucket to facilitate this, but
that would consume even more memory. This sounded incredibly wasteful to
me when I wrote this patch.

Keith