From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 35050 invoked by alias); 2 Oct 2019 22:02:21 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 35034 invoked by uid 89); 2 Oct 2019 22:02:20 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.9 required=5.0 tests=AWL,BAYES_00,ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_PASS,USER_IN_DEF_SPF_WL autolearn=ham version=3.3.1 spammy=nearly, cbiesinger, shortlog X-HELO: mail-ot1-f66.google.com Received: from mail-ot1-f66.google.com (HELO mail-ot1-f66.google.com) (209.85.210.66) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 02 Oct 2019 22:02:18 +0000 Received: by mail-ot1-f66.google.com with SMTP id 21so523530otj.11 for ; Wed, 02 Oct 2019 15:02:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=jg28lLgaNa1eg4dqyt4cCPYSoGroyPpkvATf5aVBIAk=; b=Qd8oGNEyfjOZvCNqPaMEJSvfnPLLtSUYIvcjDJ/c55UMO9HveTcpCLP5ZQgtOwed1R Bs6ifNWDeY+I+4AXKsxmx9Bq41X2g2Ouv6SpQsFnFfBU2jtbNtsPHDoH3R/QWwWrvzwl LzLciLWzs4GpkXQhbW2frDabRQLwD7NLTdLnrW+izhnrKazMLi01F79Nu9Jc4I1Qq7NW GiEG/i93yOGCAXdXQvYDEkMYjm/aCHGcpk7pJi2Yqm7sOKZrZxJKEyi3GzBRLpAaJdWL 4DpoJ+OSW3V9OYiEDfn6kEWePmYwhzj9CiNpwRrRMLjvSD7CTghj2rXaGRSyOybqRRJ9 MBbg== MIME-Version: 1.0 References: <874l0tc1gl.fsf@tromey.com> <20190930165543.68106-1-cbiesinger@google.com> <87r23vxdzs.fsf@tromey.com> In-Reply-To: From: "Christian Biesinger via gdb-patches" Reply-To: Christian Biesinger Date: Wed, 02 Oct 2019 22:02:00 -0000 Message-ID: Subject: Re: [PATCH] Don't use the mutex for each symbol_set_names call To: Tom Tromey Cc: Christian Biesinger via gdb-patches Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2019-10/txt/msg00087.txt.bz2 On Wed, Oct 2, 2019 at 1:20 PM Christian Biesinger wrote: > > On Wed, Oct 2, 2019 at 12:18 PM Tom Tromey wrote: > > > > >>>>> "Christian" == Christian Biesinger via gdb-patches writes: > > > > Christian> It speeds up "attach to Chrome's content_shell binary" from 44 sec -> 30 > > Christian> sec (32%) (compared to GDB trunk), or from 37 sec compared to this > > Christian> patchset before this patch. > > > > Nice. > > I do need to redo these measurements with the latest version of the > branch and patch... Here are some new measurements on trunk (this is also with a recompiled Chrome); either trunk gdb is slower or trunk Chrome is bigger. I'll call tromey/t/parallel-minsyms-mutex "tromey" below. GDB Trunk: 49.8s tromey: 53-54s (!) tromey + my patch here: ~30.3s tromey + my patch here + https://sourceware.org/ml/gdb-patches/2019-09/msg00633.html: 24.8s (-18% compared to previous) tromey + my patch here + https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=shortlog;h=refs/heads/users/cbiesinger/minsym-hash-one-thread: 28.8s (only -5%) I repeatedly measured "tromey" because it is so slow and got consistent results. It must be due to the lock contention. > > Christian> + [&] (minimal_symbol *first, minimal_symbol* last) { > > Christian> + std::lock_guard guard (demangled_mutex); > > Christian> + for (; first < last; ++first) { > > Christian> + symbol_set_names (first, first->name, > > Christian> + strlen (first->name), 0, > > Christian> + m_objfile->per_bfd); > > Christian> + } > > Christian> + }); > > > > IIUC the idea is to separate the demangling from updating the > > demangled_names_hash. > > That's correct. > > > A couple of thoughts on that... > > > > Christian> + *slot > > Christian> + = ((struct demangled_name_entry *) > > Christian> + obstack_alloc (&per_bfd->storage_obstack, > > Christian> + offsetof (struct demangled_name_entry, demangled) > > Christian> + + len + demangled_len + 2)); > > Christian> + mangled_ptr = &((*slot)->demangled[demangled_len + 1]); > > Christian> + strcpy (mangled_ptr, linkage_name_copy); > > Christian> + (*slot)->mangled = mangled_ptr; > > > > There's no deep reason that these things have to be stored on the > > per-BFD obstack -- and requiring this means an extra copy. Instead we > > could change the hash table to use ordinary heap allocation, and I think > > this would be more efficient when demangling in worker threads, because > > we could just reuse the existing allocation. > > Yes indeed. I was actually thinking of that last night -- we could > change to a hash_set + reuse the alloc from > gdb_demangle and avoid any allocations here. https://sourceware.org/ml/gdb-patches/2019-10/msg00085.html Although now that I re-read this, I'm not sure if I understood you correctly, did you want to allocate more things with regular malloc/new? > > Also, it seems fine to serialize the calls to symbol_set_names. There's > > no need for a lock at all, then. > > True, though this way, if some threads finish faster than others it's > possible to parallelize the work a bit more. Trying this out, it seems to be about 1-1.5s slower (3-5%). However, this did make me realize that there's no real reason why the mutex should be global, so I'm going to move it inside this function. > > One idea I had was to parallelize build_minimal_symbol_hash_tables as > > well: when possible, have it run two threads, and create the hash tables > > in parallel. > > Hmm, yeah, that's a good idea. I hadn't thought of doing it quite that way. See measurements above for users/cbiesinger/minsym-hash-one-thread; it's not nearly as fast as my "Compute msymbol hash codes in parallel" patch. However I couldn't do it quite as you suggested (as mentioned in IRC, but writing it down here as well for anyone watching): - Building the demangled minsym hash table requires having the demangled names available, so it needs to happen at the very least after find_demangled_name - But it can't happen in parallel with symbol_set_names either, because that may change the pointer (to one from the hashtable) - So in practice, I can only build the msymbol_hash table on a thread, and that's the faster one (search_name_hash is slowish) > > Adding a third thread here to update the > > demangled_names_hash might help too? Maybe this approach would > > eliminate the need for your "Compute msymbol hash codes in parallel" > > patch ... the issue there being that it makes the minsyms larger. > > (Another way to handle that would be to keep the hashes in a local array > > of some kind that is discarded once the hash tables are updated.) > > The local array is a bit tricky... it needs an entry for each msymbol, > which is only known at runtime. So it can't be stack-allocated with a > fixed size, and I'm hesitant to use alloca for this unbounded size. So > it would require a heap allocation for that vector. Maybe it's still > worth it... Putting this in a vector works out! It might possibly be a couple of tenths of a second faster even. Pushed to: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=85f7818c32fdf5b9fbd24f08320c54e9f9d50b4c Actually makes me wonder if I could precompute the hash code for demangled_names_hash in a similar way... Will send an updated version of this patch in a bit. Christian