From: Tom Tromey
To: Matheus Branco Borella via Gdb-patches
Cc: Matheus Branco Borella
Subject: Re: Parallelization of dwarf2 symtab parsing
Date: Wed, 21 Dec 2022 13:19:30 -0700
Message-ID: <87bknwik6l.fsf@tromey.com>

>>>>> "Matt" == Matheus Branco Borella via Gdb-patches writes:

Matt> I was wondering if there's anything design-wise blocking running
Matt> the calls to `dw2_expand_symtabs_matching_one` made in
Matt> `cooked_index_functions::expand_symtabs_matching` in parallel,
Matt> using a similar approach to what building the psymtabs currently
Matt> uses.

I think it's reasonably difficult, though it might be possible.
Matt> Doing that naively, though, there are a few trouble variables and
Matt> structures that cause data races. The ones I've found are:

Yeah, the biggest problem has to do with data races.  In particular,
gdb has some maybe non-obvious spots that can interfere.

First, choosing symbol names in the DWARF reader (the "full" reader,
not the indexer) uses a per-objfile bcache:

    if (need_copy)
      retval = objfile->intern (retval);

(The code to compute symbol names is a horrible mess; I wish we could
clean that up...)

Also, any spot in the DWARF reader referring to the objfile obstack is
a potential data race.  According to M-x occur there are 43 of these.
These can probably be changed to use some kind of sharded obstack to
avoid interference.

There may also be problems with the buildsym code; and of course, when
a symtab is actually installed, that must be made thread-safe.  I
tried to make buildsym less reliant on global variables a while ago,
though I don't recall whether that was a 100% success.  (Note there's
a "legacy" API that uses globals, but the DWARF reader avoids it.)

Matt> (4) per_objfile->m_dwarf2_cus, and

Matt> (4), though, is where I've hit a bit of a snag. Since it's a
Matt> global registry of CUs, all threads must have a shared and
Matt> coherent view of it. Using a mutex to lock the map can be done in
Matt> one of two ways, both with equally undesirable results: ...

Matt> - We analyze the graph of which CUs cause other CUs to get loaded
Matt>   and run the parallel batches in topological order.
Matt>   Assuming that we can even know that info ahead of time, this
Matt>   approach would still be the most intrusive to the code and the
Matt>   most complex to pull off.

Matt> - We introduce a type that represents a CU that is currently
Matt>   being loaded, but that hasn't finished loading quite yet, for
Matt>   threads that need that CU to await on.

Basically, I think you just want to make sure that a given CU is only
parsed by a single thread.
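A minimal sketch of that single-parse guarantee, in the futures style
your second bullet suggests.  All the names here (`cu_key`,
`expanded_symtab`, `cu_expansion_map`, `parse_cu`) are illustrative
stand-ins for dwarf2_per_cu_data and friends, not actual gdb code:

```cpp
#include <future>
#include <map>
#include <mutex>

// Stand-ins for dwarf2_per_cu_data and the real expansion result.
using cu_key = int;
struct expanded_symtab { cu_key key; };

class cu_expansion_map
{
public:
  // Return the expansion of KEY, running the expensive parse at most
  // once even when many threads ask for the same CU concurrently.
  expanded_symtab get (cu_key key)
  {
    std::promise<expanded_symtab> *to_fill = nullptr;
    std::shared_future<expanded_symtab> fut;
    {
      std::lock_guard<std::mutex> guard (m_lock);
      auto it = m_futures.find (key);
      if (it == m_futures.end ())
        {
          // This thread wins the race and will do the parse itself.
          // std::map never invalidates references on insert, so the
          // pointer into m_promises stays valid.
          std::promise<expanded_symtab> &prom = m_promises[key];
          fut = prom.get_future ().share ();
          m_futures.emplace (key, fut);
          to_fill = &prom;
        }
      else
        fut = it->second;
    }
    if (to_fill != nullptr)
      // Parse outside the lock, then publish for any waiting threads.
      to_fill->set_value (parse_cu (key));
    return fut.get ();
  }

private:
  // Placeholder for the real DWARF expansion work.
  static expanded_symtab parse_cu (cu_key key) { return { key }; }

  std::mutex m_lock;
  std::map<cu_key, std::promise<expanded_symtab>> m_promises;
  std::map<cu_key, std::shared_future<expanded_symtab>> m_futures;
};
```

A real version would also need set_exception for parse failures, and
some care with recursive loads: if parse_cu for one CU ends up waiting
on its own future, this naive sketch deadlocks.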
It's better to arrange things to avoid doing any extra work here.  I
don't think this should be super hard to do.  For example, you can
have a map that is locked only on use, mapping from the
dwarf2_per_cu_data to a std::future that will hold the resulting
symtab or whatever.  Then, when multiple CUs refer to some other CU,
whichever thread gets to that included CU first will win.

I think the data structures should be set up such that these can be
stitched together again after parsing.  It's probably fine to just do
this in a post-pass that waits for all the futures and then sets up
the inclusions.  The cooked indexer uses a strategy like this.

Also note that cross-CU references are relatively rare, basically only
occurring when 'dwz' has been used.  So this situation isn't extremely
important for normal users.

Matt> (5) is conceptually pretty simple to understand, but fairly
Matt> complex to solve. We can model the CU cache cleanup functionality
Matt> as a form of age-based garbage collection that we're trying to
Matt> parallelize. And there's plenty of fun to be had by going down
Matt> that particular rabbit hole. :^)

We have some evidence that this cache is just not very good:

https://sourceware.org/bugzilla/show_bug.cgi?id=25703

Changing it radically to work in some other way seems totally fine.
Like, I think if the DIEs from some CU are ever needed for a cross-CU
reference, then just keeping those around for the duration of the
current "expansion operation" is fine.

Matt> So I'd like to hear your thoughts and opinions on this. If
Matt> there's anything I've missed or got wrong, please let me know.

A different idea is to try to somewhat merge the partial (cooked
index) DWARF reader and the full reader, and do lazy expansion of CUs.
This has the same end goal -- speed up CU expansion.
https://sourceware.org/bugzilla/show_bug.cgi?id=29398

The idea here is that, even when some data from a CU is needed (like a
type, or if gdb stops in some function in the CU), frequently the rest
of the data there is not needed.  So, reading it all is needlessly
slow.

There's a second idea here as well, which is unifying the top-level
names between the index and the symtab -- occasionally we've had
divergences here, which result in weird bugs.  The latter could be
done without full laziness, though, by using the index to build the
symtab but then filling in the details using the current code.

The big benefit of lazy expansion is that it would be much faster for
the pathologically large CUs that do sometimes appear (whereas with
parallel reading, a really big CU will still be slow).  The main
drawback is that it's more complicated, so there's more chance for
bugs, and the bugs will be harder to understand when they do occur.

Tom
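The shape of the per-CU laziness discussed above, as a hedged
single-threaded sketch (the types `symtab` and `lazy_cu` here are
invented for illustration; none of this is gdb's actual code).  The
index hands out a cheap handle per CU, and the expensive expansion
runs at most once, on first use:

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the result of fully expanding a CU.
struct symtab { std::vector<std::string> symbols; };

class lazy_cu
{
public:
  // EXPANDER does the expensive work; constructing the handle is cheap.
  explicit lazy_cu (std::function<symtab ()> expander)
    : m_expander (std::move (expander))
  {
  }

  // First call runs the expander and caches the result; later calls
  // just return the cached symtab.
  const symtab &get ()
  {
    if (!m_symtab.has_value ())
      m_symtab = m_expander ();
    return *m_symtab;
  }

  bool expanded () const { return m_symtab.has_value (); }

private:
  std::function<symtab ()> m_expander;
  std::optional<symtab> m_symtab;
};
```

Combining this with parallel expansion would of course need the
single-parse guarantee from earlier in the thread, but it shows why a
pathologically large CU stops costing anything until it is actually
touched.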