From: Tom Tromey
To: Matheus Branco Borella via Gdb-patches
Cc: Matheus Branco Borella
Subject: Re: Parallelization of dwarf2 symtab parsing
Date: Wed, 21 Dec 2022 13:19:30 -0700
Message-ID: <87bknwik6l.fsf@tromey.com>

>>>>> "Matt" == Matheus Branco Borella via Gdb-patches writes:

Matt> I was wondering if there's anything design-wise blocking running
Matt> the calls to `dw2_expand_symtabs_matching_one` made in
Matt> `cooked_index_functions::expand_symtabs_matching` in parallel,
Matt> using a similar approach to what building the psymtabs currently
Matt> uses.

I think it's reasonably difficult, though it might be possible.
Matt> Doing that naively, though, there are a few trouble variables and
Matt> structures that cause data races. The ones I've found are:

Yeah, the biggest problem has to do with data races.  In particular,
gdb has some maybe non-obvious spots that can interfere.

First, choosing symbol names in the DWARF reader (the "full" reader,
not the indexer) uses a per-objfile bcache:

    if (need_copy)
      retval = objfile->intern (retval);

(The code to compute symbol names is a horrible mess; I wish we could
clean that up...)

Also, any spot in the DWARF reader referring to the objfile obstack is
a potential data race.  According to M-x occur there are 43 of these.
These can probably be changed to use some kind of sharded obstack to
avoid interference.

There may also be problems with the buildsym code; and of course, when
a symtab is actually installed, that must be made thread-safe.  I
tried to make buildsym less reliant on global variables a while ago,
though I don't recall whether that was a 100% success.  (Note there's
a "legacy" API that uses globals, but the DWARF reader avoids it.)

Matt> (4) per_objfile->m_dwarf2_cus, and

Matt> (4), though, is where I've hit a bit of a snag. Since it's a
Matt> global registry of CUs, all threads must have a shared and
Matt> coherent view of it. Using a mutex to lock the map can be done in
Matt> one of two ways, both with equally undesirable results: ...

Matt> - We analyze the graph of which CUs cause other CUs to get loaded
Matt>   and run the parallel batches in topological order.
Matt>   Assuming that we can even know that info ahead of time, this
Matt>   approach would still be the most intrusive to the code and the
Matt>   most complex to pull off.

Matt> - We introduce a type that represents a CU that is currently
Matt>   being loaded, but that hasn't finished loading quite yet, for
Matt>   threads that need that CU to await on.

Basically, I think you just want to make sure that a given CU is only
parsed by a single thread.
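A minimal sketch of that single-parse guarantee, in the futures style
your second bullet suggests.  All the names here (`cu_key`,
`expanded_symtab`, `cu_expansion_map`, `parse_cu`) are illustrative
stand-ins for dwarf2_per_cu_data and friends, not actual gdb code:

```cpp
#include <future>
#include <map>
#include <mutex>

// Stand-ins for dwarf2_per_cu_data and the real expansion result.
using cu_key = int;
struct expanded_symtab { cu_key key; };

class cu_expansion_map
{
public:
  // Return the expansion of KEY, running the expensive parse at most
  // once even when many threads ask for the same CU concurrently.
  expanded_symtab get (cu_key key)
  {
    std::promise<expanded_symtab> *to_fill = nullptr;
    std::shared_future<expanded_symtab> fut;
    {
      std::lock_guard<std::mutex> guard (m_lock);
      auto it = m_futures.find (key);
      if (it == m_futures.end ())
        {
          // This thread wins the race and will do the parse itself.
          // std::map never invalidates references on insert, so the
          // pointer into m_promises stays valid.
          std::promise<expanded_symtab> &prom = m_promises[key];
          fut = prom.get_future ().share ();
          m_futures.emplace (key, fut);
          to_fill = &prom;
        }
      else
        fut = it->second;
    }
    if (to_fill != nullptr)
      // Parse outside the lock, then publish for any waiting threads.
      to_fill->set_value (parse_cu (key));
    return fut.get ();
  }

private:
  // Placeholder for the real DWARF expansion work.
  static expanded_symtab parse_cu (cu_key key) { return { key }; }

  std::mutex m_lock;
  std::map<cu_key, std::promise<expanded_symtab>> m_promises;
  std::map<cu_key, std::shared_future<expanded_symtab>> m_futures;
};
```

A real version would also need set_exception for parse failures, and
some care with recursive loads: if parse_cu for one CU ends up waiting
on its own future, this naive sketch deadlocks.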
It's better to arrange things to avoid doing any extra work here.  I
don't think this should be super hard to do.  For example, you can
have a map that is locked only on use, mapping from the
dwarf2_per_cu_data to a std::future that will hold the resulting
symtab or whatever.  Then, when multiple CUs refer to some other CU,
whichever thread gets to that included CU first will win.

I think the data structures should be set up such that these can be
stitched together again after parsing.  It's probably fine to just do
this in a post-pass that waits for all the futures and then sets up
the inclusions.  The cooked indexer uses a strategy like this.

Also note that cross-CU references are relatively rare, basically only
occurring when 'dwz' has been used.  So this situation isn't extremely
important for normal users.

Matt> (5) is conceptually pretty simple to understand, but fairly
Matt> complex to solve. We can model the CU cache cleanup functionality
Matt> as a form of age-based garbage collection that we're trying to
Matt> parallelize. And there's plenty of fun to be had by going down
Matt> that particular rabbit hole. :^)

We have some evidence that this cache is just not very good:

https://sourceware.org/bugzilla/show_bug.cgi?id=25703

Changing it radically to work in some other way seems totally fine.
Like, I think if the DIEs from some CU are ever needed for a cross-CU
reference, then just keeping those around for the duration of the
current "expansion operation" is fine.

Matt> So I'd like to hear your thoughts and opinions on this. If
Matt> there's anything I've missed or got wrong, please let me know.

A different idea is to try to somewhat merge the partial (cooked
index) DWARF reader and the full reader, and do lazy expansion of CUs.
This has the same end goal -- speed up CU expansion.
https://sourceware.org/bugzilla/show_bug.cgi?id=29398

The idea here is that, even when some data from a CU is needed (like a
type, or if gdb stops in some function in the CU), frequently the rest
of the data there is not needed.  So, reading it all is needlessly
slow.

There's a second idea here as well, which is unifying the top-level
names between the index and the symtab -- occasionally we've had
divergences here, which result in weird bugs.  The latter could be
done without full laziness, though, by using the index to build the
symtab but then filling in the details using the current code.

The big benefit of lazy expansion is that it would be much faster for
the pathologically large CUs that do sometimes appear (whereas with
parallel reading, a really big CU will still be slow).  The main
drawback is that it's more complicated, so there's more chance for
bugs, and the bugs will be harder to understand when they do occur.

Tom
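The shape of the per-CU laziness discussed above, as a hedged
single-threaded sketch (the types `symtab` and `lazy_cu` here are
invented for illustration; none of this is gdb's actual code).  The
index hands out a cheap handle per CU, and the expensive expansion
runs at most once, on first use:

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the result of fully expanding a CU.
struct symtab { std::vector<std::string> symbols; };

class lazy_cu
{
public:
  // EXPANDER does the expensive work; constructing the handle is cheap.
  explicit lazy_cu (std::function<symtab ()> expander)
    : m_expander (std::move (expander))
  {
  }

  // First call runs the expander and caches the result; later calls
  // just return the cached symtab.
  const symtab &get ()
  {
    if (!m_symtab.has_value ())
      m_symtab = m_expander ();
    return *m_symtab;
  }

  bool expanded () const { return m_symtab.has_value (); }

private:
  std::function<symtab ()> m_expander;
  std::optional<symtab> m_symtab;
};
```

Combining this with parallel expansion would of course need the
single-parse guarantee from earlier in the thread, but it shows why a
pathologically large CU stops costing anything until it is actually
touched.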