From: Matheus Branco Borella
Date: Wed, 21 Dec 2022 15:46:59 -0300
Subject: Parallelization of dwarf2 symtab parsing
To: gdb-patches@sourceware.org

I was wondering if there's anything, design-wise, blocking us from running the calls to `dw2_expand_symtabs_matching_one` made in `cooked_index_functions::expand_symtabs_matching` in parallel, using an approach similar to the one the psymtab building currently uses. I tried my hand at it and there doesn't seem to be anything insurmountable about it, though I haven't been able to get it to work yet.

Doing that naively, though, runs into a few variables and structures that cause data races. The ones I've found are: (1) per_objfile->queue, (2) per_objfile->sym_cu, (3) per_bfd->dwp_checked, (4) per_objfile->m_dwarf2_cus, and (5) the CU cache.
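For concreteness, the shape of the dispatch I have in mind is roughly the following. This is a minimal stand-in written against bare std::thread; the real thing would presumably reuse the worker threads the cooked index builder already has, and every name below is made up for illustration:

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

/* Run WORK (i) for each i in [0, N) across hardware threads.  Only
   the shape of the dispatch matters here.  */

static void
parallel_for_index (size_t n, const std::function<void (size_t)> &work)
{
  unsigned n_threads = std::max (1u, std::thread::hardware_concurrency ());
  std::atomic<size_t> next (0);
  std::vector<std::thread> workers;

  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back ([&] ()
      {
        /* Each worker claims indices off a shared counter until the
           range is exhausted.  */
        for (size_t i = next++; i < n; i = next++)
          work (i);
      });

  for (std::thread &w : workers)
    w.join ();
}

Each worker would then hand its per-CU entry to `dw2_expand_symtabs_matching_one`, and that concurrency is exactly what exposes the data races above.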
From what I can see, (1) and (2) can easily be made safe by just making them thread-local (see the first sketch in the P.S.), seeing as, going down from `dw2_expand_symtabs_matching_one`, they get built fresh, are only used in the context of that call, and are reset again when it exits.

(3) can similarly be made safe by turning it into a `std::once_flag` and loading the DWP in a call to `std::call_once`, since the flag is only ever set once, to mark the load, and never reset (second sketch in the P.S.).

(4), though, is where I've hit a bit of a snag. Since it's a global registry of CUs, all threads must have a shared and coherent view of it. Using a mutex to lock the map can be done in one of two ways, both with equally undesirable results:

- We can lock the mutex for the whole lifetime of the parsing of a new CU, only unlocking it after the call to `per_objfile->set_cu ()`. The problem with this approach, obviously, is that it stalls other threads trying to parse new CUs.

- We can lock the mutex for just the duration of `per_objfile->set_cu ()` and let threads parse CUs as they come in. The problem with this is that there will be some amount of rework in the form of duplicate calls to `load_full_comp_unit`: for a given CU 'X', there is now a window of time in which one thread is midway through `load_full_comp_unit (X)` and, because it hasn't called `per_objfile->set_cu (X)` yet, another thread can call `load_full_comp_unit (X)` a second time, since from its perspective `per_objfile->get_cu (X) == nullptr`.

As far as solutions to this go, I see three:

- We weaken the assertion in `per_objfile->set_cu ()` so that, instead of checking whether the CU already exists in the objfile, it checks whether the two objects are sufficiently similar (for some definition of "sufficiently similar"), before discarding either the new or the old CU. I'm not familiar with how the lifetimes of these objects are managed, so I can't say how good an option this is, though the discarding issue could perhaps be solved by altering the return type and having the caller give up on its object when someone else has beaten it to parsing the CU. This would be the simplest solution to implement.

- We analyze the graph of which CUs cause other CUs to get loaded, and run the parallel batches in topological order. Assuming we can even know that information ahead of time, this approach would be the most intrusive to the code and the most complex to pull off.

- We introduce a type that represents a CU that is currently being loaded but hasn't finished loading yet, for threads that need that CU to wait on (third sketch in the P.S.). This avoids the rework inherent in the first solution and the need for ahead-of-time dependency information inherent in the second, but it also has the potential to be slower than either, since, in the worst case, some threads can stall multiple times waiting for CUs to load.

(5) is conceptually pretty simple to understand, but fairly complex to solve. We can model the CU cache cleanup as a form of age-based garbage collection that we're trying to parallelize, and there's plenty of fun to be had going down that particular rabbit hole. :^)

So I'd like to hear your thoughts and opinions on this. If there's anything I've missed or got wrong, please let me know.

Thanks,
Matt.
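P.S. A few rough sketches of the above, in case they make the discussion more concrete. All type and member names in them are made up; none of this is the real GDB API.

For (1) and (2), one wrinkle is that C++ only allows `thread_local` on statics, so a "thread-local member" would in practice look something like:

struct queue_sketch;
struct cu_sketch;

struct dwarf2_per_objfile_sketch
{
  /* Scratch state for one expansion.  A non-static data member can't
     be thread_local, so this has to become static -- which is fine as
     long as a thread only ever drives one expansion at a time.  */
  static thread_local queue_sketch *queue;
  static thread_local cu_sketch *sym_cu;
};

thread_local queue_sketch *dwarf2_per_objfile_sketch::queue = nullptr;
thread_local cu_sketch *dwarf2_per_objfile_sketch::sym_cu = nullptr;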
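For (3), the `std::once_flag` version is about as small as it sounds:

#include <mutex>

struct dwarf2_per_bfd_sketch
{
  std::once_flag dwp_checked;

  void read_dwp () { /* stand-in for the actual DWP loading */ }

  void ensure_dwp_read ()
  {
    /* Any number of threads can call this; the lambda runs exactly
       once, and late arrivals block until the first caller finishes,
       so nobody can observe a half-loaded DWP.  */
    std::call_once (dwp_checked, [this] { read_dwp (); });
  }
};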
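And for the third option for (4), the "CU that is still loading" placeholder falls out fairly naturally from std::promise/std::shared_future:

#include <future>
#include <mutex>
#include <unordered_map>

struct cu_sketch
{
  /* Stand-in for a fully-parsed CU.  */
};

/* Stand-in for the real parser.  */
extern cu_sketch *load_full_comp_unit_sketch (unsigned long sect_off);

struct cu_registry_sketch
{
  std::mutex m;

  /* Keyed by section offset.  A CU that is still being parsed already
     has an entry here, so a second thread that needs it waits on the
     future instead of re-running load_full_comp_unit.  */
  std::unordered_map<unsigned long, std::shared_future<cu_sketch *>> cus;

  cu_sketch *get_or_load (unsigned long sect_off)
  {
    std::promise<cu_sketch *> p;
    std::shared_future<cu_sketch *> f;
    bool we_load = false;

    {
      std::lock_guard<std::mutex> guard (m);
      auto it = cus.find (sect_off);
      if (it != cus.end ())
        f = it->second;   /* Loaded, or in flight on another thread.  */
      else
        {
          f = p.get_future ().share ();
          cus.emplace (sect_off, f);
          we_load = true; /* We won the race, so we do the parse.  */
        }
    }

    if (we_load)
      p.set_value (load_full_comp_unit_sketch (sect_off));

    /* get () blocks only while the load is still in flight.  */
    return f.get ();
  }
};

The mutex is only held around the map lookup/insert, so the double-parse window from the second locking scheme goes away without serializing whole parses. (Error handling is elided; a failed parse would want `set_exception` so that waiters don't hang.)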