From: Matheus Branco Borella <mbrla@usp.br>
To: gdb-patches@sourceware.org
Subject: Parallelization of dwarf2 symtab parsing
Date: Wed, 21 Dec 2022 15:46:59 -0300
Message-ID: <CA+o+VZsWx7MivHVQG5ccrOLwikDM5YXuMY6wQ70xfvG=P_h8Pw@mail.gmail.com>

I was wondering if there's anything, design-wise, blocking running the calls to
`dw2_expand_symtabs_matching_one` made in
`cooked_index_functions::expand_symtabs_matching`
in parallel, using an approach similar to the one psymtab building currently
uses. I tried my hand at it and there doesn't seem to be anything insurmountable
about it, though I haven't been able to get it to work yet.

Doing that naively, though, there are a few troublesome variables and structures
that cause data races. The ones I've found are:
    (1) per_objfile->queue,
    (2) per_objfile->sym_cu,
    (3) per_bfd->dwp_checked,
    (4) per_objfile->m_dwarf2_cus, and
    (5) the CU cache.

From what I can see:

(1) and (2) can easily be made safe by just making them thread-local, seeing as,
going down from `dw2_expand_symtabs_matching_one`, they get built fresh, are
only used in the context of that call, and are reset again when it exits.
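
To make that concrete, here's a minimal sketch of what the thread-local
version of that per-call state could look like. This is not GDB code:
`cu_stub`, `queue_item_stub` and `expand_one_symtab_stub` are hypothetical
stand-ins for the real types and entry point.

#include <optional>
#include <queue>

struct cu_stub {};          // stands in for dwarf2_cu
struct queue_item_stub {};  // stands in for a queue item

// Each worker thread gets its own copy of the per-call state.
thread_local std::optional<std::queue<queue_item_stub>> tls_queue;
thread_local cu_stub *tls_sym_cu = nullptr;

void
expand_one_symtab_stub ()
{
  tls_queue.emplace ();  // built fresh on entry
  tls_sym_cu = nullptr;

  // ... parse, filling tls_queue and tls_sym_cu as we go ...

  tls_queue.reset ();    // reset again on exit
  tls_sym_cu = nullptr;
}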

(3) can similarly be made safe by turning it into a `std::once_flag` and
loading the DWP in a call to `std::call_once`, since the flag is only ever
set once, to mark the load, and never reset.
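
A minimal sketch of that, assuming a simplified stand-in for per_bfd
(`per_bfd_stub` and `load_dwp_file` here are hypothetical):

#include <mutex>

struct per_bfd_stub
{
  std::once_flag dwp_checked;  // replaces the old boolean flag
  void *dwp_file = nullptr;
};

static void
load_dwp_file (per_bfd_stub &pb)
{
  // ... open and parse the .dwp here, then fill in pb.dwp_file ...
}

void
ensure_dwp_checked (per_bfd_stub &pb)
{
  // call_once runs the loader exactly once; every other thread blocks
  // until it finishes and then sees its results.
  std::call_once (pb.dwp_checked, [&] { load_dwp_file (pb); });
}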

(4), though, is where I've hit a bit of a snag. Since it's a global registry of
CUs, all threads must have a shared and coherent view of it. Using a mutex to
lock the map can be done in one of two ways, both with equally undesirable
results:
- We can lock the mutex for the whole lifetime of the parsing of a new CU,
 only unlocking it after the call to `per_objfile->set_cu()`. Obviously,
 the problem with this approach is that it stalls other threads trying to
 parse new CUs.
- We can lock the mutex for just the duration of `per_objfile->set_cu()` and
 let threads parse CUs as they come in. The problem with this is that there
 will be some amount of rework in the form of repeated calls to
 `load_full_comp_unit`: for a given CU 'X', there is now a window of time
 in which one thread is midway through `load_full_comp_unit(X)` and,
 because it hasn't called `per_objfile->set_cu(X)` yet, another thread can
 load the same CU by calling `load_full_comp_unit(X)` a second time, since
 from its perspective `per_objfile->get_cu(X) == nullptr`. (The window is
 marked in the sketch after this list.)
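
Here's a minimal sketch of that second locking scheme (all names here are
hypothetical stand-ins, not the real GDB API), with the race window marked:

#include <map>
#include <memory>
#include <mutex>

struct cu_stub {};
using sect_off = unsigned long;

std::mutex cus_mutex;
std::map<sect_off, std::unique_ptr<cu_stub>> m_dwarf2_cus;

cu_stub *
get_cu (sect_off off)
{
  std::lock_guard<std::mutex> guard (cus_mutex);
  auto it = m_dwarf2_cus.find (off);
  return it == m_dwarf2_cus.end () ? nullptr : it->second.get ();
}

std::unique_ptr<cu_stub>
load_full_comp_unit_stub (sect_off off)
{
  return std::make_unique<cu_stub> ();  // stands in for the real parse
}

cu_stub *
ensure_cu_loaded (sect_off off)
{
  if (cu_stub *cu = get_cu (off))
    return cu;
  // Race window: another thread can reach this point for the same OFF
  // and run the parse a second time; only one result lands in the map.
  auto fresh = load_full_comp_unit_stub (off);
  std::lock_guard<std::mutex> guard (cus_mutex);
  auto result = m_dwarf2_cus.emplace (off, std::move (fresh));
  return result.first->second.get ();  // whichever copy won the race
}

Note that the emplace already quietly discards the loser's copy, which is
roughly what the first solution below would formalize.
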
As far as solutions to this go, I see three of them:
- We weaken the assertion in `per_objfile->set_cu()` so that, instead of
 checking whether the CU already exists in the objfile, it checks whether
 the two objects are sufficiently similar (for some definition of
 sufficiently similar), before discarding either the new or the old CU.
   I'm not familiar with how the lifetimes of these objects are managed,
 so I can't say how good an option this is. The discarding issue could
 perhaps be solved by altering the return type and having the caller give
 up on its own object when someone else has beaten it to parsing that CU.
 This would be the simplest solution to implement.
- We analyze the graph of which CUs cause other CUs to get loaded and run
 the parallel batches in topological order.
   Assuming we can even know that information ahead of time, this approach
 would still be the most intrusive to the code and the most complex to
 pull off.
- We introduce a type that represents a CU that is currently being loaded,
 but that hasn't finished loading quite yet, for threads that need that CU
 to wait on. (A sketch of this follows the list.)
   This avoids the rework inherent to the first solution and the need for
 ahead-of-time dependency information inherent to the second, but it also
 has the potential to be slower than either of them, since, in the worst
 case, some threads can stall multiple times waiting for CUs to load.
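
A minimal sketch of that third option, using a `std::shared_future` as the
"CU being loaded" placeholder (names are hypothetical; real code would
probably want a dedicated type rather than a bare future):

#include <future>
#include <map>
#include <memory>
#include <mutex>

struct cu_stub {};
using sect_off = unsigned long;

std::mutex cus_mutex;
std::map<sect_off, std::shared_future<std::shared_ptr<cu_stub>>> cus;

std::shared_ptr<cu_stub>
load_full_comp_unit_stub (sect_off off)
{
  return std::make_shared<cu_stub> ();  // stands in for the real parse
}

std::shared_ptr<cu_stub>
ensure_cu_loaded (sect_off off)
{
  std::promise<std::shared_ptr<cu_stub>> promise;
  std::shared_future<std::shared_ptr<cu_stub>> future;
  bool we_load = false;

  {
    std::lock_guard<std::mutex> guard (cus_mutex);
    auto it = cus.find (off);
    if (it != cus.end ())
      future = it->second;  // someone already claimed this CU
    else
      {
        future = promise.get_future ().share ();
        cus.emplace (off, future);
        we_load = true;     // we claimed the slot; we do the parse
      }
  }

  if (we_load)
    promise.set_value (load_full_comp_unit_stub (off));
  return future.get ();     // loaders fall through, waiters stall here
}

On failure paths the real thing would also want `promise.set_exception`,
so that waiters don't hang forever if the parse throws.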

(5) is conceptually pretty simple to understand, but fairly complex to solve. We
can model the CU cache cleanup as a form of age-based garbage collection that
we're trying to parallelize, and there's plenty of fun to be had by going down
that particular rabbit hole. :^)
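
For illustration, a minimal sketch of a mutex-guarded version of an
age-based scheme (the names and the aging policy here are hypothetical,
just to show where the contention would live):

#include <map>
#include <memory>
#include <mutex>

struct cu_stub { int age = 0; };
using sect_off = unsigned long;

std::mutex cache_mutex;
std::map<sect_off, std::unique_ptr<cu_stub>> cu_cache;

void
touch_cu (sect_off off)
{
  std::lock_guard<std::mutex> guard (cache_mutex);
  auto it = cu_cache.find (off);
  if (it != cu_cache.end ())
    it->second->age = 0;  // recently used: make it young again
}

void
age_and_evict (int max_age)
{
  std::lock_guard<std::mutex> guard (cache_mutex);
  for (auto it = cu_cache.begin (); it != cu_cache.end ();)
    {
      if (++it->second->age > max_age)
        it = cu_cache.erase (it);  // too old: evict
      else
        ++it;
    }
}

A single big lock like this is the easy-but-slow answer; anything smarter
(sharding, epochs, deferred reclamation) is exactly that rabbit hole.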

So I'd like to hear your thoughts and opinions on this. If there's anything I've
missed or got wrong, please let me know.

Thanks,
Matt.
