Multi-threaded dwarf parsing

public inbox for gdb@sourceware.org

* Multi-threaded dwarf parsing
@ 2016-02-24  2:45 Simon Marchi
  2016-02-24 11:06 ` Pedro Alves
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Marchi @ 2016-02-24  2:45 UTC (permalink / raw)
  To: gdb; +Cc: tromey

Hi all,

When debugging large programs, simply loading the binary in gdb can take
a significant amount of time.  I was wondering if the DWARF parsing
(building partial and/or full symtabs, I suppose) could be a good
candidate for parallelization.  I did some quick checks to determine
that, at least when reading from my SSD drive, the operation is not
IO-bound.  Also, according to my limited understanding of the DWARF
format, it seems like the compilation unit DIEs are entities that could
be processed independently.  These two facts, if we assume they are
true, suggest that there is good potential for performance gain here.
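A minimal sketch of the fork/join idea, assuming each CU can be turned
into partial symbols without touching any shared state.  CompUnit,
build_partial_symbols and read_objfile_debug_info are hypothetical names,
not GDB functions.  In CPython the GIL means threads only illustrate the
structure here; real speedup needs native threads (as in GDB's C code)
or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CompUnit:
    """Hypothetical stand-in for one DWARF compilation unit."""
    offset: int                                  # offset of the CU in .debug_info
    names: list = field(default_factory=list)


def build_partial_symbols(cu):
    # Hypothetical per-CU work: reads only this CU, writes no shared state.
    return [(name, cu.offset) for name in cu.names]


def read_objfile_debug_info(cus, workers=4):
    # Fork: one task per CU.  Join: pool.map yields results in input order,
    # and leaving the `with` block waits for every task to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_cu = list(pool.map(build_partial_symbols, cus))
    # Merge on the calling thread, in CU order, so the result is deterministic.
    index = {}
    for entries in per_cu:
        for name, offset in entries:
            index.setdefault(name, offset)
    return index
```

The caller still blocks until the whole objfile is read; only the time
spent blocking shrinks.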

I couldn't find anything on the mailing list about that; please point
out any discussion I might have missed.

I found (and it was a very good surprise) this branch by Tom Tromey:

https://github.com/tromey/gdb/tree/threaded-dwarf-reader

According to his description (from https://github.com/tromey/gdb/wiki): 
"I think it doesn't help any real-world case".  I'd like to ask you 
directly, Tom: now that you debug Firefox (i.e. a quite large program) 
daily with gdb, are you still of the same opinion?  Of course, I'm also 
interested in what others have to say about that.  Is it something that 
would have value, you think?

Also, LLDB started doing this fairly recently.  Apparently, it "can
drastically incrase the speed of loading debug info" (sic).  If it's
good for LLDB, I don't see why it wouldn't be good for GDB.
Ref: http://blog.llvm.org/2015/10/llvm-weekly-95-oct-26th-2015.html

So, in a word, are there any gotchas or good reasons not to take this
path?

Simon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-threaded dwarf parsing
  2016-02-24  2:45 Multi-threaded dwarf parsing Simon Marchi
@ 2016-02-24 11:06 ` Pedro Alves
  2016-02-24 15:30   ` Tom Tromey
  0 siblings, 1 reply; 11+ messages in thread
From: Pedro Alves @ 2016-02-24 11:06 UTC (permalink / raw)
  To: Simon Marchi, gdb, Tom Tromey

[Updated Tom's address]

On 02/24/2016 02:45 AM, Simon Marchi wrote:
> Hi all,
> 
> When debugging large programs, simply loading the binary in gdb can take 
> a significant amount of time.  I was wondering if the dwarf parsing 
> (building partial and/or full symtabs, I suppose) could be a good 
> candidate for parallelization.  I did some quick checks to determine 
> that, at least when reading from my SSD drive, the operation is not 
> IO-bound.  Also, according to my limited understanding of the Dwarf 
> format, it seems like the compilation units DIEs are entities that could 
> be processed independently.  These two facts, if we assume they are 
> true, suggest that there is a good potential for performance gain here.
> 
> I couldn't find anything on the mailing list about that, please point 
> out any discussion I might have missed.
> 
> I found (and it was a very good surprise) this branch by Tom Tromey:
> 
> https://github.com/tromey/gdb/tree/threaded-dwarf-reader
> 
> According to his description (from https://github.com/tromey/gdb/wiki): 
> "I think it doesn't help any real-world case".  I'd like to ask you 
> directly, Tom: now that you debug Firefox (i.e. a quite large program) 
> daily with gdb, are you still of the same opinion?  Of course, I'm also 
> interested in what others have to say about that.  Is it something that 
> would have value, you think?

Making GDB load debug info faster, and making it take advantage of
the multiple cores in most host machines nowadays definitely adds value.

(I'd also like to get threads into GDB for other reasons, so this would
be a good trojan.  Oh, whoops, did I say that out loud? :-) )

> 
> Also, since not so long ago, LLDB does it.  Apparently, it "can 
> drastically incrase the speed of loading debug info" (sic).  If it's 
> good for LLDB, I don't see why it wouldn't be good for GDB.
> Ref: http://blog.llvm.org/2015/10/llvm-weekly-95-oct-26th-2015.html
> 
> So, in a word, are there any gotchas or good reasons not do take this 
> path?

The obvious gotchas are of course all the globals, and coming up with
fine enough locking granularity that threads actually do run in parallel.
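One common way to get finer locking granularity than a single global
lock is lock striping.  The sketch below is an illustration under that
assumption, not how GDB does it: StripedIndex and index_cus_in_parallel
are hypothetical, and the shared structure is a simple name -> CU
offsets index.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StripedIndex:
    """Hypothetical shared name -> [CU offsets] index guarded by N locks.

    With one global lock every insertion serializes; with N stripes, two
    threads only contend when their names hash to the same stripe.
    """

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _stripe(self, name):
        return hash(name) % len(self._locks)

    def add(self, name, cu_offset):
        i = self._stripe(name)
        with self._locks[i]:
            self._buckets[i].setdefault(name, []).append(cu_offset)

    def lookup(self, name):
        i = self._stripe(name)
        with self._locks[i]:
            return list(self._buckets[i].get(name, []))


def index_cus_in_parallel(cu_names, workers=4):
    # cu_names: mapping CU offset -> names defined in that CU (hypothetical).
    index = StripedIndex()

    def work(item):
        offset, names = item
        for name in names:
            index.add(name, offset)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, cu_names.items()))   # list() re-raises worker errors
    return index
```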

Thanks,
Pedro Alves

* Re: Multi-threaded dwarf parsing
  2016-02-24 11:06 ` Pedro Alves
@ 2016-02-24 15:30   ` Tom Tromey
  2016-02-24 16:43     ` Simon Marchi
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Tromey @ 2016-02-24 15:30 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Simon Marchi, gdb, Tom Tromey

>> According to his description (from https://github.com/tromey/gdb/wiki): 
>> "I think it doesn't help any real-world case".  I'd like to ask you 
>> directly, Tom: now that you debug Firefox (i.e. a quite large program) 
>> daily with gdb, are you still of the same opinion?  Of course, I'm also 
>> interested in what others have to say about that.  Is it something that 
>> would have value, you think?

It's been a while since I thought about that branch.

I think it helps some scenarios, but maybe not as many as you'd like.
In fact, I think it doesn't help two of the three most typical ways
I debug Firefox.  (I realize this may not apply directly to your idea of
reading each CU independently; this is just the state of that branch.)

1. Run Firefox, then attach.

   Here it is pretty normal for the attach to interrupt Firefox
   somewhere in libxul.so -- the largest library (so much larger that it
   is the only one that causes a noticeable pause at gdb startup).

   But, it seems to me that stopping somewhere in libxul.so should
   probably cause its debuginfo to be read.

2. Start gdb, set a breakpoint, then run Firefox.

   Here debuginfo for every library must be read in order to set the
   breakpoint correctly.

The third scenario, which would be helped, is:

3. Start gdb, run Firefox, and try to reproduce a crash.  In this
   situation gdb could read the debuginfo in the background and
   everything would work nicely.

That said, I think my branch might have helped a tiny bit with scenario
#1, because it prioritized the largest files when reading debuginfo.
So, libxul.so would generally be read a bit earlier than it is now.
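A rough sketch of the background approach: one task per objfile,
submitted largest first, with require() blocking only when the
information is needed but not ready yet.  BackgroundSymbolReader and
read_objfile are hypothetical; this is not the code from the branch.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundSymbolReader:
    """Hypothetical sketch: one background task per objfile, largest first."""

    def __init__(self, read_objfile, workers=2):
        self._read_objfile = read_objfile    # builds psymtabs for one objfile
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}

    def start(self, objfile_sizes):
        # objfile_sizes: mapping name -> size in bytes.  Idle workers take
        # queued tasks in FIFO order, so submitting the largest objfiles
        # first means they start first.
        by_size = sorted(objfile_sizes.items(), key=lambda kv: kv[1], reverse=True)
        for name, _size in by_size:
            self._futures[name] = self._pool.submit(self._read_objfile, name)

    def require(self, name):
        # Block the caller only if this objfile's task has not finished yet.
        return self._futures[name].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)
```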

Reading each CU independently seems like a good idea to me.  I think it
will stumble into various problems inside gdb, but I'd guess they are
all surmountable with enough work.

I think this could help with scenario #1.  The ideal situation here
would be to read just the CU (or CUs?) covering the stop address; then
lazily read more as needed for types and such.
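A minimal sketch of "read just the CU covering the stop address",
assuming the per-CU PC ranges are already known (in real DWARF they
would come from .debug_aranges or each CU's DW_AT_low_pc/DW_AT_high_pc
or DW_AT_ranges).  CuAddressMap and expand_cu are hypothetical.

```python
import bisect


class CuAddressMap:
    """Hypothetical map from [low, high) PC ranges to CU offsets."""

    def __init__(self, ranges):
        # ranges: iterable of (low, high, cu_offset), assumed non-overlapping.
        self._ranges = sorted(ranges)
        self._lows = [low for low, _high, _cu in self._ranges]
        self._expanded = {}

    def cu_for_pc(self, pc):
        # Rightmost range whose low bound is <= pc, then check its high bound.
        i = bisect.bisect_right(self._lows, pc) - 1
        if i >= 0:
            low, high, cu = self._ranges[i]
            if low <= pc < high:
                return cu
        return None

    def expand_for_pc(self, pc, expand_cu):
        # Read only the CU covering pc, and only once; anything else
        # (types from other CUs, etc.) would be read lazily later.
        cu = self.cu_for_pc(pc)
        if cu is None:
            return None
        if cu not in self._expanded:
            self._expanded[cu] = expand_cu(cu)
        return self._expanded[cu]
```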

I suppose it could also help #2 if enough parallelism is there to be
had, though I'm a bit skeptical.

>> So, in a word, are there any gotchas or good reasons not do take this 
>> path?

Pedro> The obvious gotchas are of course all the globals, and coming up with
Pedro> fine enough locking granularity that threads actually do run in parallel.

I think the gotcha situation got worse since I wrote my patch.

Now the DWARF reader can call into the type-printing system, which it
didn't before.  It wasn't clear to me that this was safe.  ISTR there
was some other change along these lines -- the DWARF reader calling out
to some gdb module that it previously did not -- but I can't remember
what it was any more.

The DWARF reader also has many more modes (debug_types, dwz, dwo/dwp)
than it did back then.  So, this will require some careful auditing.

FWIW my threading patches were written during my time at Red Hat and so
you can use any part of that series without needing any paperwork from
me.

Tom

* Re: Multi-threaded dwarf parsing
  2016-02-24 15:30   ` Tom Tromey
@ 2016-02-24 16:43     ` Simon Marchi
  2016-02-24 19:50       ` Tom Tromey
  2016-02-24 20:25       ` Jan Kratochvil
  0 siblings, 2 replies; 11+ messages in thread
From: Simon Marchi @ 2016-02-24 16:43 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Pedro Alves, gdb

On 2016-02-24 10:30, Tom Tromey wrote:
> It's been a while since I thought about that branch.
> 
> I think it helps some scenarios, but maybe not as many as you'd like.
> In fact, I think it doesn't help the two of the three most typical ways
> I debug Firefox.  (I realize this may not apply directly to your idea of
> reading each CU independently; this is just the state of that branch.)
> 
> 1. Run Firefox, then attach.
> 
>    Here it is pretty normal for the attach to interrupt Firefox
>    somewhere in libxul.so -- the largest library (so much larger that it
>    is the only one that causes a noticeable pause at gdb startup).
> 
>    But, it seems to me that stopping somewhere in libxul.so should
>    probably cause its debuginfo to be read.
> 
> 2. Start gdb, set a breakpoint, then run Firefox.
> 
>    Here debuginfo for every library must be read in order to set the
>    breakpoint correctly.
> 
> 
> The third scenario, which would be helped, is:
> 
> 3. Start gdb, run Firefox, and try to reproduce a crash.  In this
>    situation gdb could read the debuginfo in the background and
>    everything would work nicely.
> 
> 
> That said, I think my branch might have helped a tiny bit with scenario
> #1, because it prioritized the largest files when reading debuginfo.
> So, libxul.so would generally be read a bit earlier than it is now.
> 
> Reading each CU independently seems like a good idea to me.  I think it
> will stumble into various problems inside gdb, but I'd guess they are
> all surmountable with enough work.

Indeed, we probably had different, but not incompatible ideas of
"threaded".
Just to make sure I understand correctly: instead of blocking on the
psymtabs creation at startup (in elf_symfile_read), you offload that to
worker threads and carry on.  If you happen to need the information and
it's not ready yet, then the main code will have to block until the
corresponding task is complete (dwarf2_require_psymtabs).  However, in
each worker thread, each objfile is still processed sequentially.  So if
you are waiting for libxul.so's debug info to be ready (such as in #1),
it won't be ready any faster.  Is that right?

My view of the parallelism was that when reading an objfile's debug
info, the main thread would offload chunks of work (a chunk == a CU) to
the worker threads, but wait for all of them to be done before
continuing.  So it would still be blocking on the psymtab creation, but
it would block for a shorter time (divided by the number of
threads/cores, in an ideal world).  It's just replacing a serial
algorithm with a parallel one, but it would be mostly transparent to the
rest of gdb.

I hadn't thought of reading the info in the background, but I like the
fact that it can get the user to a prompt faster.  And I think these two
forms of parallelism are not mutually exclusive; we could very well read
CUs in parallel, in the background.
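A sketch of the two forms combined, under the same assumptions as any
toy model: a background job per objfile that itself fans out over CUs.
parse_cu, read_objfile and load_in_background are hypothetical names.
Using two separate pools is a deliberate choice: if background jobs and
CU jobs shared one pool, a background job could block waiting on CU
jobs queued behind it.

```python
from concurrent.futures import ThreadPoolExecutor

CU_POOL = ThreadPoolExecutor(max_workers=4)           # per-CU work
BACKGROUND_POOL = ThreadPoolExecutor(max_workers=1)   # whole-objfile jobs


def parse_cu(cu):
    # Hypothetical per-CU work on a dict like {"offset": 0x40, "names": [...]}.
    return [(name, cu["offset"]) for name in cu["names"]]


def read_objfile(cus):
    # Parallel inside one objfile: fan out over CUs, then merge in order.
    merged = {}
    for entries in CU_POOL.map(parse_cu, cus):
        for name, offset in entries:
            merged.setdefault(name, offset)
    return merged


def load_in_background(cus):
    # Background: return at once; callers block on .result() only when
    # they actually need this objfile's symbols.
    return BACKGROUND_POOL.submit(read_objfile, cus)
```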

> I think this could help with scenario #1.  The ideal situation here
> would be to read just the CU (or CUs?) covering the stop address; then
> lazily read more as needed for types and such.
> 
> I suppose it could also help #2 if enough parallelism is there to be
> had, though I'm a bit skeptical.

I think that reading CUs in parallel would help pretty much any use case
where you are waiting for psymtabs to be created, by reducing that wait
time.

>>> So, in a word, are there any gotchas or good reasons not do take this
>>> path?
> 
> Pedro> The obvious gotchas are of course all the globals, and coming up with
> Pedro> fine enough locking granularity that threads actually do run in parallel.
> 
> I think the gotcha situation got worse since I wrote my patch.
> 
> Now the DWARF reader can call into the type-printing system, which it
> didn't before.  It wasn't clear to me that this was safe.  ISTR there
> was some other change along these lines -- the DWARF reader calling out
> to some gdb module that it previously did not -- but I can't remember
> what it was any more.
> 
> The DWARF reader also has many more modes (debug_types, dwz, dwo/dwp)
> than it did back then.  So, this will require some careful auditing.

Yes, I'm sure the reality is way more complicated than the image I have
in my head at the moment :).

> FWIW my threading patches were written during my time at Red Hat and so
> you can use any part of that series without needing any paperwork from
> me.

Great, thanks!

Simon

* Re: Multi-threaded dwarf parsing
  2016-02-24 16:43     ` Simon Marchi
@ 2016-02-24 19:50       ` Tom Tromey
  2016-02-24 20:25       ` Jan Kratochvil
  1 sibling, 0 replies; 11+ messages in thread
From: Tom Tromey @ 2016-02-24 19:50 UTC (permalink / raw)
  To: Simon Marchi; +Cc: Tom Tromey, Pedro Alves, gdb

Simon> Just to make sure I understand correctly: instead of blocking on
Simon> the psymtabs creation at startup (in elf_symfile_read), you
Simon> offload that to worker threads and carry on.  If you happen to
Simon> need the information and it's not ready yet, then the main code
Simon> will have to block until the corresponding task is complete
Simon> (dwarf2_require_psymtabs).

That's correct.

Simon> However, in each worker thread, each objfile is still processed
Simon> sequentially.  So if you are waiting for libxul.so's debug info
Simon> to be ready (such as in #1), it won't be ready any faster.  Is
Simon> that right?

Yes, each task constructs the psymtabs for an entire objfile.

Simon> My view of the parallelism was that when reading an objfile's
Simon> debug info, the main thread would offload chunks of work (a chunk
Simon> == a CU) to the worker threads, but wait for all of them to be
Simon> done before continuing.  So it would still be blocking on the
Simon> psymtab creation, but it would block for a shorter time (divided
Simon> by the number of threads/cores, in an ideal world).  It's just
Simon> replacing a serial algorithm by a parallel one, but it would be
Simon> mostly transparent to the rest of gdb.

Yeah.  This sounds doable in the abstract; though of course details
matter.  The DWARF reader has a lot of per-objfile state that would have
to be split up (ideally) or locked.  And there is stuff like buildsym.h,
which is full of globals for no good reason.
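One way to "split up" global builder state is to move it into a per-CU
object that each worker owns.  This is a hypothetical sketch of that
pattern (SymtabBuilder and build_cu are invented names and do not mirror
the actual buildsym.h interface).

```python
from dataclasses import dataclass, field


@dataclass
class SymtabBuilder:
    """Hypothetical per-CU builder holding state a global-based design keeps in globals."""
    cu_name: str
    symbols: list = field(default_factory=list)
    scopes: list = field(default_factory=list)

    def enter_scope(self, name):
        self.scopes.append(name)

    def leave_scope(self):
        self.scopes.pop()

    def add_symbol(self, name):
        self.symbols.append("::".join(self.scopes + [name]))


def build_cu(cu_name, events):
    # A fresh builder per CU: no shared mutable state, so two CUs can be
    # built on different threads without locking.
    builder = SymtabBuilder(cu_name)
    for kind, arg in events:
        if kind == "enter":
            builder.enter_scope(arg)
        elif kind == "leave":
            builder.leave_scope()
        else:
            builder.add_symbol(arg)
    return builder.symbols
```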

Simon> I hadn't thought of reading the info in the background, but I
Simon> like the fact that it can get the user to a prompt faster.  And I
Simon> think these two forms of parallelism are not mutually exclusive,
Simon> we could very well read CUs in parallel, in the background.

I agree.

Tom

* Re: Multi-threaded dwarf parsing
  2016-02-24 16:43     ` Simon Marchi
  2016-02-24 19:50       ` Tom Tromey
@ 2016-02-24 20:25       ` Jan Kratochvil
  2016-02-24 20:37         ` Simon Marchi
                           ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Jan Kratochvil @ 2016-02-24 20:25 UTC (permalink / raw)
  To: Simon Marchi; +Cc: Tom Tromey, Pedro Alves, gdb

On Wed, 24 Feb 2016 17:43:03 +0100, Simon Marchi wrote:
> instead of blocking on the psymtabs creation at startup
[...]
> then the main code will have to block until the corresponding task is
> complete (dwarf2_require_psymtabs).

If your real concern is psymtabs, then use Tom's .gdb_index:
	gdb/contrib/gdb-add-index.sh

With .gdb_index, GDB still has startup performance problems during full CU
expansion, that is, when building struct symtab and struct symbol.  That
happens with C++ inferiors, which have very interlinked CUs: expanding one CU
means GDB ends up expanding 100+ CUs due to inter-type dependencies, which
cannot be left opaque in such cases.  And as each C++ CU is usually very
large...
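The "expand one, get 100+" effect is a transitive closure over cross-CU
references.  A minimal worklist sketch, assuming a hypothetical
`imports` mapping from each CU to the CUs it references (for example via
DW_TAG_imported_unit entries):

```python
from collections import deque


def expand_cu(start, imports):
    """Return every CU that expanding `start` pulls in.

    imports: hypothetical mapping CU -> CUs it references.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        cu = queue.popleft()
        for dep in imports.get(cu, ()):
            if dep not in seen:          # queue each CU at most once
                seen.add(dep)
                queue.append(dep)
    return seen
```

Every CU in the resulting set is separate work, which is why this case
is a candidate for parallel reading too.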

Jan

* Re: Multi-threaded dwarf parsing
  2016-02-24 20:25       ` Jan Kratochvil
@ 2016-02-24 20:37         ` Simon Marchi
  2016-02-24 21:28           ` Jan Kratochvil
  2016-02-24 21:10         ` Pedro Alves
  2016-02-25  3:31         ` Tom Tromey
  2 siblings, 1 reply; 11+ messages in thread
From: Simon Marchi @ 2016-02-24 20:37 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Tom Tromey, Pedro Alves, gdb

On 2016-02-24 15:25, Jan Kratochvil wrote:
> On Wed, 24 Feb 2016 17:43:03 +0100, Simon Marchi wrote:
>> instead of blocking on the psymtabs creation at startup
> [...]
>> then the main code will have to block until the corresponding task is
>> complete (dwarf2_require_psymtabs).
> 
> If really your concern are psymtabs then use Tom's .gdb_index:
> 	gdb/contrib/gdb-add-index.sh
> 
> With .gdb_index GDB still has startup performance problems during full CU
> expansions, that is struct symtab and struct symbol.  That happens with C++
> inferiors which have very interlinked CUs and thus expanding one CU means for
> GDB expanding 100+ CUs due to the inter-type dependencies which cannot be left
> opaque in such cases.  And as each C++ CU is usually very large...

What can cause CUs to be interlinked with each other?

* Re: Multi-threaded dwarf parsing
  2016-02-24 20:25       ` Jan Kratochvil
  2016-02-24 20:37         ` Simon Marchi
@ 2016-02-24 21:10         ` Pedro Alves
  2016-02-24 21:22           ` Jan Kratochvil
  2016-02-25  3:31         ` Tom Tromey
  2 siblings, 1 reply; 11+ messages in thread
From: Pedro Alves @ 2016-02-24 21:10 UTC (permalink / raw)
  To: Jan Kratochvil, Simon Marchi; +Cc: Tom Tromey, gdb

On 02/24/2016 08:25 PM, Jan Kratochvil wrote:
> On Wed, 24 Feb 2016 17:43:03 +0100, Simon Marchi wrote:
>> instead of blocking on the psymtabs creation at startup
> [...]
>> then the main code will have to block until the corresponding task is
>> complete (dwarf2_require_psymtabs).
> 
> If really your concern are psymtabs then use Tom's .gdb_index:
> 	gdb/contrib/gdb-add-index.sh

I think the index isn't so helpful if the big thing that takes a
while to read/load is what you're changing in an edit/compile/debug
cycle.

Also, that script actually relies on gdb to read the debug info,
intern it, and spit out the index.  So if gdb reads DWARF faster,
then index generation itself becomes faster too.

> 
> With .gdb_index GDB still has startup performance problems during full CU
> expansions, that is struct symtab and struct symbol.  That happens with C++
> inferiors which have very interlinked CUs and thus expanding one CU means for
> GDB expanding 100+ CUs due to the inter-type dependencies which cannot be left
> opaque in such cases.  And as each C++ CU is usually very large...

Sounds like something that could be sped up by reading CUs in parallel.

Thanks,
Pedro Alves

* Re: Multi-threaded dwarf parsing
  2016-02-24 21:10         ` Pedro Alves
@ 2016-02-24 21:22           ` Jan Kratochvil
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kratochvil @ 2016-02-24 21:22 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Simon Marchi, Tom Tromey, gdb

On Wed, 24 Feb 2016 22:10:46 +0100, Pedro Alves wrote:
> On 02/24/2016 08:25 PM, Jan Kratochvil wrote:
> > If really your concern are psymtabs then use Tom's .gdb_index:
> > 	gdb/contrib/gdb-add-index.sh
> 
> I think the index isn't so helpful if the big thing that takes a
> while to read/load is what you're changing in a edit/compile/debug
> cycle.

I found it useful even during edit/compile/debug cycles.  If one modifies
a .h file, the compilation step takes up to a few minutes anyway, so that is
a non-interactive step.  Moreover, the index is generated only once per build,
and one may then debug that build several times.


> Sounds like something that could be sped up by reading CUs in parallel.

Yes; going to discuss it in another mail.


Jan

* Re: Multi-threaded dwarf parsing
  2016-02-24 20:37         ` Simon Marchi
@ 2016-02-24 21:28           ` Jan Kratochvil
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kratochvil @ 2016-02-24 21:28 UTC (permalink / raw)
  To: Simon Marchi; +Cc: Tom Tromey, Pedro Alves, gdb

On Wed, 24 Feb 2016 21:37:24 +0100, Simon Marchi wrote:
> What can cause CUs to be interlinked with each other?

I did not remember; from what I am checking now, it is due to dwz:
	https://sourceware.org/git/?p=dwz.git;a=blob;f=dwz.c
That is a DWARF size reduction tool (by DWARF optimization, not by any
compression).

All the CUs get queued there due to its DW_AT_import:
	process_imported_unit_die()->maybe_queue_comp_unit()

Without dwz I could not reproduce the queueing problem.  IIRC there was some
queueing without dwz too, but I admit I may not remember it right.

BTW, expanding one CU is not cheap either; its .debug_info part alone can
be around 1MB:
	readelf -wi libwebkitgtk-1.0.so.0.5.2.debug|grep '^ *<0>'|perl -lne 'BEGIN{$l=0;} /^\s*<0><([0-9a-f]+)>/ or die;$x=eval "0x$1";print(($x-$l)." ".$_);$l=$x;'|sort -nr
But that is a sub-second delay, not much of a real problem.
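A Python take on the same measurement, as a sketch assuming readelf's
usual ` <0><offset>:` lines for top-level DIEs.  One difference from the
Perl one-liner: it prints each gap next to the line that ends it, so the
number beside a CU is really the size of the CU before it; this version
attributes each gap to the CU it starts from.  The gap equals that CU's
length as long as consecutive CU headers have the same size, and the
last CU is skipped because nothing follows it in this output.

```python
import re

CU_DIE = re.compile(r"^\s*<0><([0-9a-f]+)>")


def cu_sizes(lines):
    """Return (size, line) pairs, largest first, from `readelf -wi` output."""
    starts = []
    for line in lines:
        m = CU_DIE.match(line)
        if m:
            starts.append((int(m.group(1), 16), line.rstrip("\n")))
    # Pair each top-level DIE with the next one; the gap belongs to the first.
    sizes = [(nxt[0] - cur[0], cur[1]) for cur, nxt in zip(starts, starts[1:])]
    return sorted(sizes, key=lambda s: s[0], reverse=True)
```

It can be fed the lines of `readelf -wi FILE.debug`, e.g. from sys.stdin.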


Jan

* Re: Multi-threaded dwarf parsing
  2016-02-24 20:25       ` Jan Kratochvil
  2016-02-24 20:37         ` Simon Marchi
  2016-02-24 21:10         ` Pedro Alves
@ 2016-02-25  3:31         ` Tom Tromey
  2 siblings, 0 replies; 11+ messages in thread
From: Tom Tromey @ 2016-02-25  3:31 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Simon Marchi, Tom Tromey, Pedro Alves, gdb

Jan> With .gdb_index GDB still has startup performance problems during
Jan> full CU expansions, that is struct symtab and struct symbol.

My branch "lazily-read-function-bodies" addressed this issue.  It
changed CU expansion to skip reading function bodies until needed.  This
was good for a decent speedup; my notes say ~40%.  I didn't finish this
branch, though -- it still needed a bit of work to expand a function
when a by-address lookup was done.
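A hypothetical sketch of the lazy-function-body idea (not the code from
that branch): CU expansion records only names and address ranges, and a
body is read the first time something asks for it, including a
by-address lookup, which is the case noted above as still missing.
LazyFunction, expand_cu_lazily and lookup_by_address are invented names.

```python
_UNREAD = object()   # sentinel, so a body that reads as None is still cached


class LazyFunction:
    """Hypothetical function symbol whose body is read on first use."""

    def __init__(self, name, low_pc, high_pc, read_body):
        self.name = name
        self.low_pc = low_pc
        self.high_pc = high_pc
        self._read_body = read_body
        self._body = _UNREAD

    @property
    def body(self):
        if self._body is _UNREAD:
            self._body = self._read_body(self.name)   # expensive part, done once
        return self._body


def expand_cu_lazily(functions):
    # CU expansion records names and address ranges only; bodies stay unread.
    return {f.name: f for f in functions}


def lookup_by_address(symtab, pc):
    # A by-address lookup must force the body of the containing function.
    for f in symtab.values():
        if f.low_pc <= pc < f.high_pc:
            return f.body
    return None
```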

It's possible, but harder, to go even farther than this -- that is,
unify symtabs and psymtabs and make CU expansion completely lazy.  At
one point I had a rather complicated plan for this.

For what it's worth, in my current debugging, I do notice psymtab
reading, but I never notice CU expansion.  I'm not sure if I'm just
lucky or if it's because the CU expansion problem is exacerbated by dwz,
which I'm of course not using during development.

Tom

end of thread

Thread overview: 11+ messages
2016-02-24  2:45 Multi-threaded dwarf parsing Simon Marchi
2016-02-24 11:06 ` Pedro Alves
2016-02-24 15:30   ` Tom Tromey
2016-02-24 16:43     ` Simon Marchi
2016-02-24 19:50       ` Tom Tromey
2016-02-24 20:25       ` Jan Kratochvil
2016-02-24 20:37         ` Simon Marchi
2016-02-24 21:28           ` Jan Kratochvil
2016-02-24 21:10         ` Pedro Alves
2016-02-24 21:22           ` Jan Kratochvil
2016-02-25  3:31         ` Tom Tromey
