public inbox for archer@sourceware.org
 help / color / mirror / Atom feed
* Re: Generating gdb index at link time
       [not found]       ` <CAHACq4poC0QkGowdHkg0_Y1FRmyXTZZDeoBYrUf91nxz9SJtQw@mail.gmail.com>
@ 2011-08-05 20:30         ` Cary Coutant
  2011-08-06 11:17           ` Dodji Seketeli
  2011-08-10 20:12           ` Daniel Jacobowitz
  0 siblings, 2 replies; 3+ messages in thread
From: Cary Coutant @ 2011-08-05 20:30 UTC (permalink / raw)
  To: Tom Tromey, Project Archer; +Cc: Dodji Seketeli, Sterling Augustine

> [ opening up to the archer mailing list -- the beginning of the
> conversation is copied below ]

Oops, really adding archer this time! Sorry for the dup.

> [ also adding Sterling, who will be doing the GCC changes ]
>
>
> On Fri, Aug 5, 2011 at 12:04 PM, Tom Tromey <tromey@redhat.com> wrote:
>> Cary> No problem at all. I considered sending this to the archer mailing
>> Cary> list, but figured I'd just start with you and Dodji. Would that be
>> Cary> what you were thinking of by "public", or something more like the gcc
>> Cary> list?
>>
>> I was thinking archer or gdb.
>>
>> Tom> This problem was that .debug_pub* are explicitly for public names, but
>> Tom> for GDB we wanted to see all names (so that "break staticfunction" works
>> Tom> and "break typo" doesn't cause reading all CUs).  This particular
>> Tom> problem was resolved by you :) saying that we should just change GCC to
>> Tom> emit all the names.
>>
>> Cary> No one has actually changed GCC to do that, yet, though -- right?
>>
>> I think either Dodji or I had a patch at one point.  It may have been
>> part of Dodji's patch to change .debug_pub* to also include all the
>> extra info we thought we needed at the time.
>>
>> I'm not sure if it is still around.
>
> If you can find it, we'd definitely be interested in it.
>
>> Tom> The other issue with .debug_pub* was that I implemented a reader for it
>> Tom> in gdb and it was too slow.  It may be possible to fix this -- I don't
>> Tom> think I tried lazily reading it.
>>
>> Cary> OK, that actually answers a question I posed a while back -- having a
>> Cary> complete hashed symbol table does help, compared to just having a
>> Cary> usable implementation of pubnames/pubtypes.
>>
>> Yeah.  Though, again IIRC, one problem was that we had to do name
>> canonicalization while reading.
>>
>> Tom> I think at the time I was concerned that the arange was incorrect.  I
>> Tom> did find some arange bugs in the wild (all since fixed) when we first
>> Tom> used that code.  Also I think I had a concern, probably theoretical,
>> Tom> that since aranges is optional it might not appear -- but it seems to me
>> Tom> now that this could be accounted for at index creation time.
>>
>> Cary> In gold, I plan to check for DW_AT_ranges or DW_AT_{low,high}_pc in
>> Cary> the compile_unit DIE, as I'm doing now. That much of parsing the
>> Cary> debug_info isn't that expensive.
>>
>> You may want to research this change a little, just to make sure your
>> plan is compatible with what gdb expects:
>>
>>    http://sourceware.org/bugzilla/show_bug.cgi?id=12302
>>    http://sourceware.org/ml/gdb-patches/2010-12/msg00093.html
>>
>> I'm not sure what test case inspired this.
>
> Thanks, I'll check with dje when he gets back on Monday. I think
> gold's current behavior is compatible with this, but I'll check to
> make sure.
>
>> Cary> I think the biggest problem is an issue I've seen where the DIE for a
>> Cary> class isn't properly nested inside the right context (even after
>> Cary> following DW_AT_specification). There's an ugly workaround for this in
>> Cary> gdb (by looking for a member subprogram with a linkage name, and
>> Cary> extracting the class name from that), which I copied into gold. As
>> Cary> part of this work, I'd like to fix that bug in GCC, and any others
>> Cary> like it.
>>
>> Yeah.  We've tried to file and/or fix problems like this as we've run
>> across them.  There are still some open bugs around naming.
>>
>> Tom
>
> Thanks,
>
> -cary
>
>
>
> ---------- Forwarded message ----------
> From: Cary Coutant <ccoutant@google.com>
> Date: Thu, Aug 4, 2011 at 6:10 PM
> Subject: Generating gdb index at link time
> To: Tom Tromey <tromey@redhat.com>, Dodji Seketeli <dodji@redhat.com>
>
>
> I talked to Dodji in London in June and I believe I mentioned that I
> was working on adding gdb index support in the gold linker. I did
> finish that, but I've failed miserably so far at making it perform
> decently -- it more than doubles the link time for a large C++ binary.
> Logically, it shouldn't take any more time than it would take to load
> the non-indexed file in gdb and save the gdb index from there, but it
> does, and I've been struggling to fix that. It's still bad enough that
> I haven't yet bothered to submit the patch upstream. It does, however,
> have a significant effect on gdb startup time, as I'm sure you know,
> so we definitely want to have it.
>
> So I've been thinking about faster ways of generating the index...
>
> I'm planning to fix gcc to emit all the names we want to see in
> .debug_pubnames and .debug_pubtypes (extending .debug_pubtypes to
> include type units as well), then have gold generate the gdb index
> from those sections and .debug_aranges, plus what ought to be a *very*
> fast pass over the compilation unit headers in .debug_info. Gold could
> then discard the .debug_pub* sections from the final output.
>
> In reviewing some old email discussions, I noted a couple of things.
> One of your original complaints about .debug_pubnames was that it
> didn't distinguish between function and variable, or between typedef,
> enum, struct, and class. But the gdb index format does not do that
> either. Is this something you discovered later was not actually
> necessary? Likewise, I believe you also mentioned something early on
> about wanting to distinguish between private and public names, but no
> such distinction is made in the gdb index.
>
> Did you ever reach any conclusion on the value of .debug_aranges vs.
> the address table in the gdb index? I saw a gcc patch go by to add an
> empty range for compilation units that had no code so that you could
> distinguish between an empty arange and a missing arange. Did that
> resolve the problem with .debug_aranges? Were there other issues?
>
> Any comments, objections, advice, ...?
>
> -cary
>
>
>
>
> ---------- Forwarded message ----------
> From: Tom Tromey <tromey@redhat.com>
> Date: Fri, Aug 5, 2011 at 7:05 AM
> Subject: Re: Generating gdb index at link time
> To: Cary Coutant <ccoutant@google.com>
> Cc: Dodji Seketeli <dodji@redhat.com>
>
>
> Cary> I talked to Dodji in London in June and I believe I mentioned that I
> Cary> was working on adding gdb index support in the gold linker.
>
> Nice.
>
> It may be better to have this discussion in public; but at least I'd
> like to CC Jan and Keith, if you don't mind, as they can shed some light
> on a couple of issues.
>
> Cary> I did finish that, but I've failed miserably so far at making it
> Cary> perform decently -- it more than doubles the link time for a large
> Cary> C++ binary.
>
> I'm surprised by this as well.
>
> Cary> One of your original complaints about .debug_pubnames was that it
> Cary> didn't distinguish between function and variable, or between typedef,
> Cary> enum, struct, and class. But the gdb index format does not do that
> Cary> either. Is this something you discovered later was not actually
> Cary> necessary?
>
> Yeah; I tried it without this information, just having gdb expand all
> matching CUs, and aside from an issue with type lookup it performed
> acceptable.  We fixed the type lookup thing without affecting the index,
> though I don't remember the details right now.
>
> Cary> Likewise, I believe you also mentioned something early on
> Cary> about wanting to distinguish between private and public names, but no
> Cary> such distinction is made in the gdb index.
>
> This problem was that .debug_pub* are explicitly for public names, but
> for GDB we wanted to see all names (so that "break staticfunction" works
> and "break typo" doesn't cause reading all CUs).  This particular
> problem was resolved by you :) saying that we should just change GCC to
> emit all the names.
>
> The other issue with .debug_pub* was that I implemented a reader for it
> in gdb and it was too slow.  It may be possible to fix this -- I don't
> think I tried lazily reading it.
>
> Cary> Did you ever reach any conclusion on the value of .debug_aranges vs.
> Cary> the address table in the gdb index? I saw a gcc patch go by to add an
> Cary> empty range for compilation units that had no code so that you could
> Cary> distinguish between an empty arange and a missing arange. Did that
> Cary> resolve the problem with .debug_aranges? Were there other issues?
>
> Putting the range info into the index may have been a mistake.
> I am not sure why I did this :(
>
> I still have arange-reading code on a branch.  We could resurrect this
> and drop this information from the index.
>
> There was some bug fix involving the address ranges emitted in the
> index, but I don't recall details now.
>
> I think at the time I was concerned that the arange was incorrect.  I
> did find some arange bugs in the wild (all since fixed) when we first
> used that code.  Also I think I had a concern, probably theoretical,
> that since aranges is optional it might not appear -- but it seems to me
> now that this could be accounted for at index creation time.
>
> Cary> Any comments, objections, advice, ...?
>
> I think the biggest difficulty is in C++ name canonicalization.
>
> My understanding (Jan and Keith are the experts here) is that DW_AT_name
> does not always agree with the demangler; but one also cannot always
> rely on the demangler because not all entities are given a
> DW_linkage_name (perhaps fixed in newer GCC versions?  Dodji would
> know, or anyway it is in bugzilla).
>
> We have a second canonicalization step (a bunch of code in
> cp-name-parser.y) that we run on the demangled names.  I'm not 100% sure
> this is still necessary.  I think this code is reasonably
> self-contained; but maybe slow.
>
> I think it is worth considering changes to the index, even radical ones,
> if it would make your solution better.  The index is purely an ad hoc
> invention and should, IMO, be considered as mutable as any other piece.
>
> Tom
>
>
>
>
> ---------- Forwarded message ----------
> From: Cary Coutant <ccoutant@google.com>
> Date: Fri, Aug 5, 2011 at 11:10 AM
> Subject: Re: Generating gdb index at link time
> To: Tom Tromey <tromey@redhat.com>
> Cc: Dodji Seketeli <dodji@redhat.com>
>
>
>> It may be better to have this discussion in public; but at least I'd
>> like to CC Jan and Keith, if you don't mind, as they can shed some light
>> on a couple of issues.
>
> No problem at all. I considered sending this to the archer mailing
> list, but figured I'd just start with you and Dodji. Would that be
> what you were thinking of by "public", or something more like the gcc
> list?
>
>> Cary> I did finish that, but I've failed miserably so far at making it
>> Cary> perform decently -- it more than doubles the link time for a large
>> Cary> C++ binary.
>>
>> I'm surprised by this as well.
>
> Yeah, I found a few poor implementation choices (and a stupid bug
> where I scanned the section table for the .debug_str section for every
> string reference, even when I already knew the section index from the
> relocation) that made big improvements when fixed, but it's still
> spending a lot of time tracking the relocations, and a lot of time
> parsing DIEs and computing canonical names that I don't need simply
> because I *might* need them to follow a DW_AT_specification or
> DW_AT_abstract_origin reference.
>
>> Cary> Likewise, I believe you also mentioned something early on
>> Cary> about wanting to distinguish between private and public names, but no
>> Cary> such distinction is made in the gdb index.
>>
>> This problem was that .debug_pub* are explicitly for public names, but
>> for GDB we wanted to see all names (so that "break staticfunction" works
>> and "break typo" doesn't cause reading all CUs).  This particular
>> problem was resolved by you :) saying that we should just change GCC to
>> emit all the names.
>
> No one has actually changed GCC to do that, yet, though -- right?
>
>> The other issue with .debug_pub* was that I implemented a reader for it
>> in gdb and it was too slow.  It may be possible to fix this -- I don't
>> think I tried lazily reading it.
>
> OK, that actually answers a question I posed a while back -- having a
> complete hashed symbol table does help, compared to just having a
> usable implementation of pubnames/pubtypes.
>
>> Cary> Did you ever reach any conclusion on the value of .debug_aranges vs.
>> Cary> the address table in the gdb index? I saw a gcc patch go by to add an
>> Cary> empty range for compilation units that had no code so that you could
>> Cary> distinguish between an empty arange and a missing arange. Did that
>> Cary> resolve the problem with .debug_aranges? Were there other issues?
>>
>> Putting the range info into the index may have been a mistake.
>> I am not sure why I did this :(
>>
>> I still have arange-reading code on a branch.  We could resurrect this
>> and drop this information from the index.
>
> I may be interested in doing that at a later point, but I think I'll
> start by using it as you've designed it.
>
>> I think at the time I was concerned that the arange was incorrect.  I
>> did find some arange bugs in the wild (all since fixed) when we first
>> used that code.  Also I think I had a concern, probably theoretical,
>> that since aranges is optional it might not appear -- but it seems to me
>> now that this could be accounted for at index creation time.
>
> In gold, I plan to check for DW_AT_ranges or DW_AT_{low,high}_pc in
> the compile_unit DIE, as I'm doing now. That much of parsing the
> debug_info isn't that expensive.
>
>> I think the biggest difficulty is in C++ name canonicalization.
>>
>> My understanding (Jan and Keith are the experts here) is that DW_AT_name
>> does not always agree with the demangler; but one also cannot always
>> rely on the demangler because not all entities are given a
>> DW_linkage_name (perhaps fixed in newer GCC versions?  Dodji would
>> know, or anyway it is in bugzilla).
>
> I think the biggest problem is an issue I've seen where the DIE for a
> class isn't properly nested inside the right context (even after
> following DW_AT_specification). There's an ugly workaround for this in
> gdb (by looking for a member subprogram with a linkage name, and
> extracting the class name from that), which I copied into gold. As
> part of this work, I'd like to fix that bug in GCC, and any others
> like it.
>
>> I think it is worth considering changes to the index, even radical ones,
>> if it would make your solution better.  The index is purely an ad hoc
>> invention and should, IMO, be considered as mutable as any other piece.
>
> Definitely something to pursue, after I get it working as is.
>
> Thanks for doing all this in the first place, and for your help!
>
> -cary
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Generating gdb index at link time
  2011-08-05 20:30         ` Generating gdb index at link time Cary Coutant
@ 2011-08-06 11:17           ` Dodji Seketeli
  2011-08-10 20:12           ` Daniel Jacobowitz
  1 sibling, 0 replies; 3+ messages in thread
From: Dodji Seketeli @ 2011-08-06 11:17 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Tom Tromey, Project Archer, Sterling Augustine

Hello,

Cary Coutant <ccoutant@google.com> writes:

>>> Tom> This problem was that .debug_pub* are explicitly for public names, but
>>> Tom> for GDB we wanted to see all names (so that "break staticfunction" works
>>> Tom> and "break typo" doesn't cause reading all CUs).  This particular
>>> Tom> problem was resolved by you :) saying that we should just change GCC to
>>> Tom> emit all the names.
>>>
>>> Cary> No one has actually changed GCC to do that, yet, though -- right?
>>>
>>> I think either Dodji or I had a patch at one point.  It may have been
>>> part of Dodji's patch to change .debug_pub* to also include all the
>>> extra info we thought we needed at the time.
>>>
>>> I'm not sure if it is still around.
>>
>> If you can find it, we'd definitely be interested in it.

I stored the iterations of that patch as attachments to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41130 before dropping the
ball on this issue.

[...]

>>
>>> Cary> I think the biggest problem is an issue I've seen where the DIE for a
>>> Cary> class isn't properly nested inside the right context (even after
>>> Cary> following DW_AT_specification). There's an ugly workaround for this in
>>> Cary> gdb (by looking for a member subprogram with a linkage name, and
>>> Cary> extracting the class name from that), which I copied into gold. As
>>> Cary> part of this work, I'd like to fix that bug in GCC, and any others
>>> Cary> like it.
>>>
>>> Yeah.  We've tried to file and/or fix problems like this as we've run
>>> across them.  There are still some open bugs around naming.
>>>

I'd be interested in looking into these nesting bugs when I have some
free cycles.  Please CC me on these.

Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Generating gdb index at link time
  2011-08-05 20:30         ` Generating gdb index at link time Cary Coutant
  2011-08-06 11:17           ` Dodji Seketeli
@ 2011-08-10 20:12           ` Daniel Jacobowitz
  1 sibling, 0 replies; 3+ messages in thread
From: Daniel Jacobowitz @ 2011-08-10 20:12 UTC (permalink / raw)
  To: Cary Coutant
  Cc: Tom Tromey, Project Archer, Dodji Seketeli, Sterling Augustine

On Fri, Aug 5, 2011 at 4:30 PM, Cary Coutant <ccoutant@google.com> wrote:
>> Cary> Any comments, objections, advice, ...?
>>
>> I think the biggest difficulty is in C++ name canonicalization.
>>
>> My understanding (Jan and Keith are the experts here) is that DW_AT_name
>> does not always agree with the demangler; but one also cannot always
>> rely on the demangler because not all entities are given a
>> DW_linkage_name (perhaps fixed in newer GCC versions?  Dodji would
>> know, or anyway it is in bugzilla).
>>
>> We have a second canonicalization step (a bunch of code in
>> cp-name-parser.y) that we run on the demangled names.  I'm not 100% sure
>> this is still necessary.  I think this code is reasonably
>> self-contained; but maybe slow.
>>
>> I think it is worth considering changes to the index, even radical ones,
>> if it would make your solution better.  The index is purely an ad hoc
>> invention and should, IMO, be considered as mutable as any other piece.

You may all know most of this already, but just in case, here's a bit
of history.

The purpose of name canonicalization is to accept different spellings
of the same C++ name and be able to reliably and efficiently look up
the symbol for the canonical spelling of the name.  It is still
necessary, even with GCC.

Tom is right; DW_AT_name can not be trusted.  That's true across
multiple compilers, not just GCC, but GCC is a particularly egregious
offender.  There are some cosmetic differences, like spacing, and some
more significant differences like whether typedefs are expanded.

It is definitely slow.  I spent a long time speeding it up, but it's
still a significant chunk of startup time (a couple percent? don't
remember).

It's really important that the index use the same canonicalization as
GDB.  If it doesn't, we will fail to look up symbols where there's a
difference.  It would be nice to have some robust tests for this;
maybe a flag where GDB checks that all names in the index are
canonical, so we can run the testsuite that way?  That makes me a
little nervous about skew between GDB and Gold.

Not all entities have a linkage name because there are entities which
don't appear in the output.  Types, for instance.  Plus the abstract
copy of the two constructor versions, that's a historic trouble spot.

-- 
Thanks,
Daniel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-08-10 20:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAHACq4oHymXfBOqzJfyzQXNyW5PYkN6g65X6x1rMU+YmJybZmQ@mail.gmail.com>
     [not found] ` <m3d3gkx8ar.fsf@fleche.redhat.com>
     [not found]   ` <CAHACq4ofyavZBt4y65OfYSoeW--HeEAT=sz86urLC0MXBB0hnA@mail.gmail.com>
     [not found]     ` <m3d3gjwuho.fsf@fleche.redhat.com>
     [not found]       ` <CAHACq4poC0QkGowdHkg0_Y1FRmyXTZZDeoBYrUf91nxz9SJtQw@mail.gmail.com>
2011-08-05 20:30         ` Generating gdb index at link time Cary Coutant
2011-08-06 11:17           ` Dodji Seketeli
2011-08-10 20:12           ` Daniel Jacobowitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).