Re: Generating gdb index at link time

public inbox for archer@sourceware.org
 help / color / mirror / Atom feed

From: Cary Coutant <ccoutant@google.com>
To: Tom Tromey <tromey@redhat.com>, Project Archer <archer@sourceware.org>
Cc: Dodji Seketeli <dodji@redhat.com>,
	Sterling Augustine <saugustine@google.com>
Subject: Re: Generating gdb index at link time
Date: Fri, 05 Aug 2011 20:30:00 -0000	[thread overview]
Message-ID: <CAHACq4qhGTg7ShS-eoMHq1oDmoBj=GC=Bx06knVq-+XTUN4ohw@mail.gmail.com> (raw)
In-Reply-To: <CAHACq4poC0QkGowdHkg0_Y1FRmyXTZZDeoBYrUf91nxz9SJtQw@mail.gmail.com>

> [ opening up to the archer mailing list -- the beginning of the
> conversation is copied below ]

Oops, really adding archer this time! Sorry for the dup.

> [ also adding Sterling, who will be doing the GCC changes ]
>
>
> On Fri, Aug 5, 2011 at 12:04 PM, Tom Tromey <tromey@redhat.com> wrote:
>> Cary> No problem at all. I considered sending this to the archer mailing
>> Cary> list, but figured I'd just start with you and Dodji. Would that be
>> Cary> what you were thinking of by "public", or something more like the gcc
>> Cary> list?
>>
>> I was thinking archer or gdb.
>>
>> Tom> This problem was that .debug_pub* are explicitly for public names, but
>> Tom> for GDB we wanted to see all names (so that "break staticfunction" works
>> Tom> and "break typo" doesn't cause reading all CUs).  This particular
>> Tom> problem was resolved by you :) saying that we should just change GCC to
>> Tom> emit all the names.
>>
>> Cary> No one has actually changed GCC to do that, yet, though -- right?
>>
>> I think either Dodji or I had a patch at one point.  It may have been
>> part of Dodji's patch to change .debug_pub* to also include all the
>> extra info we thought we needed at the time.
>>
>> I'm not sure if it is still around.
>
> If you can find it, we'd definitely be interested in it.
>
>> Tom> The other issue with .debug_pub* was that I implemented a reader for it
>> Tom> in gdb and it was too slow.  It may be possible to fix this -- I don't
>> Tom> think I tried lazily reading it.
>>
>> Cary> OK, that actually answers a question I posed a while back -- having a
>> Cary> complete hashed symbol table does help, compared to just having a
>> Cary> usable implementation of pubnames/pubtypes.
>>
>> Yeah.  Though, again IIRC, one problem was that we had to do name
>> canonicalization while reading.
>>
>> Tom> I think at the time I was concerned that the arange was incorrect.  I
>> Tom> did find some arange bugs in the wild (all since fixed) when we first
>> Tom> used that code.  Also I think I had a concern, probably theoretical,
>> Tom> that since aranges is optional it might not appear -- but it seems to me
>> Tom> now that this could be accounted for at index creation time.
>>
>> Cary> In gold, I plan to check for DW_AT_ranges or DW_AT_{low,high}_pc in
>> Cary> the compile_unit DIE, as I'm doing now. That much of parsing the
>> Cary> debug_info isn't that expensive.
>>
>> You may want to research this change a little, just to make sure your
>> plan is compatible with what gdb expects:
>>
>>    http://sourceware.org/bugzilla/show_bug.cgi?id=12302
>>    http://sourceware.org/ml/gdb-patches/2010-12/msg00093.html
>>
>> I'm not sure what test case inspired this.
>
> Thanks, I'll check with dje when he gets back on Monday. I think
> gold's current behavior is compatible with this, but I'll check to
> make sure.
>
>> Cary> I think the biggest problem is an issue I've seen where the DIE for a
>> Cary> class isn't properly nested inside the right context (even after
>> Cary> following DW_AT_specification). There's an ugly workaround for this in
>> Cary> gdb (by looking for a member subprogram with a linkage name, and
>> Cary> extracting the class name from that), which I copied into gold. As
>> Cary> part of this work, I'd like to fix that bug in GCC, and any others
>> Cary> like it.
>>
>> Yeah.  We've tried to file and/or fix problems like this as we've run
>> across them.  There are still some open bugs around naming.
>>
>> Tom
>
> Thanks,
>
> -cary
>
>
>
> ---------- Forwarded message ----------
> From: Cary Coutant <ccoutant@google.com>
> Date: Thu, Aug 4, 2011 at 6:10 PM
> Subject: Generating gdb index at link time
> To: Tom Tromey <tromey@redhat.com>, Dodji Seketeli <dodji@redhat.com>
>
>
> I talked to Dodji in London in June and I believe I mentioned that I
> was working on adding gdb index support in the gold linker. I did
> finish that, but I've failed miserably so far at making it perform
> decently -- it more than doubles the link time for a large C++ binary.
> Logically, it shouldn't take any more time than it would take to load
> the non-indexed file in gdb and save the gdb index from there, but it
> does, and I've been struggling to fix that. It's still bad enough that
> I haven't yet bothered to submit the patch upstream. It does, however,
> have a significant effect on gdb startup time, as I'm sure you know,
> so we definitely want to have it.
>
> So I've been thinking about faster ways of generating the index...
>
> I'm planning to fix gcc to emit all the names we want to see in
> .debug_pubnames and .debug_pubtypes (extending .debug_pubtypes to
> include type units as well), then have gold generate the gdb index
> from those sections and .debug_aranges, plus what ought to be a *very*
> fast pass over the compilation unit headers in .debug_info. Gold could
> then discard the .debug_pub* sections from the final output.
>
> In reviewing some old email discussions, I noted a couple of things.
> One of your original complaints about .debug_pubnames was that it
> didn't distinguish between function and variable, or between typedef,
> enum, struct, and class. But the gdb index format does not do that
> either. Is this something you discovered later was not actually
> necessary? Likewise, I believe you also mentioned something early on
> about wanting to distinguish between private and public names, but no
> such distinction is made in the gdb index.
>
> Did you ever reach any conclusion on the value of .debug_aranges vs.
> the address table in the gdb index? I saw a gcc patch go by to add an
> empty range for compilation units that had no code so that you could
> distinguish between an empty arange and a missing arange. Did that
> resolve the problem with .debug_aranges? Were there other issues?
>
> Any comments, objections, advice, ...?
>
> -cary
>
>
>
>
> ---------- Forwarded message ----------
> From: Tom Tromey <tromey@redhat.com>
> Date: Fri, Aug 5, 2011 at 7:05 AM
> Subject: Re: Generating gdb index at link time
> To: Cary Coutant <ccoutant@google.com>
> Cc: Dodji Seketeli <dodji@redhat.com>
>
>
> Cary> I talked to Dodji in London in June and I believe I mentioned that I
> Cary> was working on adding gdb index support in the gold linker.
>
> Nice.
>
> It may be better to have this discussion in public; but at least I'd
> like to CC Jan and Keith, if you don't mind, as they can shed some light
> on a couple of issues.
>
> Cary> I did finish that, but I've failed miserably so far at making it
> Cary> perform decently -- it more than doubles the link time for a large
> Cary> C++ binary.
>
> I'm surprised by this as well.
>
> Cary> One of your original complaints about .debug_pubnames was that it
> Cary> didn't distinguish between function and variable, or between typedef,
> Cary> enum, struct, and class. But the gdb index format does not do that
> Cary> either. Is this something you discovered later was not actually
> Cary> necessary?
>
> Yeah; I tried it without this information, just having gdb expand all
> matching CUs, and aside from an issue with type lookup it performed
> acceptable.  We fixed the type lookup thing without affecting the index,
> though I don't remember the details right now.
>
> Cary> Likewise, I believe you also mentioned something early on
> Cary> about wanting to distinguish between private and public names, but no
> Cary> such distinction is made in the gdb index.
>
> This problem was that .debug_pub* are explicitly for public names, but
> for GDB we wanted to see all names (so that "break staticfunction" works
> and "break typo" doesn't cause reading all CUs).  This particular
> problem was resolved by you :) saying that we should just change GCC to
> emit all the names.
>
> The other issue with .debug_pub* was that I implemented a reader for it
> in gdb and it was too slow.  It may be possible to fix this -- I don't
> think I tried lazily reading it.
>
> Cary> Did you ever reach any conclusion on the value of .debug_aranges vs.
> Cary> the address table in the gdb index? I saw a gcc patch go by to add an
> Cary> empty range for compilation units that had no code so that you could
> Cary> distinguish between an empty arange and a missing arange. Did that
> Cary> resolve the problem with .debug_aranges? Were there other issues?
>
> Putting the range info into the index may have been a mistake.
> I am not sure why I did this :(
>
> I still have arange-reading code on a branch.  We could resurrect this
> and drop this information from the index.
>
> There was some bug fix involving the address ranges emitted in the
> index, but I don't recall details now.
>
> I think at the time I was concerned that the arange was incorrect.  I
> did find some arange bugs in the wild (all since fixed) when we first
> used that code.  Also I think I had a concern, probably theoretical,
> that since aranges is optional it might not appear -- but it seems to me
> now that this could be accounted for at index creation time.
>
> Cary> Any comments, objections, advice, ...?
>
> I think the biggest difficulty is in C++ name canonicalization.
>
> My understanding (Jan and Keith are the experts here) is that DW_AT_name
> does not always agree with the demangler; but one also cannot always
> rely on the demangler because not all entities are given a
> DW_linkage_name (perhaps fixed in newer GCC versions?  Dodji would
> know, or anyway it is in bugzilla).
>
> We have a second canonicalization step (a bunch of code in
> cp-name-parser.y) that we run on the demangled names.  I'm not 100% sure
> this is still necessary.  I think this code is reasonably
> self-contained; but maybe slow.
>
> I think it is worth considering changes to the index, even radical ones,
> if it would make your solution better.  The index is purely an ad hoc
> invention and should, IMO, be considered as mutable as any other piece.
>
> Tom
>
>
>
>
> ---------- Forwarded message ----------
> From: Cary Coutant <ccoutant@google.com>
> Date: Fri, Aug 5, 2011 at 11:10 AM
> Subject: Re: Generating gdb index at link time
> To: Tom Tromey <tromey@redhat.com>
> Cc: Dodji Seketeli <dodji@redhat.com>
>
>
>> It may be better to have this discussion in public; but at least I'd
>> like to CC Jan and Keith, if you don't mind, as they can shed some light
>> on a couple of issues.
>
> No problem at all. I considered sending this to the archer mailing
> list, but figured I'd just start with you and Dodji. Would that be
> what you were thinking of by "public", or something more like the gcc
> list?
>
>> Cary> I did finish that, but I've failed miserably so far at making it
>> Cary> perform decently -- it more than doubles the link time for a large
>> Cary> C++ binary.
>>
>> I'm surprised by this as well.
>
> Yeah, I found a few poor implementation choices (and a stupid bug
> where I scanned the section table for the .debug_str section for every
> string reference, even when I already knew the section index from the
> relocation) that made big improvements when fixed, but it's still
> spending a lot of time tracking the relocations, and a lot of time
> parsing DIEs and computing canonical names that I don't need simply
> because I *might* need them to follow a DW_AT_specification or
> DW_AT_abstract_origin reference.
>
>> Cary> Likewise, I believe you also mentioned something early on
>> Cary> about wanting to distinguish between private and public names, but no
>> Cary> such distinction is made in the gdb index.
>>
>> This problem was that .debug_pub* are explicitly for public names, but
>> for GDB we wanted to see all names (so that "break staticfunction" works
>> and "break typo" doesn't cause reading all CUs).  This particular
>> problem was resolved by you :) saying that we should just change GCC to
>> emit all the names.
>
> No one has actually changed GCC to do that, yet, though -- right?
>
>> The other issue with .debug_pub* was that I implemented a reader for it
>> in gdb and it was too slow.  It may be possible to fix this -- I don't
>> think I tried lazily reading it.
>
> OK, that actually answers a question I posed a while back -- having a
> complete hashed symbol table does help, compared to just having a
> usable implementation of pubnames/pubtypes.
>
>> Cary> Did you ever reach any conclusion on the value of .debug_aranges vs.
>> Cary> the address table in the gdb index? I saw a gcc patch go by to add an
>> Cary> empty range for compilation units that had no code so that you could
>> Cary> distinguish between an empty arange and a missing arange. Did that
>> Cary> resolve the problem with .debug_aranges? Were there other issues?
>>
>> Putting the range info into the index may have been a mistake.
>> I am not sure why I did this :(
>>
>> I still have arange-reading code on a branch.  We could resurrect this
>> and drop this information from the index.
>
> I may be interested in doing that at a later point, but I think I'll
> start by using it as you've designed it.
>
>> I think at the time I was concerned that the arange was incorrect.  I
>> did find some arange bugs in the wild (all since fixed) when we first
>> used that code.  Also I think I had a concern, probably theoretical,
>> that since aranges is optional it might not appear -- but it seems to me
>> now that this could be accounted for at index creation time.
>
> In gold, I plan to check for DW_AT_ranges or DW_AT_{low,high}_pc in
> the compile_unit DIE, as I'm doing now. That much of parsing the
> debug_info isn't that expensive.
>
>> I think the biggest difficulty is in C++ name canonicalization.
>>
>> My understanding (Jan and Keith are the experts here) is that DW_AT_name
>> does not always agree with the demangler; but one also cannot always
>> rely on the demangler because not all entities are given a
>> DW_linkage_name (perhaps fixed in newer GCC versions?  Dodji would
>> know, or anyway it is in bugzilla).
>
> I think the biggest problem is an issue I've seen where the DIE for a
> class isn't properly nested inside the right context (even after
> following DW_AT_specification). There's an ugly workaround for this in
> gdb (by looking for a member subprogram with a linkage name, and
> extracting the class name from that), which I copied into gold. As
> part of this work, I'd like to fix that bug in GCC, and any others
> like it.
>
>> I think it is worth considering changes to the index, even radical ones,
>> if it would make your solution better.  The index is purely an ad hoc
>> invention and should, IMO, be considered as mutable as any other piece.
>
> Definitely something to pursue, after I get it working as is.
>
> Thanks for doing all this in the first place, and for your help!
>
> -cary
>

next      parent reply	other threads:[~2011-08-05 20:30 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAHACq4oHymXfBOqzJfyzQXNyW5PYkN6g65X6x1rMU+YmJybZmQ@mail.gmail.com>
     [not found] ` <m3d3gkx8ar.fsf@fleche.redhat.com>
     [not found]   ` <CAHACq4ofyavZBt4y65OfYSoeW--HeEAT=sz86urLC0MXBB0hnA@mail.gmail.com>
     [not found]     ` <m3d3gjwuho.fsf@fleche.redhat.com>
     [not found]       ` <CAHACq4poC0QkGowdHkg0_Y1FRmyXTZZDeoBYrUf91nxz9SJtQw@mail.gmail.com>
2011-08-05 20:30         ` Cary Coutant [this message]
2011-08-06 11:17           ` Dodji Seketeli
2011-08-10 20:12           ` Daniel Jacobowitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHACq4qhGTg7ShS-eoMHq1oDmoBj=GC=Bx06knVq-+XTUN4ohw@mail.gmail.com' \
    --to=ccoutant@google.com \
    --cc=archer@sourceware.org \
    --cc=dodji@redhat.com \
    --cc=saugustine@google.com \
    --cc=tromey@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).