Re: Regression handling linkage vs source name

public inbox for gdb@sourceware.org
 help / color / mirror / Atom feed

From: Doug Evans <dje@google.com>
To: Michael Eager <eager@eagercon.com>
Cc: gdb@sourceware.org, Jan Kratochvil <jan.kratochvil@redhat.com>
Subject: Re: Regression handling linkage vs source name
Date: Thu, 23 Feb 2012 03:08:00 -0000	[thread overview]
Message-ID: <CADPb22TuctsUr2F4rRJ0RS4HUaB_aU_NG=T96QFbPzo8+MC=6g@mail.gmail.com> (raw)
In-Reply-To: <4F454188.7000904@eagercon.com>

On Wed, Feb 22, 2012 at 11:27 AM, Michael Eager <eager@eagercon.com> wrote:
> On 02/22/2012 08:40 AM, Doug Evans wrote:
>>
>> On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager<eager@eagercon.com>  wrote:
>>>
>>> There were changes to dwarf2_physname() around July 1, 2011, which
>>> cause it to return a different value.  Arguably, the changes make
>>> the function work better, returning the linkage name for a symbol where
>>> it previously returned the source name.
>>
>>
>> Tangential data point: Outside of dwarf2read.c, gdb generally uses
>> "physname" to mean linkage name.
>
>
> It doesn't appear to be the case for the value returned by
> dwarf2_compute_name().  I haven't looked at other places.

I'm not sure what your point is here, so I'm going to set it aside for now.
[My comment (tangential data point) still stands though.]

>
>> I'm not sure dwarf2_physname ever returned the linkage name (for c/c++).
>> I know there's been some changes in how it computes the source name.
>
>
> Before the patch, it didn't for C.  I didn't look at C++.

Ah.  Indeed.
Fun fun fun.

>>> But since the source name
>>> is overwritten by the linkage name in the symbol entry, gdb's
>>> behavior is different, and, IMO, incorrect.
>>
>>
>> Can you elaborate on what you mean by overwritten?
>
>
> In new_symbol_full():
>      /* Cache this symbol's name and the name's demangled form (if any).  */
>      SYMBOL_SET_LANGUAGE (sym, cu->language);
>      linkagename = dwarf2_physname (name, die, cu);
>      SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, objfile);
>
> Clearly the comment is incorrect or misleading:  only one name is stored in
> the symbol.

Well, yes and no.
Only one name is passed to SYMBOL_SET_NAMES, but it does at least try
to set both the mangled(/linkage) and demangled(/source-with-caveats)
names.

>
> Before the patch, dwarf2_physname() would call dwarf2_compute_name() which
> would return 'name' unmodified.
>
> After the patch, dwarf2_physname() returns the linkage name or a mangled
> name.

Right.  This was done to address potential shortcomings in c++-land.
IIUC, We totally missed gcc's asm(name) feature.  Blech.

> (I initially misread the code to save the source name in the symbol,
> then overwrite it with the linkage name.)
>
>
>> [It's not what I see.]
>>
>>> Here is the test case:
>>>
>>> $ cat t.c
>>> extern int foo(void) asm ("xfoo");
>>>
>>> int foo(void) {}
>>>
>>> int main (void)
>>> {
>>>  foo ();
>>> }
>>>
>>> $ gdb-before a.out
>>> ...
>>> (gdb) b foo
>>> Breakpoint 1 at ...
>>>
>>> $ gdb-after a.out
>>> ...
>>> (gdb) b foo
>>> Make breakpoint pending on future shared library load? (y or [n])? n
>>> (gdb) b xfoo
>>> Breakpoint 1 at ...
>>>
>>>
>>> The symbol "foo" is no longer present in gdb symbol table
>>> so trying to use the name specified in the source fails.
>>> Before the change, breakpoint, backtrace, and other commands
>>> display the symbol name as in the source (and in the DWARF DIE).
>>> After the change, the linkage name is displayed instead of the
>>> source name.
>>
>>
>> Even pre-dwarf2_physname gdb's have this problem.
>> Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0,
>> 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and
>> didn't want to go to trouble of building them).
>
>
> This works correctly in 7.2.  Maybe this was an "accident".  :-)

I'm not sure, though I think it was at least partly accidental.

>
>> The problem (or at least one of the problems), as I see it, is that
>> the API for creating symbols is broken.
>
>
> I agree.
>
>
>> gdb does (or at least can) record both the linkage and source names
>> (it does this for "minsyms", e.g. ELF symbols).  But the way to set a
>> symbol's name is via symbol_set_names and it only takes a linkage name
>> (though the dwarf2read.c further compounds the problem by passing it
>> the source name - or more accurately the "search" name).
>> symbol_set_names then tries to demangle the name and will record that
>> name as well if the demangling succeeds.
>
>
> Well, the minisym only stores a single name, not both linkage and source
> names.  Despite what the comments preceding symbol_set_names() says,
> there's only one name saved.

Sorry, this is another c-vs-c++ mixup.
Minsyms don't come from the debug info so there's no way to obtain the
source name for C.  However, for c++ the linkage name can be demangled
and that name is indeed stored for minsyms (in addition to the linkage
name). [Setting aside the case where the program uses asm(name).]

> The code in symbol_set_names() seems to be doing way more than it needs
> to.  If both the source and linkage names are available, then there
> is no need to mangle, demangle, add a prefix or do anything but save
> the name.

I think part of the intent here is to have the demangled name
available even if there is no debug info (e.g. for c++).  [As for how
one goes about achieving that, I wouldn't disagree with the claim that
there is room for improvement.]

>
> (symbol_set_demangled_name() does save a demangled name for C++.)
>
>
>>> Before the change, dwarf2_physname() calls dwarf2_compute_name()
>>> which returns the symbol name if the language is not C++, Java,
>>> or Fortran, even if the DIE has a DW_AT_linkage_name attribute.
>>> After the change, dwarf2_physname() returns the DW_AT_linkage_name.
>>>
>>> Since gdb doesn't keep both the source name and the linkage
>>> name in the symbol table (a design flaw, IMO) what is the
>>> right way to get the previous behavior, where gdb recognizes
>>> the symbol names from the source file?
>>
>>
>> We need to have dwarf2read.c store both names (linkage name and dwarf
>> name). [More changes may be required beyond that, but I think that's a
>> start.]  Your test program shows that we can't assume we can simply
>> demangle the linkage name to get the source name.
>
>
> Yes, this was my conclusion as well.
>
> When debug info is not available, mangling a source name or demangling
> a linkage name may be necessary.  I don't know that this should be done
> when reading the DWARF info.  When the debug info contains both names,
> or the linkage name can be inferred from the DIE, doing anything
> other than saving their values should not be necessary.

In general I agree.  I think part of the concern here is if there is a
problem in what gcc (or whatever) provides, and how to best cope.

> Understanding what the symbol routines are trying to do with all of
> the mangling and demangling is a challenge.

You're preaching to the choir here ... :-)

> One thing which is not
> clear is whether there is agreement whether the name stored for a symbol
> is supposed to be the source name or the linkagename.

This is part of what I was referring to in my tangential data point.
dwarf2read.c stores the demangled name in symbol.ginfo.name which is
defined to contain the mangled name (if it wasn't, why does
symbol.ginfo.language_specific.cplus_specific.demangled_name exist
:-)).

> symbol_set_names()
> and symbol_set_demangled_name() suggest that symtab.c thinks it is the
> linkage name.  Commands which look up a symbol assume that it is the
> source name.

re: commands: minsym lookup will make two passes first trying the
passed in name as the linkage name, and then trying it as the
source(/search) name, whereas fullsym lookup will just look at one
(SYMBOL_SEARCH_NAME).

> Should there be two symbol entries, one for "foo" and one for "xfoo"?
> A single entry which is searched for a match on either name or linkage
> name?  Something else?

I think we do need to store both.  Otherwise lookup is going to be too painful.
Beyond that, the devil is in the details.

Note for reference sake: What kind of demangled name to store is a bit
more complex than simply storing the fully demangled name.  gdb stores
the "search" name which is not necessarily equivalent to the full
source name (e.g. for c++ templates that encode the result type in the
linkage name).

     prev parent reply	other threads:[~2012-02-23  3:08 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-22  0:41 Michael Eager
2012-02-22 16:40 ` Doug Evans
2012-02-22 19:27   ` Michael Eager
2012-02-23  3:08     ` Doug Evans [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADPb22TuctsUr2F4rRJ0RS4HUaB_aU_NG=T96QFbPzo8+MC=6g@mail.gmail.com' \
    --to=dje@google.com \
    --cc=eager@eagercon.com \
    --cc=gdb@sourceware.org \
    --cc=jan.kratochvil@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).