Regression handling linkage vs source name

public inbox for gdb@sourceware.org
 help / color / mirror / Atom feed

* Regression handling linkage vs source name
@ 2012-02-22  0:41 Michael Eager
  2012-02-22 16:40 ` Doug Evans
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Eager @ 2012-02-22  0:41 UTC (permalink / raw)
  To: gdb; +Cc: Jan Kratochvil

There were changes to dwarf2_physname() around July 1, 2011, which
cause it to return a different value.  Arguably, the changes make
the function work better, returning the linkage name for a symbol where
it previously returned the source name.  But since the source name
is overwritten by the linkage name in the symbol entry, gdb's
behavior is different, and, IMO, incorrect.

Here is the test case:

$ cat t.c
extern int foo(void) asm ("xfoo");

int foo(void) {}

int main (void)
{
   foo ();
}

$ gdb-before a.out
...
(gdb) b foo
Breakpoint 1 at ...

$ gdb-after a.out
...
(gdb) b foo
Make breakpoint pending on future shared library load? (y or [n])? n
(gdb) b xfoo
Breakpoint 1 at ...

The symbol "foo" is no longer present in gdb symbol table
so trying to use the name specified in the source fails.
Before the change, breakpoint, backtrace, and other commands
display the symbol name as in the source (and in the DWARF DIE).
After the change, the linkage name is displayed instead of the
source name.

Before the change, dwarf2_physname() calls dwarf2_compute_name()
which returns the symbol name if the language is not C++, Java,
or Fortran, even if the DIE has a DW_AT_linkage_name attribute.
After the change, dwarf2_physname() returns the DW_AT_linkage_name.

Since gdb doesn't keep both the source name and the linkage
name in the symbol table (a design flaw, IMO) what is the
right way to get the previous behavior, where gdb recognizes
the symbol names from the source file?

-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Regression handling linkage vs source name
  2012-02-22  0:41 Regression handling linkage vs source name Michael Eager
@ 2012-02-22 16:40 ` Doug Evans
  2012-02-22 19:27   ` Michael Eager
  0 siblings, 1 reply; 4+ messages in thread
From: Doug Evans @ 2012-02-22 16:40 UTC (permalink / raw)
  To: Michael Eager; +Cc: gdb, Jan Kratochvil

On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager <eager@eagercon.com> wrote:
> There were changes to dwarf2_physname() around July 1, 2011, which
> cause it to return a different value.  Arguably, the changes make
> the function work better, returning the linkage name for a symbol where
> it previously returned the source name.

Tangential data point: Outside of dwarf2read.c, gdb generally uses
"physname" to mean linkage name.

I'm not sure dwarf2_physname ever returned the linkage name (for c/c++).
I know there's been some changes in how it computes the source name.

> But since the source name
> is overwritten by the linkage name in the symbol entry, gdb's
> behavior is different, and, IMO, incorrect.

Can you elaborate on what you mean by overwritten?
[It's not what I see.]

> Here is the test case:
>
> $ cat t.c
> extern int foo(void) asm ("xfoo");
>
> int foo(void) {}
>
> int main (void)
> {
>  foo ();
> }
>
> $ gdb-before a.out
> ...
> (gdb) b foo
> Breakpoint 1 at ...
>
> $ gdb-after a.out
> ...
> (gdb) b foo
> Make breakpoint pending on future shared library load? (y or [n])? n
> (gdb) b xfoo
> Breakpoint 1 at ...
>
>
> The symbol "foo" is no longer present in gdb symbol table
> so trying to use the name specified in the source fails.
> Before the change, breakpoint, backtrace, and other commands
> display the symbol name as in the source (and in the DWARF DIE).
> After the change, the linkage name is displayed instead of the
> source name.

Even pre-dwarf2_physname gdb's have this problem.
Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0,
7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and
didn't want to go to trouble of building them).

The problem (or at least one of the problems), as I see it, is that
the API for creating symbols is broken.
gdb does (or at least can) record both the linkage and source names
(it does this for "minsyms", e.g. ELF symbols).  But the way to set a
symbol's name is via symbol_set_names and it only takes a linkage name
(though the dwarf2read.c further compounds the problem by passing it
the source name - or more accurately the "search" name).
symbol_set_names then tries to demangle the name and will record that
name as well if the demangling succeeds.

> Before the change, dwarf2_physname() calls dwarf2_compute_name()
> which returns the symbol name if the language is not C++, Java,
> or Fortran, even if the DIE has a DW_AT_linkage_name attribute.
> After the change, dwarf2_physname() returns the DW_AT_linkage_name.
>
> Since gdb doesn't keep both the source name and the linkage
> name in the symbol table (a design flaw, IMO) what is the
> right way to get the previous behavior, where gdb recognizes
> the symbol names from the source file?

We need to have dwarf2read.c store both names (linkage name and dwarf
name). [More changes may be required beyond that, but I think that's a
start.]  Your test program shows that we can't assume we can simply
demangle the linkage name to get the source name.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Regression handling linkage vs source name
  2012-02-22 16:40 ` Doug Evans
@ 2012-02-22 19:27   ` Michael Eager
  2012-02-23  3:08     ` Doug Evans
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Eager @ 2012-02-22 19:27 UTC (permalink / raw)
  To: Doug Evans; +Cc: gdb, Jan Kratochvil

On 02/22/2012 08:40 AM, Doug Evans wrote:
> On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager<eager@eagercon.com>  wrote:
>> There were changes to dwarf2_physname() around July 1, 2011, which
>> cause it to return a different value.  Arguably, the changes make
>> the function work better, returning the linkage name for a symbol where
>> it previously returned the source name.
>
> Tangential data point: Outside of dwarf2read.c, gdb generally uses
> "physname" to mean linkage name.

It doesn't appear to be the case for the value returned by
dwarf2_compute_name().  I haven't looked at other places.

> I'm not sure dwarf2_physname ever returned the linkage name (for c/c++).
> I know there's been some changes in how it computes the source name.

Before the patch, it didn't for C.  I didn't look at C++.

>> But since the source name
>> is overwritten by the linkage name in the symbol entry, gdb's
>> behavior is different, and, IMO, incorrect.
>
> Can you elaborate on what you mean by overwritten?

In new_symbol_full():
       /* Cache this symbol's name and the name's demangled form (if any).  */
       SYMBOL_SET_LANGUAGE (sym, cu->language);
       linkagename = dwarf2_physname (name, die, cu);
       SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, objfile);

Clearly the comment is incorrect or misleading:  only one name is stored in
the symbol.

Before the patch, dwarf2_physname() would call dwarf2_compute_name() which
would return 'name' unmodified.

After the patch, dwarf2_physname() returns the linkage name or a mangled
name.

(I initially misread the code to save the source name in the symbol,
then overwrite it with the linkage name.)

> [It's not what I see.]
>
>> Here is the test case:
>>
>> $ cat t.c
>> extern int foo(void) asm ("xfoo");
>>
>> int foo(void) {}
>>
>> int main (void)
>> {
>>   foo ();
>> }
>>
>> $ gdb-before a.out
>> ...
>> (gdb) b foo
>> Breakpoint 1 at ...
>>
>> $ gdb-after a.out
>> ...
>> (gdb) b foo
>> Make breakpoint pending on future shared library load? (y or [n])? n
>> (gdb) b xfoo
>> Breakpoint 1 at ...
>>
>>
>> The symbol "foo" is no longer present in gdb symbol table
>> so trying to use the name specified in the source fails.
>> Before the change, breakpoint, backtrace, and other commands
>> display the symbol name as in the source (and in the DWARF DIE).
>> After the change, the linkage name is displayed instead of the
>> source name.
>
> Even pre-dwarf2_physname gdb's have this problem.
> Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0,
> 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and
> didn't want to go to trouble of building them).

This works correctly in 7.2.  Maybe this was an "accident".  :-)

> The problem (or at least one of the problems), as I see it, is that
> the API for creating symbols is broken.

I agree.

> gdb does (or at least can) record both the linkage and source names
> (it does this for "minsyms", e.g. ELF symbols).  But the way to set a
> symbol's name is via symbol_set_names and it only takes a linkage name
> (though the dwarf2read.c further compounds the problem by passing it
> the source name - or more accurately the "search" name).
> symbol_set_names then tries to demangle the name and will record that
> name as well if the demangling succeeds.

Well, the minisym only stores a single name, not both linkage and source
names.  Despite what the comments preceding symbol_set_names() says,
there's only one name saved.

The code in symbol_set_names() seems to be doing way more than it needs
to.  If both the source and linkage names are available, then there
is no need to mangle, demangle, add a prefix or do anything but save
the name.

(symbol_set_demangled_name() does save a demangled name for C++.)

>> Before the change, dwarf2_physname() calls dwarf2_compute_name()
>> which returns the symbol name if the language is not C++, Java,
>> or Fortran, even if the DIE has a DW_AT_linkage_name attribute.
>> After the change, dwarf2_physname() returns the DW_AT_linkage_name.
>>
>> Since gdb doesn't keep both the source name and the linkage
>> name in the symbol table (a design flaw, IMO) what is the
>> right way to get the previous behavior, where gdb recognizes
>> the symbol names from the source file?
>
> We need to have dwarf2read.c store both names (linkage name and dwarf
> name). [More changes may be required beyond that, but I think that's a
> start.]  Your test program shows that we can't assume we can simply
> demangle the linkage name to get the source name.

Yes, this was my conclusion as well.

When debug info is not available, mangling a source name or demangling
a linkage name may be necessary.  I don't know that this should be done
when reading the DWARF info.  When the debug info contains both names,
or the linkage name can be inferred from the DIE, doing anything
other than saving their values should not be necessary.

Understanding what the symbol routines are trying to do with all of
the mangling and demangling is a challenge.   One thing which is not
clear is whether there is agreement whether the name stored for a symbol
is supposed to be the source name or the linkage name.  symbol_set_names()
and symbol_set_demangled_name() suggest that symtab.c thinks it is the
linkage name.  Commands which look up a symbol assume that it is the
source name.

Should there be two symbol entries, one for "foo" and one for "xfoo"?
A single entry which is searched for a match on either name or linkage
name?  Something else?

-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Regression handling linkage vs source name
  2012-02-22 19:27   ` Michael Eager
@ 2012-02-23  3:08     ` Doug Evans
  0 siblings, 0 replies; 4+ messages in thread
From: Doug Evans @ 2012-02-23  3:08 UTC (permalink / raw)
  To: Michael Eager; +Cc: gdb, Jan Kratochvil

On Wed, Feb 22, 2012 at 11:27 AM, Michael Eager <eager@eagercon.com> wrote:
> On 02/22/2012 08:40 AM, Doug Evans wrote:
>>
>> On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager<eager@eagercon.com>  wrote:
>>>
>>> There were changes to dwarf2_physname() around July 1, 2011, which
>>> cause it to return a different value.  Arguably, the changes make
>>> the function work better, returning the linkage name for a symbol where
>>> it previously returned the source name.
>>
>>
>> Tangential data point: Outside of dwarf2read.c, gdb generally uses
>> "physname" to mean linkage name.
>
>
> It doesn't appear to be the case for the value returned by
> dwarf2_compute_name().  I haven't looked at other places.

I'm not sure what your point is here, so I'm going to set it aside for now.
[My comment (tangential data point) still stands though.]

>
>> I'm not sure dwarf2_physname ever returned the linkage name (for c/c++).
>> I know there's been some changes in how it computes the source name.
>
>
> Before the patch, it didn't for C.  I didn't look at C++.

Ah.  Indeed.
Fun fun fun.

>>> But since the source name
>>> is overwritten by the linkage name in the symbol entry, gdb's
>>> behavior is different, and, IMO, incorrect.
>>
>>
>> Can you elaborate on what you mean by overwritten?
>
>
> In new_symbol_full():
>      /* Cache this symbol's name and the name's demangled form (if any).  */
>      SYMBOL_SET_LANGUAGE (sym, cu->language);
>      linkagename = dwarf2_physname (name, die, cu);
>      SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, objfile);
>
> Clearly the comment is incorrect or misleading:  only one name is stored in
> the symbol.

Well, yes and no.
Only one name is passed to SYMBOL_SET_NAMES, but it does at least try
to set both the mangled(/linkage) and demangled(/source-with-caveats)
names.

>
> Before the patch, dwarf2_physname() would call dwarf2_compute_name() which
> would return 'name' unmodified.
>
> After the patch, dwarf2_physname() returns the linkage name or a mangled
> name.

Right.  This was done to address potential shortcomings in c++-land.
IIUC, We totally missed gcc's asm(name) feature.  Blech.

> (I initially misread the code to save the source name in the symbol,
> then overwrite it with the linkage name.)
>
>
>> [It's not what I see.]
>>
>>> Here is the test case:
>>>
>>> $ cat t.c
>>> extern int foo(void) asm ("xfoo");
>>>
>>> int foo(void) {}
>>>
>>> int main (void)
>>> {
>>>  foo ();
>>> }
>>>
>>> $ gdb-before a.out
>>> ...
>>> (gdb) b foo
>>> Breakpoint 1 at ...
>>>
>>> $ gdb-after a.out
>>> ...
>>> (gdb) b foo
>>> Make breakpoint pending on future shared library load? (y or [n])? n
>>> (gdb) b xfoo
>>> Breakpoint 1 at ...
>>>
>>>
>>> The symbol "foo" is no longer present in gdb symbol table
>>> so trying to use the name specified in the source fails.
>>> Before the change, breakpoint, backtrace, and other commands
>>> display the symbol name as in the source (and in the DWARF DIE).
>>> After the change, the linkage name is displayed instead of the
>>> source name.
>>
>>
>> Even pre-dwarf2_physname gdb's have this problem.
>> Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0,
>> 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and
>> didn't want to go to trouble of building them).
>
>
> This works correctly in 7.2.  Maybe this was an "accident".  :-)

I'm not sure, though I think it was at least partly accidental.

>
>> The problem (or at least one of the problems), as I see it, is that
>> the API for creating symbols is broken.
>
>
> I agree.
>
>
>> gdb does (or at least can) record both the linkage and source names
>> (it does this for "minsyms", e.g. ELF symbols).  But the way to set a
>> symbol's name is via symbol_set_names and it only takes a linkage name
>> (though the dwarf2read.c further compounds the problem by passing it
>> the source name - or more accurately the "search" name).
>> symbol_set_names then tries to demangle the name and will record that
>> name as well if the demangling succeeds.
>
>
> Well, the minisym only stores a single name, not both linkage and source
> names.  Despite what the comments preceding symbol_set_names() says,
> there's only one name saved.

Sorry, this is another c-vs-c++ mixup.
Minsyms don't come from the debug info so there's no way to obtain the
source name for C.  However, for c++ the linkage name can be demangled
and that name is indeed stored for minsyms (in addition to the linkage
name). [Setting aside the case where the program uses asm(name).]

> The code in symbol_set_names() seems to be doing way more than it needs
> to.  If both the source and linkage names are available, then there
> is no need to mangle, demangle, add a prefix or do anything but save
> the name.

I think part of the intent here is to have the demangled name
available even if there is no debug info (e.g. for c++).  [As for how
one goes about achieving that, I wouldn't disagree with the claim that
there is room for improvement.]

>
> (symbol_set_demangled_name() does save a demangled name for C++.)
>
>
>>> Before the change, dwarf2_physname() calls dwarf2_compute_name()
>>> which returns the symbol name if the language is not C++, Java,
>>> or Fortran, even if the DIE has a DW_AT_linkage_name attribute.
>>> After the change, dwarf2_physname() returns the DW_AT_linkage_name.
>>>
>>> Since gdb doesn't keep both the source name and the linkage
>>> name in the symbol table (a design flaw, IMO) what is the
>>> right way to get the previous behavior, where gdb recognizes
>>> the symbol names from the source file?
>>
>>
>> We need to have dwarf2read.c store both names (linkage name and dwarf
>> name). [More changes may be required beyond that, but I think that's a
>> start.]  Your test program shows that we can't assume we can simply
>> demangle the linkage name to get the source name.
>
>
> Yes, this was my conclusion as well.
>
> When debug info is not available, mangling a source name or demangling
> a linkage name may be necessary.  I don't know that this should be done
> when reading the DWARF info.  When the debug info contains both names,
> or the linkage name can be inferred from the DIE, doing anything
> other than saving their values should not be necessary.

In general I agree.  I think part of the concern here is if there is a
problem in what gcc (or whatever) provides, and how to best cope.

> Understanding what the symbol routines are trying to do with all of
> the mangling and demangling is a challenge.

You're preaching to the choir here ... :-)

> One thing which is not
> clear is whether there is agreement whether the name stored for a symbol
> is supposed to be the source name or the linkagename.

This is part of what I was referring to in my tangential data point.
dwarf2read.c stores the demangled name in symbol.ginfo.name which is
defined to contain the mangled name (if it wasn't, why does
symbol.ginfo.language_specific.cplus_specific.demangled_name exist
:-)).

> symbol_set_names()
> and symbol_set_demangled_name() suggest that symtab.c thinks it is the
> linkage name.  Commands which look up a symbol assume that it is the
> source name.

re: commands: minsym lookup will make two passes first trying the
passed in name as the linkage name, and then trying it as the
source(/search) name, whereas fullsym lookup will just look at one
(SYMBOL_SEARCH_NAME).

> Should there be two symbol entries, one for "foo" and one for "xfoo"?
> A single entry which is searched for a match on either name or linkage
> name?  Something else?

I think we do need to store both.  Otherwise lookup is going to be too painful.
Beyond that, the devil is in the details.

Note for reference sake: What kind of demangled name to store is a bit
more complex than simply storing the fully demangled name.  gdb stores
the "search" name which is not necessarily equivalent to the full
source name (e.g. for c++ templates that encode the result type in the
linkage name).

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-02-23  3:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-22  0:41 Regression handling linkage vs source name Michael Eager
2012-02-22 16:40 ` Doug Evans
2012-02-22 19:27   ` Michael Eager
2012-02-23  3:08     ` Doug Evans

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).