* Regression handling linkage vs source name @ 2012-02-22 0:41 Michael Eager 2012-02-22 16:40 ` Doug Evans 0 siblings, 1 reply; 4+ messages in thread From: Michael Eager @ 2012-02-22 0:41 UTC (permalink / raw) To: gdb; +Cc: Jan Kratochvil There were changes to dwarf2_physname() around July 1, 2011, which cause it to return a different value. Arguably, the changes make the function work better, returning the linkage name for a symbol where it previously returned the source name. But since the source name is overwritten by the linkage name in the symbol entry, gdb's behavior is different, and, IMO, incorrect. Here is the test case: $ cat t.c extern int foo(void) asm ("xfoo"); int foo(void) {} int main (void) { foo (); } $ gdb-before a.out ... (gdb) b foo Breakpoint 1 at ... $ gdb-after a.out ... (gdb) b foo Make breakpoint pending on future shared library load? (y or [n])? n (gdb) b xfoo Breakpoint 1 at ... The symbol "foo" is no longer present in gdb symbol table so trying to use the name specified in the source fails. Before the change, breakpoint, backtrace, and other commands display the symbol name as in the source (and in the DWARF DIE). After the change, the linkage name is displayed instead of the source name. Before the change, dwarf2_physname() calls dwarf2_compute_name() which returns the symbol name if the language is not C++, Java, or Fortran, even if the DIE has a DW_AT_linkage_name attribute. After the change, dwarf2_physname() returns the DW_AT_linkage_name. Since gdb doesn't keep both the source name and the linkage name in the symbol table (a design flaw, IMO) what is the right way to get the previous behavior, where gdb recognizes the symbol names from the source file? -- Michael Eager eager@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Regression handling linkage vs source name 2012-02-22 0:41 Regression handling linkage vs source name Michael Eager @ 2012-02-22 16:40 ` Doug Evans 2012-02-22 19:27 ` Michael Eager 0 siblings, 1 reply; 4+ messages in thread From: Doug Evans @ 2012-02-22 16:40 UTC (permalink / raw) To: Michael Eager; +Cc: gdb, Jan Kratochvil On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager <eager@eagercon.com> wrote: > There were changes to dwarf2_physname() around July 1, 2011, which > cause it to return a different value. Arguably, the changes make > the function work better, returning the linkage name for a symbol where > it previously returned the source name. Tangential data point: Outside of dwarf2read.c, gdb generally uses "physname" to mean linkage name. I'm not sure dwarf2_physname ever returned the linkage name (for c/c++). I know there's been some changes in how it computes the source name. > But since the source name > is overwritten by the linkage name in the symbol entry, gdb's > behavior is different, and, IMO, incorrect. Can you elaborate on what you mean by overwritten? [It's not what I see.] > Here is the test case: > > $ cat t.c > extern int foo(void) asm ("xfoo"); > > int foo(void) {} > > int main (void) > { > foo (); > } > > $ gdb-before a.out > ... > (gdb) b foo > Breakpoint 1 at ... > > $ gdb-after a.out > ... > (gdb) b foo > Make breakpoint pending on future shared library load? (y or [n])? n > (gdb) b xfoo > Breakpoint 1 at ... > > > The symbol "foo" is no longer present in gdb symbol table > so trying to use the name specified in the source fails. > Before the change, breakpoint, backtrace, and other commands > display the symbol name as in the source (and in the DWARF DIE). > After the change, the linkage name is displayed instead of the > source name. Even pre-dwarf2_physname gdb's have this problem. Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0, 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and didn't want to go to trouble of building them). The problem (or at least one of the problems), as I see it, is that the API for creating symbols is broken. gdb does (or at least can) record both the linkage and source names (it does this for "minsyms", e.g. ELF symbols). But the way to set a symbol's name is via symbol_set_names and it only takes a linkage name (though the dwarf2read.c further compounds the problem by passing it the source name - or more accurately the "search" name). symbol_set_names then tries to demangle the name and will record that name as well if the demangling succeeds. > Before the change, dwarf2_physname() calls dwarf2_compute_name() > which returns the symbol name if the language is not C++, Java, > or Fortran, even if the DIE has a DW_AT_linkage_name attribute. > After the change, dwarf2_physname() returns the DW_AT_linkage_name. > > Since gdb doesn't keep both the source name and the linkage > name in the symbol table (a design flaw, IMO) what is the > right way to get the previous behavior, where gdb recognizes > the symbol names from the source file? We need to have dwarf2read.c store both names (linkage name and dwarf name). [More changes may be required beyond that, but I think that's a start.] Your test program shows that we can't assume we can simply demangle the linkage name to get the source name. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Regression handling linkage vs source name 2012-02-22 16:40 ` Doug Evans @ 2012-02-22 19:27 ` Michael Eager 2012-02-23 3:08 ` Doug Evans 0 siblings, 1 reply; 4+ messages in thread From: Michael Eager @ 2012-02-22 19:27 UTC (permalink / raw) To: Doug Evans; +Cc: gdb, Jan Kratochvil On 02/22/2012 08:40 AM, Doug Evans wrote: > On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager<eager@eagercon.com> wrote: >> There were changes to dwarf2_physname() around July 1, 2011, which >> cause it to return a different value. Arguably, the changes make >> the function work better, returning the linkage name for a symbol where >> it previously returned the source name. > > Tangential data point: Outside of dwarf2read.c, gdb generally uses > "physname" to mean linkage name. It doesn't appear to be the case for the value returned by dwarf2_compute_name(). I haven't looked at other places. > I'm not sure dwarf2_physname ever returned the linkage name (for c/c++). > I know there's been some changes in how it computes the source name. Before the patch, it didn't for C. I didn't look at C++. >> But since the source name >> is overwritten by the linkage name in the symbol entry, gdb's >> behavior is different, and, IMO, incorrect. > > Can you elaborate on what you mean by overwritten? In new_symbol_full(): /* Cache this symbol's name and the name's demangled form (if any). */ SYMBOL_SET_LANGUAGE (sym, cu->language); linkagename = dwarf2_physname (name, die, cu); SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, objfile); Clearly the comment is incorrect or misleading: only one name is stored in the symbol. Before the patch, dwarf2_physname() would call dwarf2_compute_name() which would return 'name' unmodified. After the patch, dwarf2_physname() returns the linkage name or a mangled name. (I initially misread the code to save the source name in the symbol, then overwrite it with the linkage name.) > [It's not what I see.] > >> Here is the test case: >> >> $ cat t.c >> extern int foo(void) asm ("xfoo"); >> >> int foo(void) {} >> >> int main (void) >> { >> foo (); >> } >> >> $ gdb-before a.out >> ... >> (gdb) b foo >> Breakpoint 1 at ... >> >> $ gdb-after a.out >> ... >> (gdb) b foo >> Make breakpoint pending on future shared library load? (y or [n])? n >> (gdb) b xfoo >> Breakpoint 1 at ... >> >> >> The symbol "foo" is no longer present in gdb symbol table >> so trying to use the name specified in the source fails. >> Before the change, breakpoint, backtrace, and other commands >> display the symbol name as in the source (and in the DWARF DIE). >> After the change, the linkage name is displayed instead of the >> source name. > > Even pre-dwarf2_physname gdb's have this problem. > Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0, > 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and > didn't want to go to trouble of building them). This works correctly in 7.2. Maybe this was an "accident". :-) > The problem (or at least one of the problems), as I see it, is that > the API for creating symbols is broken. I agree. > gdb does (or at least can) record both the linkage and source names > (it does this for "minsyms", e.g. ELF symbols). But the way to set a > symbol's name is via symbol_set_names and it only takes a linkage name > (though the dwarf2read.c further compounds the problem by passing it > the source name - or more accurately the "search" name). > symbol_set_names then tries to demangle the name and will record that > name as well if the demangling succeeds. Well, the minisym only stores a single name, not both linkage and source names. Despite what the comments preceding symbol_set_names() says, there's only one name saved. The code in symbol_set_names() seems to be doing way more than it needs to. If both the source and linkage names are available, then there is no need to mangle, demangle, add a prefix or do anything but save the name. (symbol_set_demangled_name() does save a demangled name for C++.) >> Before the change, dwarf2_physname() calls dwarf2_compute_name() >> which returns the symbol name if the language is not C++, Java, >> or Fortran, even if the DIE has a DW_AT_linkage_name attribute. >> After the change, dwarf2_physname() returns the DW_AT_linkage_name. >> >> Since gdb doesn't keep both the source name and the linkage >> name in the symbol table (a design flaw, IMO) what is the >> right way to get the previous behavior, where gdb recognizes >> the symbol names from the source file? > > We need to have dwarf2read.c store both names (linkage name and dwarf > name). [More changes may be required beyond that, but I think that's a > start.] Your test program shows that we can't assume we can simply > demangle the linkage name to get the source name. Yes, this was my conclusion as well. When debug info is not available, mangling a source name or demangling a linkage name may be necessary. I don't know that this should be done when reading the DWARF info. When the debug info contains both names, or the linkage name can be inferred from the DIE, doing anything other than saving their values should not be necessary. Understanding what the symbol routines are trying to do with all of the mangling and demangling is a challenge. One thing which is not clear is whether there is agreement whether the name stored for a symbol is supposed to be the source name or the linkage name. symbol_set_names() and symbol_set_demangled_name() suggest that symtab.c thinks it is the linkage name. Commands which look up a symbol assume that it is the source name. Should there be two symbol entries, one for "foo" and one for "xfoo"? A single entry which is searched for a match on either name or linkage name? Something else? -- Michael Eager eager@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Regression handling linkage vs source name 2012-02-22 19:27 ` Michael Eager @ 2012-02-23 3:08 ` Doug Evans 0 siblings, 0 replies; 4+ messages in thread From: Doug Evans @ 2012-02-23 3:08 UTC (permalink / raw) To: Michael Eager; +Cc: gdb, Jan Kratochvil On Wed, Feb 22, 2012 at 11:27 AM, Michael Eager <eager@eagercon.com> wrote: > On 02/22/2012 08:40 AM, Doug Evans wrote: >> >> On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager<eager@eagercon.com> wrote: >>> >>> There were changes to dwarf2_physname() around July 1, 2011, which >>> cause it to return a different value. Arguably, the changes make >>> the function work better, returning the linkage name for a symbol where >>> it previously returned the source name. >> >> >> Tangential data point: Outside of dwarf2read.c, gdb generally uses >> "physname" to mean linkage name. > > > It doesn't appear to be the case for the value returned by > dwarf2_compute_name(). I haven't looked at other places. I'm not sure what your point is here, so I'm going to set it aside for now. [My comment (tangential data point) still stands though.] > >> I'm not sure dwarf2_physname ever returned the linkage name (for c/c++). >> I know there's been some changes in how it computes the source name. > > > Before the patch, it didn't for C. I didn't look at C++. Ah. Indeed. Fun fun fun. >>> But since the source name >>> is overwritten by the linkage name in the symbol entry, gdb's >>> behavior is different, and, IMO, incorrect. >> >> >> Can you elaborate on what you mean by overwritten? > > > In new_symbol_full(): > /* Cache this symbol's name and the name's demangled form (if any). */ > SYMBOL_SET_LANGUAGE (sym, cu->language); > linkagename = dwarf2_physname (name, die, cu); > SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, objfile); > > Clearly the comment is incorrect or misleading: only one name is stored in > the symbol. Well, yes and no. Only one name is passed to SYMBOL_SET_NAMES, but it does at least try to set both the mangled(/linkage) and demangled(/source-with-caveats) names. > > Before the patch, dwarf2_physname() would call dwarf2_compute_name() which > would return 'name' unmodified. > > After the patch, dwarf2_physname() returns the linkage name or a mangled > name. Right. This was done to address potential shortcomings in c++-land. IIUC, We totally missed gcc's asm(name) feature. Blech. > (I initially misread the code to save the source name in the symbol, > then overwrite it with the linkage name.) > > >> [It's not what I see.] >> >>> Here is the test case: >>> >>> $ cat t.c >>> extern int foo(void) asm ("xfoo"); >>> >>> int foo(void) {} >>> >>> int main (void) >>> { >>> foo (); >>> } >>> >>> $ gdb-before a.out >>> ... >>> (gdb) b foo >>> Breakpoint 1 at ... >>> >>> $ gdb-after a.out >>> ... >>> (gdb) b foo >>> Make breakpoint pending on future shared library load? (y or [n])? n >>> (gdb) b xfoo >>> Breakpoint 1 at ... >>> >>> >>> The symbol "foo" is no longer present in gdb symbol table >>> so trying to use the name specified in the source fails. >>> Before the change, breakpoint, backtrace, and other commands >>> display the symbol name as in the source (and in the DWARF DIE). >>> After the change, the linkage name is displayed instead of the >>> source name. >> >> >> Even pre-dwarf2_physname gdb's have this problem. >> Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0, >> 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and >> didn't want to go to trouble of building them). > > > This works correctly in 7.2. Maybe this was an "accident". :-) I'm not sure, though I think it was at least partly accidental. > >> The problem (or at least one of the problems), as I see it, is that >> the API for creating symbols is broken. > > > I agree. > > >> gdb does (or at least can) record both the linkage and source names >> (it does this for "minsyms", e.g. ELF symbols). But the way to set a >> symbol's name is via symbol_set_names and it only takes a linkage name >> (though the dwarf2read.c further compounds the problem by passing it >> the source name - or more accurately the "search" name). >> symbol_set_names then tries to demangle the name and will record that >> name as well if the demangling succeeds. > > > Well, the minisym only stores a single name, not both linkage and source > names. Despite what the comments preceding symbol_set_names() says, > there's only one name saved. Sorry, this is another c-vs-c++ mixup. Minsyms don't come from the debug info so there's no way to obtain the source name for C. However, for c++ the linkage name can be demangled and that name is indeed stored for minsyms (in addition to the linkage name). [Setting aside the case where the program uses asm(name).] > The code in symbol_set_names() seems to be doing way more than it needs > to. If both the source and linkage names are available, then there > is no need to mangle, demangle, add a prefix or do anything but save > the name. I think part of the intent here is to have the demangled name available even if there is no debug info (e.g. for c++). [As for how one goes about achieving that, I wouldn't disagree with the claim that there is room for improvement.] > > (symbol_set_demangled_name() does save a demangled name for C++.) > > >>> Before the change, dwarf2_physname() calls dwarf2_compute_name() >>> which returns the symbol name if the language is not C++, Java, >>> or Fortran, even if the DIE has a DW_AT_linkage_name attribute. >>> After the change, dwarf2_physname() returns the DW_AT_linkage_name. >>> >>> Since gdb doesn't keep both the source name and the linkage >>> name in the symbol table (a design flaw, IMO) what is the >>> right way to get the previous behavior, where gdb recognizes >>> the symbol names from the source file? >> >> >> We need to have dwarf2read.c store both names (linkage name and dwarf >> name). [More changes may be required beyond that, but I think that's a >> start.] Your test program shows that we can't assume we can simply >> demangle the linkage name to get the source name. > > > Yes, this was my conclusion as well. > > When debug info is not available, mangling a source name or demangling > a linkage name may be necessary. I don't know that this should be done > when reading the DWARF info. When the debug info contains both names, > or the linkage name can be inferred from the DIE, doing anything > other than saving their values should not be necessary. In general I agree. I think part of the concern here is if there is a problem in what gcc (or whatever) provides, and how to best cope. > Understanding what the symbol routines are trying to do with all of > the mangling and demangling is a challenge. You're preaching to the choir here ... :-) > One thing which is not > clear is whether there is agreement whether the name stored for a symbol > is supposed to be the source name or the linkagename. This is part of what I was referring to in my tangential data point. dwarf2read.c stores the demangled name in symbol.ginfo.name which is defined to contain the mangled name (if it wasn't, why does symbol.ginfo.language_specific.cplus_specific.demangled_name exist :-)). > symbol_set_names() > and symbol_set_demangled_name() suggest that symtab.c thinks it is the > linkage name. Commands which look up a symbol assume that it is the > source name. re: commands: minsym lookup will make two passes first trying the passed in name as the linkage name, and then trying it as the source(/search) name, whereas fullsym lookup will just look at one (SYMBOL_SEARCH_NAME). > Should there be two symbol entries, one for "foo" and one for "xfoo"? > A single entry which is searched for a match on either name or linkage > name? Something else? I think we do need to store both. Otherwise lookup is going to be too painful. Beyond that, the devil is in the details. Note for reference sake: What kind of demangled name to store is a bit more complex than simply storing the fully demangled name. gdb stores the "search" name which is not necessarily equivalent to the full source name (e.g. for c++ templates that encode the result type in the linkage name). ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-02-23 3:08 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-02-22 0:41 Regression handling linkage vs source name Michael Eager 2012-02-22 16:40 ` Doug Evans 2012-02-22 19:27 ` Michael Eager 2012-02-23 3:08 ` Doug Evans
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).