public inbox for gdb-prs@sourceware.org
* [Bug symtab/17199] New: Recording two copies of plt minimal symbols is a pain to deal with
@ 2014-07-25 16:58 dje at google dot com
  2014-07-25 19:42 ` [Bug symtab/17199] " dje at google dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: dje at google dot com @ 2014-07-25 16:58 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=17199

            Bug ID: 17199
           Summary: Recording two copies of plt minimal symbols is a pain
                    to deal with
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: minor
          Priority: P2
         Component: symtab
          Assignee: unassigned at sourceware dot org
          Reporter: dje at google dot com

If there's a category for code cleanup issues, this goes there.

Creating two copies of plt minimal symbols is a pain to deal with.
It complicates trying to understand what's going on, not least because
foo@plt gets mst_text and foo gets mst_solib_trampoline ("foo" is the
"special" symbol here, not foo@plt).

I understand the problem being solved here; I just think there has
to be a cleaner way.

elfread.c:

          /* For @plt symbols, also record a trampoline to the
             destination symbol.  The @plt symbol will be used in
             disassembly, and the trampoline will be used when we are
             trying to find the target.  */
          if (msym && ms_type == mst_text && type == ST_SYNTHETIC)
            {
              int len = strlen (sym->name);

              if (len > 4 && strcmp (sym->name + len - 4, "@plt") == 0)
                {
                  struct minimal_symbol *mtramp;

                  mtramp = record_minimal_symbol (sym->name, len - 4, 1,
                                                  symaddr,
                                                  mst_solib_trampoline,
                                                  sym->section, objfile);
                  if (mtramp)
                    {
                      SET_MSYMBOL_SIZE (mtramp, MSYMBOL_SIZE (msym));
                      mtramp->created_by_gdb = 1;
                      mtramp->filename = filesymname;
                      gdbarch_elf_make_msymbol_special (gdbarch, sym, mtramp);
                    }
                }
            }
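
The effect of the fragment above can be modelled in a few lines. This is only a sketch: `MsymType`, `Msym` and `record_plt_symbols` are hypothetical stand-ins, not GDB's data structures or API, and the model assumes the symbol has already passed the `mst_text` / `ST_SYNTHETIC` checks. Only the suffix test mirrors the quoted code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for GDB's minimal symbol types.
enum class MsymType { text, solib_trampoline };

struct Msym {
  std::string name;
  std::uint64_t address;
  MsymType type;
};

// Model of the duplication: a synthetic "foo@plt" symbol is recorded as-is
// (text), and a second "foo" record (trampoline) is added at the same address.
std::vector<Msym> record_plt_symbols(const std::string &name,
                                     std::uint64_t address) {
  std::vector<Msym> out;
  out.push_back({name, address, MsymType::text});

  const std::string suffix = "@plt";
  // Same test as the quoted code: longer than the suffix and ends with it.
  if (name.size() > suffix.size()
      && name.compare(name.size() - suffix.size(), suffix.size(),
                      suffix) == 0) {
    out.push_back({name.substr(0, name.size() - suffix.size()), address,
                   MsymType::solib_trampoline});
  }
  return out;
}

// Usage: one ELF-level symbol yields two minimal symbols.
void example() {
  std::vector<Msym> syms = record_plt_symbols("foo@plt", 0x401030);
  assert(syms.size() == 2);
  assert(syms[0].name == "foo@plt" && syms[0].type == MsymType::text);
  assert(syms[1].name == "foo"
         && syms[1].type == MsymType::solib_trampoline);
}
```

Note that the record carrying the plain name is the trampoline, while the `@plt` record is typed as ordinary text, which is the asymmetry the report complains about.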

-- 
You are receiving this mail because:
You are on the CC list for the bug.



* [Bug symtab/17199] Recording two copies of plt minimal symbols is a pain to deal with
From: dje at google dot com @ 2014-07-25 19:42 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=17199

--- Comment #1 from dje at google dot com ---
More data.
This comment in symtab.h is incomplete.

  /* GDB uses mst_solib_trampoline for the start address of a shared
     library trampoline entry.  Breakpoints for shared library functions
     are put there if the shared library is not yet loaded.
     After the shared library is loaded, lookup_minimal_symbol will
     prefer the minimal symbol from the shared library (usually
     a mst_text symbol) over the mst_solib_trampoline symbol, and the
     breakpoints will be moved to their true address in the shared
     library via breakpoint_re_set.  */

This is incomplete because in actuality there are three symbols:
1) in main exec, foo@plt, mst_text
2) in main exec, foo, mst_solib_trampoline
3) in shared lib, foo, mst_text

The comment leaves the reader guessing how #1 is discarded once the shared lib
is loaded, since #1 is also mst_text.  Also, if you do "i b" (info breakpoints)
before the shared library is loaded, it is foo@plt that is displayed, not foo.
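
The preference described in the symtab.h comment can be sketched with the three symbols listed above. This is a toy model, not GDB's `lookup_minimal_symbol`: `Msym`, `MsymType` and `lookup` are hypothetical names, and the lookup is assumed to compare linkage names exactly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for GDB's minimal symbol types.
enum class MsymType { text, solib_trampoline };

struct Msym {
  std::string name;
  std::string objfile;
  MsymType type;
};

// Toy lookup by exact name: prefer a text symbol, and fall back to a
// trampoline only when no text symbol with that name exists.
const Msym *lookup(const std::vector<Msym> &table, const std::string &name) {
  const Msym *trampoline = nullptr;
  for (const Msym &m : table) {
    if (m.name != name) continue;
    if (m.type == MsymType::text) return &m;
    if (trampoline == nullptr) trampoline = &m;
  }
  return trampoline;
}

void example() {
  // Before the shared library is loaded: symbols #1 and #2 only.
  std::vector<Msym> table = {
      {"foo@plt", "main exec", MsymType::text},
      {"foo", "main exec", MsymType::solib_trampoline},
  };
  assert(lookup(table, "foo")->type == MsymType::solib_trampoline);

  // After the shared library is loaded: symbol #3 wins.
  table.push_back({"foo", "shared lib", MsymType::text});
  const Msym *found = lookup(table, "foo");
  assert(found->objfile == "shared lib");
  assert(found->type == MsymType::text);
}
```

In this model symbol #1 is never a candidate simply because its name is `foo@plt` rather than `foo`. Whether that is the whole story inside GDB is not something the sketch establishes.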


* [Bug symtab/17199] Recording two copies of plt minimal symbols is a pain to deal with
From: xdje42 at gmail dot com @ 2014-07-26 21:48 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=17199

Doug Evans <xdje42 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xdje42 at gmail dot com

--- Comment #2 from Doug Evans <xdje42 at gmail dot com> ---
[for completeness' sake]
It may turn out that fixing PR 17201 involves cleaning this up.


* [Bug symtab/17199] Recording two copies of plt minimal symbols is a pain to deal with
From: cvs-commit at gcc dot gnu.org @ 2023-02-12  6:21 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=17199

--- Comment #3 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f0bdf68d3fb6db1dd2b83e07062e2104cdb785c2

commit f0bdf68d3fb6db1dd2b83e07062e2104cdb785c2
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Dec 16 15:15:42 2022 +0000

    gdb/c++: fix handling of breakpoints on @plt symbols

    This commit should fix PR gdb/20091, PR gdb/17201, and PR gdb/17071.
    Additionally, PR gdb/17199 relates to this area of code, but is more
    of a request to refactor some parts of GDB.  This commit does not
    address that request, but it is probably worth reading that PR when
    looking at this commit.

    When the current language is C++, and the user places a breakpoint on
    a function in a shared library, GDB will currently find two locations
    for the breakpoint.  One location will be within the function itself,
    as we would expect, but the other will be within the PLT, at the
    entry used to call the named function.  Consider this session:

      $ gdb -q /tmp/breakpoint-shlib-func
      Reading symbols from /tmp/breakpoint-shlib-func...
      (gdb) start
      Temporary breakpoint 1 at 0x40112e: file /tmp/breakpoint-shlib-func.cc, line 20.
      Starting program: /tmp/breakpoint-shlib-func

      Temporary breakpoint 1, main () at /tmp/breakpoint-shlib-func.cc:20
      20      int answer = foo ();
      (gdb) break foo
      Breakpoint 2 at 0x401030 (2 locations)
      (gdb) info breakpoints
      Num     Type           Disp Enb Address            What
      2       breakpoint     keep y   <MULTIPLE>
      2.1                         y   0x0000000000401030 <foo()@plt>
      2.2                         y   0x00007ffff7fc50fd in foo() at /tmp/breakpoint-shlib-func-lib.cc:20

    This is not the expected behaviour.  If we compile the same test using
    a C compiler then we see this:

      (gdb) break foo
      Breakpoint 2 at 0x7ffff7fc50fd: file /tmp/breakpoint-shlib-func-c-lib.c, line 20.
      (gdb) info breakpoints
      Num     Type           Disp Enb Address            What
      2       breakpoint     keep y   0x00007ffff7fc50fd in foo at /tmp/breakpoint-shlib-func-c-lib.c:20

    Here's what's happening.  When GDB parses the symbols in the main
    executable and the shared library we see a number of different symbols
    for foo, and use these to create entries in GDB's msymbol table:

      - In the main executable we see a symbol 'foo@plt' that points at
        the plt entry for foo.  From this we add two entries to GDB's
        msymbol table: one called 'foo@plt', which points at the plt
        entry and has type mst_text, and a second called 'foo', with
        type mst_solib_trampoline, which also points at the plt entry.

      - Then, when the shared library is loaded we see another symbol
        called 'foo', this one points at the actual implementation in the
        shared library.  This time GDB creates a msymbol called 'foo' with
        type mst_text that points at the implementation.

    This means that GDB creates 3 msymbols to represent the 2 symbols
    found in the executable and shared library.

    When the user creates a breakpoint on 'foo' GDB eventually ends up in
    search_minsyms_for_name (linespec.c).  This function then calls
    iterate_over_minimal_symbols, passing in the name we are looking for
    wrapped in a lookup_name_info object.

    In iterate_over_minimal_symbols we iterate over two hash tables (using
    the name we're looking for as the hash key): first we walk the hash
    table of symbol linkage names, then we walk the hash table of
    demangled symbol names.

    When the language is C++ the symbols for 'foo' will all have been
    mangled.  As a result, in this case the iteration of the linkage name
    hash table will find no matching results.

    However, when we walk the demangled hash table we do find some
    results.  In order to match symbol names, GDB obtains a symbol name
    matching function by calling the get_symbol_name_matcher method on the
    language_defn class.  For C++, in this case, the matching function we
    use is cp_fq_symbol_name_matches, which delegates the work to
    strncmp_iw_with_mode with mode strncmp_iw_mode::MATCH_PARAMS and
    language set to language_cplus.

    The strncmp_iw_mode::MATCH_PARAMS mode means that strncmp_iw_with_mode
    will skip any parameters in the demangled symbol name when checking
    for a match, e.g. 'foo' will match the demangled name 'foo()'.  The
    strings are matched character by character but, once the string we
    are looking for ('foo' here) is exhausted, if we are looking at '('
    then we consider the match a success.
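
A minimal sketch of that matching rule follows. It is not the real strncmp_iw_with_mode, which also handles whitespace and other cases. The function name `matches_stopping_at_paren` is hypothetical, and treating an exact full-string match as success is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Simplified model of the MATCH_PARAMS rule described above: compare
// character by character, and once the lookup name is used up, accept if
// the candidate has also ended or its next character is '('.
bool matches_stopping_at_paren(const std::string &lookup,
                               const std::string &demangled) {
  // The whole lookup name must be a prefix of the demangled name.
  if (demangled.compare(0, lookup.size(), lookup) != 0) return false;
  // Lookup name exhausted: everything from '(' onwards is ignored.
  return demangled.size() == lookup.size()
         || demangled[lookup.size()] == '(';
}

void example() {
  assert(matches_stopping_at_paren("foo", "foo()"));
  // Anything after the '(' is never inspected, including an @plt suffix.
  assert(matches_stopping_at_paren("foo", "foo()@plt"));
  assert(!matches_stopping_at_paren("foo", "foobar()"));
  assert(!matches_stopping_at_paren("foo", "foo@plt"));
}
```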

    Let's consider the 3 symbols GDB created.  If the function declaration
    is 'void foo ()' then from the main executable we added symbols
    '_Z3foov@plt' and '_Z3foov', while from the shared library we added
    another symbol called '_Z3foov'.  When these are demangled they become
    'foo()@plt', 'foo()', and 'foo()' respectively.

    Now, the '_Z3foov' symbol from the main executable has the type
    mst_solib_trampoline, and in search_minsyms_for_name, we search for
    any symbols of type mst_solib_trampoline and filter these out of the
    results.

    However, the '_Z3foov@plt' symbol (from the main executable), and the
    '_Z3foov' symbol (from the shared library) both have type mst_text.

    During the demangled name matching, due to the use of MATCH_PARAMS
    mode, we stop the comparison as soon as we hit a '(' in the demangled
    name.  And so '_Z3foov@plt', which demangles to 'foo()@plt', matches
    'foo', and '_Z3foov', which demangles to 'foo()', also matches 'foo'.

    By contrast, for C, there are no demangled hash table entries to be
    iterated over (in iterate_over_minimal_symbols); we only consider the
    linkage name symbols, which are 'foo@plt' and 'foo'.  The plain 'foo'
    symbol obviously matches when we are looking for 'foo', but in this
    case 'foo@plt' will not match due to the '@plt' suffix.

    And so, when the user asks for a breakpoint in 'foo' and the language
    is C, search_minsyms_for_name returns a single msymbol, the mst_text
    symbol for foo in the shared library.  When the language is C++, we
    get two results: '_Z3foov' for the shared library function, and
    '_Z3foov@plt' for the plt entry in the main executable.
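
The difference between the two languages can be reproduced with a toy search. This is a sketch under simplifying assumptions: each symbol carries a single searchable name (linkage name for C, demangled name for C++), every trampoline is dropped unconditionally, and `Msym`, `Matcher`, `exact_match`, `match_until_paren` and `search` are hypothetical names rather than GDB functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class MsymType { text, solib_trampoline };

struct Msym {
  std::string search_name;  // linkage name for C, demangled name for C++
  MsymType type;
};

// Signature shared by the two toy matchers below.
using Matcher = bool (*)(const std::string &lookup,
                         const std::string &candidate);

// C-like case: linkage names are compared exactly.
bool exact_match(const std::string &lookup, const std::string &candidate) {
  return lookup == candidate;
}

// C++-like case before the fix: stop at '(' once the lookup name is used up.
bool match_until_paren(const std::string &lookup,
                       const std::string &candidate) {
  if (candidate.compare(0, lookup.size(), lookup) != 0) return false;
  return candidate.size() == lookup.size()
         || candidate[lookup.size()] == '(';
}

// Toy search: collect every match, dropping trampoline symbols.
std::vector<Msym> search(const std::vector<Msym> &table,
                         const std::string &name, Matcher matches) {
  std::vector<Msym> results;
  for (const Msym &m : table) {
    if (m.type == MsymType::solib_trampoline) continue;
    if (matches(name, m.search_name)) results.push_back(m);
  }
  return results;
}

void example() {
  std::vector<Msym> c_table = {{"foo@plt", MsymType::text},
                               {"foo", MsymType::solib_trampoline},
                               {"foo", MsymType::text}};
  std::vector<Msym> cxx_table = {{"foo()@plt", MsymType::text},
                                 {"foo()", MsymType::solib_trampoline},
                                 {"foo()", MsymType::text}};
  // C: only the shared library's text symbol survives.
  assert(search(c_table, "foo", exact_match).size() == 1);
  // C++: the @plt text symbol slips through as a second result.
  assert(search(cxx_table, "foo", match_until_paren).size() == 2);
}
```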

    I propose to fix this in strncmp_iw_with_mode.  When the mode is
    MATCH_PARAMS, instead of stopping at a '(' and assuming the match is a
    success, GDB will search forward for the matching closing ')',
    effectively skipping the parameter list, and then resume matching.
    Thus, when comparing 'foo' to 'foo()@plt' GDB will effectively
    compare against 'foo@plt' (skipping the parameter list), and the
    match will fail, just as it does when the language is C.

    There is one slight complication, which is revealed by the test
    gdb.linespec/cpcompletion.exp.  When searching for the symbol of a
    const member function, the demangled symbol will have 'const' at the
    end of its name, e.g.:

      struct_with_const_overload::const_overload_fn() const

    Previously, the matching would stop at the '(' character, but after my
    change the whole '()' is skipped, and the match resumes.  As a result,
    the 'const' modifier results in a failure to match, when previously
    GDB would have found a match.

    To work around this issue, in strncmp_iw_with_mode, when mode is
    MATCH_PARAMS, after skipping the parameter list, if the next character
    is '@' then we assume we are looking at something like '@plt' and
    return a value indicating the match failed.  Otherwise, we return a
    value indicating the match succeeded, which allows things like
    'const' to be skipped.
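
The rule just described can be sketched as follows. This is an illustrative model, not the code from the commit: `skip_parameter_list` and `matches_skipping_params` are hypothetical names, whitespace handling is omitted, and rejecting an unbalanced parameter list is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index just past the ')' that closes the '(' at `open`,
// or std::string::npos if the parentheses are unbalanced.
std::size_t skip_parameter_list(const std::string &s, std::size_t open) {
  int depth = 0;
  for (std::size_t i = open; i < s.size(); ++i) {
    if (s[i] == '(') {
      ++depth;
    } else if (s[i] == ')' && --depth == 0) {
      return i + 1;
    }
  }
  return std::string::npos;
}

// Model of the revised rule: skip the whole parameter list, then fail only
// if what follows starts with '@' (something like "@plt").
bool matches_skipping_params(const std::string &lookup,
                             const std::string &demangled) {
  if (demangled.compare(0, lookup.size(), lookup) != 0) return false;
  std::size_t pos = lookup.size();
  if (pos == demangled.size()) return true;
  if (demangled[pos] != '(') return false;

  pos = skip_parameter_list(demangled, pos);
  if (pos == std::string::npos) return false;  // assumption: reject

  // Trailing text such as " const" is tolerated; an '@' suffix is not.
  return !(pos < demangled.size() && demangled[pos] == '@');
}

void example() {
  assert(matches_skipping_params("foo", "foo()"));
  assert(!matches_skipping_params("foo", "foo()@plt"));
  assert(matches_skipping_params(
      "struct_with_const_overload::const_overload_fn",
      "struct_with_const_overload::const_overload_fn() const"));
  // Nested parentheses inside the parameter list are skipped as a unit.
  assert(!matches_skipping_params("foo", "foo(void (*)(int))@plt"));
}
```

Checking only for '@' keeps the change narrow: any other trailing text is accepted exactly as it was before the parameter list was skipped.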

    With these changes in place I now see GDB correctly setting a
    breakpoint only at the implementation of 'foo' in the shared library.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20091
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17201
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17071
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=17199

    Tested-By: Bruno Larsen <blarsen@redhat.com>
    Approved-By: Simon Marchi <simon.marchi@efficios.com>

