public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Lancelot SIX <lsix@lancelotsix.com>
To: Simon Marchi <simon.marchi@polymtl.ca>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH v3 5/7] gdb: add "id" fields to identify symtabs and subfiles
Date: Thu, 28 Apr 2022 23:53:24 +0000	[thread overview]
Message-ID: <20220428235314.elsyab5pk5bufivr@ubuntu.lan> (raw)
In-Reply-To: <20220428033542.1636284-6-simon.marchi@polymtl.ca>

Hi,

Just some nits below.

When applying the patch I have:

    Applying: gdb: add "id" fields to identify symtabs and subfiles
    .git/rebase-apply/patch:153: trailing whitespace.
         It is used to look up existing subfiles in calls to start_subfile.  */ 
    warning: 1 line adds whitespace errors.


On Wed, Apr 27, 2022 at 11:35:40PM -0400, Simon Marchi via Gdb-patches wrote:
> Printing macros defined in the main source file doesn't work reliably
> using various toolchains, especially when DWARF 5 is used.  For example,
> using the binaries produced by either of these commands:
> 
>     $ gcc --version
>     gcc (GCC) 11.2.0
>     $ ld --version
>     GNU ld (GNU Binutils) 2.38
>     $ gcc test.c -g3 -gdwarf-5
> 
>     $ clang --version
>     clang version 13.0.1
>     $ clang test.c -gdwarf-5 -fdebug-macro
> 
> I get:
> 
>     $ ./gdb -nx -q --data-directory=data-directory a.out
>     (gdb) start
>     Temporary breakpoint 1 at 0x111d: file test.c, line 6.
>     Starting program: /home/simark/build/binutils-gdb-one-target/gdb/a.out
> 
>     Temporary breakpoint 1, main () at test.c:6
>     6         return ZERO;
>     (gdb) p ZERO
>     No symbol "ZERO" in current context.
> 
> When starting to investigate this (taking the gcc-compiled binary as an
> example), we see that GDB fails to look up the appropriate macro scope
> when evaluating the expression.  While stopped in
> macro_lookup_inclusion:
> 
>     (top-gdb) p name
>     $1 = 0x62100011a980 "test.c"
>     (top-gdb) p source.filename
>     $2 = 0x62100011a9a0 "/home/simark/build/binutils-gdb-one-target/gdb/test.c"
> 
> `source` is the macro_source_file that we would expect GDB to find.
> `name` comes from the symtab::filename field of the symtab we are
> stopped in.  GDB doesn't find the appropriate macro_source_file because
> the name of the macro_source_file doesn't match exactly the name of the
> symtab.
> 
> The name of the main symtab comes from the compilation unit's
> DW_AT_name, passed to the buildsym_compunit's constructor:
> 
>   https://gitlab.com/gnutools/binutils-gdb/-/blob/4815d6125ec580cc02a1094d61b8c9d1cc83c0a1/gdb/dwarf2/read.c#L10627-10630
> 
> The contents of DW_AT_name, in this case, is "test.c".  It is typically
> (what I witnessed all compilers do) the same string that was passed to
> the compiler on the command-line.
> 
> The name of the macro_source_file comes from the line number program
> header's file table, from the call to the line_header::file_file_name
> method:
> 
>   https://gitlab.com/gnutools/binutils-gdb/-/blob/4815d6125ec580cc02a1094d61b8c9d1cc83c0a1/gdb/dwarf2/macro.c#L54-65
> 
> line_header::file_file_name prepends the directory path that the file
> entry refers to, in the file table (if the file name is not already
> absolute).  In this case, the file name is "test.c", appended to the
> directory "/home/simark/build/binutils-gdb-one-target/gdb".
> 
> Because the symtab's name is not created the same way as the
> macro_source_file's name is created, we get this mismatch.  GDB fails to
> find the appropriate macro scope for the symtab, and we can't print
> macros when stopped in that symtab.
> 
> To make this work, we must ensure that paths produced in these two ways
> end up identical.  This can be tricky because of the different ways a
> path can be passed to the compiler.
> 
> Another this to consider is that while the main symtab's name (or

s/this/thing/ ?

> subfile, before it becomes a symtab) is created using DW_AT_name, the
> main symtab is also referred to using its entry in the line table
> header's file table, when processing the line table.  We must therefore
> ensure that the same name is produced in both cases, so that a call to
> "start_subfile" for the main subfile will correctly find the
> already-created subfile, created by buildsym_compunit's constructor.  If
> we fail to do that, things still often work, because of a fallback: the
> watch_main_source_file_lossage method.  This method determines that if
> the main subfile has no symbols but there exists another subfile with
> the same basename (e.g. "test.c") that does have symbols, it's probably
> because there was some filename mismatch.  So it replaces the main
> subfile with that other subfile.  I think that heuristic is useful as a
> last effort to work around any bug or bad debug info, but I don't think
> we should design things such as to rely on it.  It's a heuristic, it can
> get things wrong.  So in my search for a fix, it is important that given
> some good debug info, we don't end up relying on that for things to
> work.
> 
> A first attempt at fixing this was to try to prepend the compilation
> directory here or not prepend it there.  In practice, because of all the
> possible combinations of debug info the compilers produce, it was not
> possible to get something that would produce reliable, consistent paths.
> 
> Another attempt at fixing this was to make both macro_source_file
> objects and symtab objects use the most complete form of path possible.
> That means to prepend directories at least until we get an absolute
> path.  In theory, we should end up with the same path in all cases.
> This generally worked, but because it changed the symtab names, it
> resulted in user-visible changes (for example, paths to source files in
> Breakpoint hit messages becoming always absolute).  I didn't find this
> very good, first because there is a "set filename-display" setting that
> lets the user control how they want the paths to be displayed, and that
> would suddenly make this setting completely ineffective (although even
> today, it is a bit dependent on the debug info).  Second, it would
> require a good amount of testsuite tweaks to make tests accept these
> suddenly absolute paths.
> 
> This new patch is a slight variation of that: it adds a new field called
> "filename_for_id" in struct symtab and struct subfile, next to the
> existing filename field. The goal is to separate the internal ids used
> for finding objects from the names used for presentation.  This field is
> used for identifying subfiles, symtabs and macro_source_files
> internally.  For DWARF symtabs, this new field is meant to contain the
> "most complete possible" path, as discussed above.  So for a given file,
> it must always be in the same form, everywhere.  The existing
> symtab::filename field remains the one used for printing to the user, so
> there shouldn't be any change in how paths are printed.
> 
> Changes in the core symtab files are:
> 
>  - Add "name_for_id" and "filename_for_id" fields to "struct subfile"
>    and "struct symta"b, next to existing "name" and "filename" fields.
                      ^
Two chars are swapped here.

>  - Make buildsym_compunit::buildsym_compunit and
>    buildsym_compunit::start_subfile accept a "name_for_id" parameter
>    next to the existing "name" ones.
>  - Make buildsym_compunit::start_subfile used "name_for_id" for looking
>    up existing subfiles.  This is the key thing for making calls
>    to start_subfile for the main source file look up the existing
>    subfile successfully, and avoid relying on
>    watch_main_source_file_lossage.
>  - Make sal_macro_scope pass "filename_for_id", rather than "filename",
>    to macro_lookup_inclusion.  This is the key thing to making the
>    lookup work and macro printing work.
> 
> Changes in the DWARF files are:
> 
>  - Make line_header::file_file_name return the "most complete possible"
>    name.  The only pre-existing user of this method is the macro code,
>    to give the macro_source_file objects their name.  And we now want
>    them to have this "most complete possible" name, which will match the
>    corresponding symtab's "filename_for_id".
>  - Make dwarf2_cu::start_compunit_symtab pass the "most complete
>    possible" name for the main symtab's "filename_for_id".  In this
>    context, where the info comes from the compilation unit's DW_AT_name
>    / DW_AT_comp_dir, it means prepending DW_AT_comp_dir to DW_AT_name if
>    DW_AT_name is not already absolute.
>  - Change dwarf2_start_subfile to build a name_for_id for the subfile
>    being started.  The simplest way is to re-use
>    line_header::file_file_name, since the callers always have a
>    file_entry handy.  This ensures that it will get the exact same path
>    representation as the macro code does, for the same file (since it
>    also uses line_header::file_file_name).
>  - Update calls to allocate_symtab to pass the "name_for_id" from the
>    subfile.
> 
> The rest of the changes are to update the other symtab users (jit,
> ctfread, mdebugread, xcoffread).  For those, the same value is passed
> for the "id" as the for the filename, so they should keep the same
> behavior they have today.

I guess this is just personal preference, but I would have chosen to
add overloads so those files do not have to change / bother about an
argument which is meaningless for them.

Something like:

    // gdb/buildsym.h
    struct buildsym_compunit
    {
      buildsym_compunit (struct objfile *objfile_, const char *name,
                         const char *comp_dir_, enum language language_,
                         CORE_ADDR last_addr)
        : buildsym_compunit (objfile_, name, comp_dir_, name,
                             language_, last_addr);
      buildsym_compunit (struct objfile *objfile_, const char *name,
                         const char *comp_dir_, const char *name_for_id,
                         language language_, CORE_ADDR last_addr);

      ...

      void start_subfile (const char *name, const char *name_for_id);

      void start_subfile (const char *name)
      { start_subfile (name, name); }
    };

    // gdb/symfile.h
    extern struct symtab *allocate_symtab
      (struct compunit_symtab *cust, const char *filename, const char *id))
      ATTRIBUTE_NONNULL (1);

    static inline struct symtab *
    allocate_symtab (struct compunit_symtab *cust, const char *filename)
    {
      return allocate_sumtab (cust, filename, filename);
    }

In the end it does not make much difference, your approach work also
perfectly fine so feel free to keep it the way it is.

Otherwise, FWIW I find the approach to keep one string representation of
the filename for display and one for some sort of canonicalization
perfectly appropriate.

> 
> Tests exercising all this are added by the following patch.
> 
> Of all the cases I tried, the only one I found that ends up relying on
> watch_main_source_file_lossage is the following one:
> 
>     $ clang --version
>     clang version 13.0.1
>     Target: x86_64-pc-linux-gnu
>     Thread model: posix
>     InstalledDir: /usr/bin
>     $ clang  ./test.c -g3 -O0 -gdwarf-4
>     $ ./gdb -nx --data-directory=data-directory -q -readnow -iex "set debug symtab-create 1"  a.out
>     ...
>     [symtab-create] start_subfile: name = test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/test.c
>     [symtab-create] start_subfile: name = ./test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/./test.c
>     [symtab-create] start_subfile: name = ./test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/./test.c
>     [symtab-create] start_subfile: found existing symtab with name_for_id /home/simark/build/binutils-gdb-one-target/gdb/./test.c (/home/simark/build/binutils-gdb-one-target/gdb/./test.c)
>     [symtab-create] watch_main_source_file_lossage: using subfile ./test.c as the main subfile
> 
> As we can see, there are two forms used for "test.c", one with a "." and
> one without.  This comes from the fact that the compilation unit DIE
> contains:
> 
>     DW_AT_name ("test.c")
>     DW_AT_comp_dir ("/home/simark/build/binutils-gdb-one-target/gdb")
> 
> without a ".", and the line table for that file contains:
> 
>     include_directories[  1] = "."
>     file_names[  1]:
>                name: "test.c"
>           dir_index: 1
> 
> When assembling the filename from that entry, we get a ".".
> 
> It is a bit unexpected that the main filename resulting from the line
> table header does not match exactly the name in the compilation unit.
> For instance, gcc uses "./test.c" for the DW_AT_name, which gives
> identical paths in the compilation unit and in the line table header.
> 
> Similary, with DWARF 5:

s/Similary/Similarly/

Best,
Lancelot.
> 
>     $ clang  ./test.c -g3 -O0 -gdwarf-5
> 
> clang create two entries that refer to the same file but are of in a different
> form.
> 
>     include_directories[  0] = "/home/simark/build/binutils-gdb-one-target/gdb"
>     include_directories[  1] = "."
>     file_names[  0]:
>                name: "test.c"
>           dir_index: 0
>     file_names[  1]:
>                name: "test.c"
>           dir_index: 1
> 
> The first file name produces a path without a "." while the second does.
> This is not caught by watch_main_source_file_lossage, because of
> dwarf_decode_lines that creates a symtab for each file entry in the line
> table.  It therefore appears as "non-empty" to
> watch_main_source_file_lossage.  This results in two symtabs:
> 
>     (gdb) maintenance info symtabs
>     { objfile /home/simark/build/binutils-gdb-one-target/gdb/a.out ((struct objfile *) 0x613000005d00)
>       { ((struct compunit_symtab *) 0x62100011aca0)
>         debugformat DWARF 5
>         producer clang version 13.0.1
>         name test.c
>         dirname /home/simark/build/binutils-gdb-one-target/gdb
>         blockvector ((struct blockvector *) 0x621000129ec0)
>         user ((struct compunit_symtab *) (null))
>             { symtab test.c ((struct symtab *) 0x62100011ad20)
>               fullname (null)
>               linetable ((struct linetable *) 0x0)
>             }
>             { symtab ./test.c ((struct symtab *) 0x62100011ad60)
>               fullname (null)
>               linetable ((struct linetable *) 0x621000129ef0)
>             }
>       }
>     }
> 
> I am not sure what is the consequence of this, but this is also what
> happens before my patch, so I think its acceptable to leave it as-is.
> 
> To handle these two cases nicely, I think we will need a function that
> removes the unnecessary "." from path names, something that can be done
> later.
> 
> Change-Id: I6fb4f10de040602779da07a0934b7078701a1856

  reply	other threads:[~2022-04-28 23:53 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-28  3:35 [PATCH v3 0/7] Fix printing macros Simon Marchi
2022-04-28  3:35 ` [PATCH v3 1/7] gdb: introduce symtab_create_debug_printf Simon Marchi
2022-04-28 15:49   ` Tom Tromey
2022-04-28  3:35 ` [PATCH v3 2/7] gdb: add debug prints in buildsym.c Simon Marchi
2022-04-28 15:50   ` Tom Tromey
2022-04-28  3:35 ` [PATCH v3 3/7] gdb/dwarf: pass compilation directory to line header Simon Marchi
2022-04-28 15:48   ` Tom Tromey
2022-04-28 15:59     ` Simon Marchi
2022-04-28  3:35 ` [PATCH v3 4/7] gdb/dwarf: pass a file_entry to line_header::file_file_name Simon Marchi
2022-05-03 20:12   ` Bruno Larsen
2022-07-28 16:26     ` Simon Marchi
2022-04-28  3:35 ` [PATCH v3 5/7] gdb: add "id" fields to identify symtabs and subfiles Simon Marchi
2022-04-28 23:53   ` Lancelot SIX [this message]
2022-07-28 17:46     ` Simon Marchi
2022-04-28  3:35 ` [PATCH v3 6/7] gdb: remove code to prepend comp dir in buildsym_compunit::start_subfile Simon Marchi
2022-05-12 13:07   ` Bruno Larsen
2022-07-28 17:47     ` Simon Marchi
2022-04-28  3:35 ` [PATCH v3 7/7] gdb/testsuite: add macros test for source files compiled in various ways Simon Marchi
2022-05-12 13:17   ` Bruno Larsen
2022-07-28 17:51     ` Simon Marchi
2022-07-30  0:56 ` [PATCH v3 0/7] Fix printing macros Simon Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220428235314.elsyab5pk5bufivr@ubuntu.lan \
    --to=lsix@lancelotsix.com \
    --cc=gdb-patches@sourceware.org \
    --cc=simon.marchi@polymtl.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).