public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Tom Tromey <tom@tromey.com>
To: Aaron Merey via Gdb-patches <gdb-patches@sourceware.org>
Cc: Aaron Merey <amerey@redhat.com>
Subject: Re: [PATCH 5/7] gdb/debuginfod: Support on-demand debuginfo downloading
Date: Tue, 07 Mar 2023 13:20:18 -0700	[thread overview]
Message-ID: <87mt4oqpod.fsf@tromey.com> (raw)
In-Reply-To: <20230227194212.348003-5-amerey@redhat.com> (Aaron Merey via Gdb-patches's message of "Mon, 27 Feb 2023 14:42:10 -0500")

>>>>> "Aaron" == Aaron Merey via Gdb-patches <gdb-patches@sourceware.org> writes:

Aaron> At the beginning of a session, gdb may attempt to download debuginfo
Aaron> for all shared libraries associated with the process or core file
Aaron> being debugged.  This can be a waste of time and storage space when much
Aaron> of the debuginfo ends up not being used during the session.

Thank you for the patch.

Aaron> +/* See frame.h.  */
Aaron> +
Aaron> +void
Aaron> +clear_comp_unit (struct objfile *objfile)

"comp unit" has a very different meaning in the DWARF code, it's
confusing here.

Maybe a name like "dwarf2_clear_frame_data" would be better.
Though I wonder how this gets populated if there's no frame info.

Aaron> +  bool do_expand_symtabs_matching

I tend to think this isn't needed, see below.

Aaron> +bool
Aaron> +dwarf2_gdb_index::expand_symtabs_matching
Aaron> +    (struct objfile *objfile,
Aaron> +     gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
Aaron> +     const lookup_name_info *lookup_name,
Aaron> +     gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
Aaron> +     gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
Aaron> +     block_search_flags search_flags,
Aaron> +     domain_enum domain,
Aaron> +     enum search_domain kind)
Aaron> +{
Aaron> +  if (objfile->flags & OBJF_READNEVER)

Can this even happen here?  If 'readnever' is set, debuginfod simply
should never be queried in the first plae.

Aaron> +  try
Aaron> +    {
Aaron> +      return do_expand_symtabs_matching (objfile, file_matcher, lookup_name,
Aaron> +					 symbol_matcher, expansion_notify,
Aaron> +					 search_flags, domain, kind);
Aaron> +    }
Aaron> +  catch (gdb_exception e)
Aaron> +    {
Aaron> +      if (!objfile->is_separate_index_stub ())
Aaron> +	{
Aaron> +	  exception_print (gdb_stderr, e);
Aaron> +	  return false;
Aaron> +	}

I guess the idea here is to try not to download anything -- maybe the
search will fail and we can skip the downloading entirely.

However, it seems better in this case to integrate it into the inner
loop of expand_symtabs_matching.

Also other methods of dwarf2_gdb_index can result in CU expansion, and
so require all the DWARF.  So maybe some other refactoring is needed,
like some kind of virtual function that is called before any possible
call to dw2_instantiate_symtab along these code paths.  I see some of
this in read.c but it seems like some are missing.

E.g., it seems like calls to dw2_expand_symtabs_matching_one are
unaffected, but should be.

Aaron> +/* See public.h.  */
Aaron> +
Aaron> +bool
Aaron> +dwarf2_has_separate_index (struct objfile *objfile)
Aaron> +{
Aaron> +  if (objfile->flags & OBJF_MAINLINE

What's the reason for testing this flag?

Anyway, gdb style is: (objfile->flags & OBJF_MAINLINE) != 0

Aaron> +    return 0;

bool

Aaron> +      /* We found a separate .gdb_index file so a separate debuginfo file
Aaron> +	 should exist, but don't want to read it until we really have to.
Aaron> +	 Create an objfile to own the index information and act as a
Aaron> +	 placeholder for the debuginfo that we have the option of aquiring
Aaron> +	 later.  */
Aaron> +      gdb_bfd_ref_ptr abfd (gdb_bfd_open (objfile_filename (objfile), gnutarget));
Aaron> +      if (abfd == nullptr)
Aaron> +	return false;
Aaron> +
Aaron> +      dwarf2_per_bfd_objfile_data_key.clear (objfile);
Aaron> +      dwarf2_objfile_data_key.clear (objfile);

This seems suspicious.

Aaron> +      dwarf2_per_bfd *per_bfd;
Aaron> +      dwarf2_per_objfile *per_objfile = get_dwarf2_per_objfile (objfile);
Aaron> +
Aaron> +      /* Perform a limited initialization based to dwarf2_has_info and
Aaron> +	 dwarf2_initialize_objfile.  */
Aaron> +      if (per_objfile == nullptr)
Aaron> +	{
Aaron> +	  per_bfd = dwarf2_per_bfd_objfile_data_key.get (objfile);
Aaron> +	  if (per_bfd == nullptr)
Aaron> +	    {
Aaron> +	      per_bfd = new dwarf2_per_bfd (objfile->obfd.get (), nullptr, false);
Aaron> +	      dwarf2_per_bfd_objfile_data_key.set (objfile, per_bfd);
Aaron> +	    }
Aaron> +	  per_objfile = dwarf2_objfile_data_key.emplace (objfile, objfile, per_bfd);
Aaron> +	}

Changing the objfile seems wrong.
I don't understand why this is needed.

Aaron> +      struct objfile *stub = objfile->separate_debug_objfile;
Aaron> +      per_objfile = get_dwarf2_per_objfile (stub);
Aaron> +      if (per_objfile == nullptr)
Aaron> +	{
Aaron> +	  per_bfd = dwarf2_per_bfd_objfile_data_key.get (stub);
Aaron> +	  if (per_bfd == nullptr)
Aaron> +	    {
Aaron> +	      per_bfd = new dwarf2_per_bfd (stub->obfd.get (), nullptr, false);
Aaron> +	      dwarf2_per_bfd_objfile_data_key.set (stub, per_bfd);
Aaron> +	    }
Aaron> +	  per_objfile = dwarf2_objfile_data_key.emplace (stub, stub, per_bfd);
Aaron> +	}

This also seems suspect.  Shouldn't it have been done when the new stub
objfile was created, like by dwarf2_has_info or something like that?

Aaron> diff --git a/gdb/dwarf2/section.c b/gdb/dwarf2/section.c
Aaron> index c9ef41893ee..ff8ef527c29 100644
Aaron> --- a/gdb/dwarf2/section.c
Aaron> +++ b/gdb/dwarf2/section.c
Aaron> @@ -54,7 +54,9 @@ dwarf2_section_info::get_bfd_owner () const
Aaron>        section = get_containing_section ();
Aaron>        gdb_assert (!section->is_virtual);
Aaron>      }
Aaron> -  gdb_assert (section->s.section != nullptr);
Aaron> +  if (section->s.section == nullptr)
Aaron> +    throw_error (NOT_FOUND_ERROR,
Aaron> +		 _("Can't find owner of DWARF section."));

I think this can just use error, but when is this needed?

Aaron> +  /* Return true if this objfile holds .gdb_index data and represents
Aaron> +     a debuginfo file whose download has been deferred.  */
Aaron> +
Aaron> +  bool is_separate_index_stub ()
Aaron> +  {
Aaron> +    return strcmp (original_name, SEPARATE_INDEX_FILENAME) == 0;
Aaron> +  }

If this is needed then I think it's better to have a new OBJF_ flag
rather than a magic file name.

Aaron> +bool
Aaron> +objfile::reinit (struct bfd *abfd)
Aaron> +{
Aaron> +  try
Aaron> +  {
Aaron> +    deferred_read_symbols (this, abfd);
Aaron> +    return true;
Aaron> +  }
Aaron> +  catch (const gdb_exception &except)
Aaron> +  {
Aaron> +    exception_print (gdb_stderr, except);
Aaron> +    this->flags |= OBJF_READNEVER;

Ok, maybe this answers my readnever question.

Aaron> +void
Aaron> +symbol_file_add_from_index (const gdb_bfd_ref_ptr &bfd,
Aaron> +			    symfile_add_flags symfile_flags,
Aaron> +			    struct objfile *parent)
Aaron> +{
Aaron> +  section_addr_info sap = build_section_addr_info_from_objfile (parent);
Aaron> +
Aaron> +  symbol_file_add_with_addrs
Aaron> +    (bfd, SEPARATE_INDEX_FILENAME, symfile_flags, &sap,
Aaron> +     (parent->flags & (OBJF_REORDERED | OBJF_SHARED | OBJF_READNOW
Aaron> +		      | OBJF_USERLOADED | OBJF_MAINLINE | OBJF_PSYMTABS_READ))
Aaron> +		  | OBJF_NOT_FILENAME, parent, true);
Aaron> +
Aaron> +  objfile *result = parent->separate_debug_objfile;
Aaron> +  init_objfile_sect_indices (result);
Aaron> +
Aaron> +  return;

No need for this.

Aaron> +/* See symtab.h.  */
Aaron> +
Aaron> +void
Aaron> +deferred_read_symbols (struct objfile *objfile, struct bfd *abfd)
Aaron> +{
Aaron> +  const char *obfd_filename;
Aaron> +
Aaron> +  obfd_filename = bfd_get_filename (abfd);
Aaron> +  gdb_bfd_ref_ptr temp (gdb_bfd_open (obfd_filename, gnutarget));
Aaron> +
Aaron> +  if (temp == nullptr)
Aaron> +    error (_("Can't open %s to read symbols."), obfd_filename);
Aaron> +
Aaron> +  if (!bfd_check_format (temp.get (), bfd_object))
Aaron> +    error (_("Can't read symbols from %s: %s."), obfd_filename,
Aaron> +	   bfd_errmsg (bfd_get_error ()));
Aaron> +
Aaron> +  objfile->obfd = std::move (temp);
Aaron> +
Aaron> +  /* Nuke all the state that we will re-read.  */
Aaron> +  objfile->registry_fields.clear_registry ();
Aaron> +  objfile->sections = NULL;
Aaron> +  objfile->sect_index_bss = -1;
Aaron> +  objfile->sect_index_data = -1;
Aaron> +  objfile->sect_index_rodata = -1;
Aaron> +  objfile->sect_index_text = -1;
Aaron> +  objfile->compunit_symtabs = NULL;
Aaron> +  objfile->template_symbols = NULL;
Aaron> +
Aaron> +  objfile_set_sym_fns (objfile, find_sym_fns (objfile->obfd.get ()));
Aaron> +  build_objfile_section_table (objfile);
Aaron> +  (*objfile->sf->sym_init) (objfile);
Aaron> +  init_objfile_sect_indices (objfile);
Aaron> +
Aaron> +  objfile->section_offsets.resize (gdb_bfd_count_sections
Aaron> +				     (objfile->obfd.get ()));
Aaron> +  read_symbols (objfile, 0);
Aaron> +
Aaron> +  objfile->mtime = bfd_get_mtime (objfile->obfd.get ());
Aaron> +  objfile->original_name
Aaron> +    = obstack_strdup (&objfile->objfile_obstack, obfd_filename);

So, gdb already does this kind of thing in reread_symbols, and it's been
a source of bugs.  I wonder if it's possible to just create a new,
non-stub objfile here and splice it in; removing the stub one.

Alternatively, I also don't really understand why this is needed here.
It seems like the reading of the symbols can be done purely in the DWARF
reader.

The idea of the "quick" API is to isolate the core gdb from whatever the
readers do under the hood.  It seems to be that the stub objfile can be
created with the index, and then when the full data is needed, it can
just be read in without involving or notifying any other part of gdb.
This is what's currently done with section data after all, just in this
case it's being downloaded from some server.

One other thing to consider is the sharing of indices.  The DWARF reader
tries pretty hard to share data across inferiors.  However I think this
new code will download the index separately for each inferior.  The full
DWARF will be shared again, due to the BFD cache.

The way this works is that the DWARF reader puts all the shareable data
into an object that is attached to the BFD's registry.  Then when a new
objfile is initialized, it checks the BFD's registry to see if there's
already DWARF info attached.  If so, it's reused.

The index code could maybe attach its own object to the parent BFD to
make this sharing work.

Tom

  reply	other threads:[~2023-03-07 20:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-27 19:42 [PATCH 1/7] gdb/debuginfod: Add debuginfod_section_query Aaron Merey
2023-02-27 19:42 ` [PATCH 2/7] gdb: add 'lazy' setting for command 'set debuginfod enabled' Aaron Merey
2023-02-27 19:54   ` Eli Zaretskii
2023-05-24  9:31   ` Andrew Burgess
2023-02-27 19:42 ` [PATCH 3/7] gdb/debuginfod: disable pagination during downloads Aaron Merey
2023-03-03 21:33   ` Tom Tromey
2023-03-06 23:07     ` Aaron Merey
2023-05-24  9:38   ` Andrew Burgess
2023-05-24 18:57     ` Aaron Merey
2023-02-27 19:42 ` [PATCH 4/7] gdb/ui-file: Add newline tracking Aaron Merey
2023-03-07 19:33   ` Tom Tromey
2023-03-07 20:30     ` Aaron Merey
2023-03-07 20:47       ` Tom Tromey
2023-02-27 19:42 ` [PATCH 5/7] gdb/debuginfod: Support on-demand debuginfo downloading Aaron Merey
2023-03-07 20:20   ` Tom Tromey [this message]
2023-03-09  0:22     ` Aaron Merey
2023-02-27 19:42 ` [PATCH 6/7] gdb/testsuite/gdb.debuginfod: Add lazy downloading tests Aaron Merey
2023-05-02 15:48   ` Andrew Burgess
2023-05-02 16:24     ` Aaron Merey
2023-05-24 10:12   ` Andrew Burgess
2023-02-27 19:42 ` [PATCH 7/7] gdb/debuginfod: Add .debug_line downloading Aaron Merey
2023-03-07 20:36   ` Tom Tromey
2023-03-09  0:26     ` Aaron Merey
2023-02-28 11:11 ` [PATCH 1/7] gdb/debuginfod: Add debuginfod_section_query Alexandra Petlanova Hajkova
2023-05-24  9:01 ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mt4oqpod.fsf@tromey.com \
    --to=tom@tromey.com \
    --cc=amerey@redhat.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).