From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=VjTX=67=tromey.com=tom@sourceware.org>
Received: from progateway7-pub.mail.pro1.eigbox.com (gproxy5-pub.mail.unifiedlayer.com [67.222.38.55])
	by sourceware.org (Postfix) with ESMTPS id 717FB3858C2C
	for <gdb-patches@sourceware.org>; Tue,  7 Mar 2023 20:20:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 717FB3858C2C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=tromey.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tromey.com
Received: from cmgw12.mail.unifiedlayer.com (unknown [10.0.90.127])
	by progateway7.mail.pro1.eigbox.com (Postfix) with ESMTP id E9770100448F1
	for <gdb-patches@sourceware.org>; Tue,  7 Mar 2023 20:20:20 +0000 (UTC)
Received: from box5379.bluehost.com ([162.241.216.53])
	by cmsmtp with ESMTP
	id ZdnIpnMmwbQKPZdnIphwPe; Tue, 07 Mar 2023 20:20:20 +0000
X-Authority-Reason: nr=8
X-Authority-Analysis: v=2.4 cv=Jd15EWGV c=1 sm=1 tr=0 ts=64079c84
 a=ApxJNpeYhEAb1aAlGBBbmA==:117 a=ApxJNpeYhEAb1aAlGBBbmA==:17
 a=dLZJa+xiwSxG16/P+YVxDGlgEgI=:19 a=k__wU0fu6RkA:10:nop_rcvd_month_year
 a=Qbun_eYptAEA:10:endurance_base64_authed_username_1 a=CCpqsmhAAAAA:8
 a=UMRP8rYKxYbHVOyA-gUA:9 a=ul9cdbp4aOFLsgKbc677:22
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tromey.com;
	s=default; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date:References
	:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
	Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
	:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe:
	List-Post:List-Owner:List-Archive;
	bh=s0+VWAQZMMpZKf8xQ/Pmu3ZKIndOI4c9U3QGpY6cfK0=; b=ID9JGDPQYYIt1eiHMPlVM9Vq4L
	Kh9Yj71OV3HZIJL49FGq0BYg4OnlA+M9nGIAWpeWFjMaN3kEphQmFKGx98HOhlrpkXScbirI8LIsE
	VTgyJzWlxDPO2eTKdz2fNUlFz;
Received: from 75-166-130-93.hlrn.qwest.net ([75.166.130.93]:36634 helo=murgatroyd)
	by box5379.bluehost.com with esmtpsa  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.95)
	(envelope-from <tom@tromey.com>)
	id 1pZdnI-001d4g-L2;
	Tue, 07 Mar 2023 13:20:20 -0700
From: Tom Tromey <tom@tromey.com>
To: Aaron Merey via Gdb-patches <gdb-patches@sourceware.org>
Cc: Aaron Merey <amerey@redhat.com>
Subject: Re: [PATCH 5/7] gdb/debuginfod: Support on-demand debuginfo
 downloading
References: <20230227194212.348003-1-amerey@redhat.com>
	<20230227194212.348003-5-amerey@redhat.com>
X-Attribution:  Tom
Date: Tue, 07 Mar 2023 13:20:18 -0700
In-Reply-To: <20230227194212.348003-5-amerey@redhat.com> (Aaron Merey via
	Gdb-patches's message of "Mon, 27 Feb 2023 14:42:10 -0500")
Message-ID: <87mt4oqpod.fsf@tromey.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - box5379.bluehost.com
X-AntiAbuse: Original Domain - sourceware.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - tromey.com
X-BWhitelist: no
X-Source-IP: 75.166.130.93
X-Source-L: No
X-Exim-ID: 1pZdnI-001d4g-L2
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Source-Sender: 75-166-130-93.hlrn.qwest.net (murgatroyd) [75.166.130.93]:36634
X-Source-Auth: tom+tromey.com
X-Email-Count: 1
X-Source-Cap: ZWx5bnJvYmk7ZWx5bnJvYmk7Ym94NTM3OS5ibHVlaG9zdC5jb20=
X-Local-Domain: yes
X-Spam-Status: No, score=-3026.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

>>>>> "Aaron" == Aaron Merey via Gdb-patches <gdb-patches@sourceware.org> writes:

Aaron> At the beginning of a session, gdb may attempt to download debuginfo
Aaron> for all shared libraries associated with the process or core file
Aaron> being debugged.  This can be a waste of time and storage space when much
Aaron> of the debuginfo ends up not being used during the session.

Thank you for the patch.

Aaron> +/* See frame.h.  */
Aaron> +
Aaron> +void
Aaron> +clear_comp_unit (struct objfile *objfile)

"comp unit" has a very different meaning in the DWARF code, it's
confusing here.

Maybe a name like "dwarf2_clear_frame_data" would be better.
Though I wonder how this gets populated if there's no frame info.

Aaron> +  bool do_expand_symtabs_matching

I tend to think this isn't needed, see below.

Aaron> +bool
Aaron> +dwarf2_gdb_index::expand_symtabs_matching
Aaron> +    (struct objfile *objfile,
Aaron> +     gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
Aaron> +     const lookup_name_info *lookup_name,
Aaron> +     gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
Aaron> +     gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
Aaron> +     block_search_flags search_flags,
Aaron> +     domain_enum domain,
Aaron> +     enum search_domain kind)
Aaron> +{
Aaron> +  if (objfile->flags & OBJF_READNEVER)

Can this even happen here?  If 'readnever' is set, debuginfod simply
should never be queried in the first plae.

Aaron> +  try
Aaron> +    {
Aaron> +      return do_expand_symtabs_matching (objfile, file_matcher, lookup_name,
Aaron> +					 symbol_matcher, expansion_notify,
Aaron> +					 search_flags, domain, kind);
Aaron> +    }
Aaron> +  catch (gdb_exception e)
Aaron> +    {
Aaron> +      if (!objfile->is_separate_index_stub ())
Aaron> +	{
Aaron> +	  exception_print (gdb_stderr, e);
Aaron> +	  return false;
Aaron> +	}

I guess the idea here is to try not to download anything -- maybe the
search will fail and we can skip the downloading entirely.

However, it seems better in this case to integrate it into the inner
loop of expand_symtabs_matching.

Also other methods of dwarf2_gdb_index can result in CU expansion, and
so require all the DWARF.  So maybe some other refactoring is needed,
like some kind of virtual function that is called before any possible
call to dw2_instantiate_symtab along these code paths.  I see some of
this in read.c but it seems like some are missing.

E.g., it seems like calls to dw2_expand_symtabs_matching_one are
unaffected, but should be.

Aaron> +/* See public.h.  */
Aaron> +
Aaron> +bool
Aaron> +dwarf2_has_separate_index (struct objfile *objfile)
Aaron> +{
Aaron> +  if (objfile->flags & OBJF_MAINLINE

What's the reason for testing this flag?

Anyway, gdb style is: (objfile->flags & OBJF_MAINLINE) != 0

Aaron> +    return 0;

bool

Aaron> +      /* We found a separate .gdb_index file so a separate debuginfo file
Aaron> +	 should exist, but don't want to read it until we really have to.
Aaron> +	 Create an objfile to own the index information and act as a
Aaron> +	 placeholder for the debuginfo that we have the option of aquiring
Aaron> +	 later.  */
Aaron> +      gdb_bfd_ref_ptr abfd (gdb_bfd_open (objfile_filename (objfile), gnutarget));
Aaron> +      if (abfd == nullptr)
Aaron> +	return false;
Aaron> +
Aaron> +      dwarf2_per_bfd_objfile_data_key.clear (objfile);
Aaron> +      dwarf2_objfile_data_key.clear (objfile);

This seems suspicious.

Aaron> +      dwarf2_per_bfd *per_bfd;
Aaron> +      dwarf2_per_objfile *per_objfile = get_dwarf2_per_objfile (objfile);
Aaron> +
Aaron> +      /* Perform a limited initialization based to dwarf2_has_info and
Aaron> +	 dwarf2_initialize_objfile.  */
Aaron> +      if (per_objfile == nullptr)
Aaron> +	{
Aaron> +	  per_bfd = dwarf2_per_bfd_objfile_data_key.get (objfile);
Aaron> +	  if (per_bfd == nullptr)
Aaron> +	    {
Aaron> +	      per_bfd = new dwarf2_per_bfd (objfile->obfd.get (), nullptr, false);
Aaron> +	      dwarf2_per_bfd_objfile_data_key.set (objfile, per_bfd);
Aaron> +	    }
Aaron> +	  per_objfile = dwarf2_objfile_data_key.emplace (objfile, objfile, per_bfd);
Aaron> +	}

Changing the objfile seems wrong.
I don't understand why this is needed.

Aaron> +      struct objfile *stub = objfile->separate_debug_objfile;
Aaron> +      per_objfile = get_dwarf2_per_objfile (stub);
Aaron> +      if (per_objfile == nullptr)
Aaron> +	{
Aaron> +	  per_bfd = dwarf2_per_bfd_objfile_data_key.get (stub);
Aaron> +	  if (per_bfd == nullptr)
Aaron> +	    {
Aaron> +	      per_bfd = new dwarf2_per_bfd (stub->obfd.get (), nullptr, false);
Aaron> +	      dwarf2_per_bfd_objfile_data_key.set (stub, per_bfd);
Aaron> +	    }
Aaron> +	  per_objfile = dwarf2_objfile_data_key.emplace (stub, stub, per_bfd);
Aaron> +	}

This also seems suspect.  Shouldn't it have been done when the new stub
objfile was created, like by dwarf2_has_info or something like that?

Aaron> diff --git a/gdb/dwarf2/section.c b/gdb/dwarf2/section.c
Aaron> index c9ef41893ee..ff8ef527c29 100644
Aaron> --- a/gdb/dwarf2/section.c
Aaron> +++ b/gdb/dwarf2/section.c
Aaron> @@ -54,7 +54,9 @@ dwarf2_section_info::get_bfd_owner () const
Aaron>        section = get_containing_section ();
Aaron>        gdb_assert (!section->is_virtual);
Aaron>      }
Aaron> -  gdb_assert (section->s.section != nullptr);
Aaron> +  if (section->s.section == nullptr)
Aaron> +    throw_error (NOT_FOUND_ERROR,
Aaron> +		 _("Can't find owner of DWARF section."));

I think this can just use error, but when is this needed?

Aaron> +  /* Return true if this objfile holds .gdb_index data and represents
Aaron> +     a debuginfo file whose download has been deferred.  */
Aaron> +
Aaron> +  bool is_separate_index_stub ()
Aaron> +  {
Aaron> +    return strcmp (original_name, SEPARATE_INDEX_FILENAME) == 0;
Aaron> +  }

If this is needed then I think it's better to have a new OBJF_ flag
rather than a magic file name.

Aaron> +bool
Aaron> +objfile::reinit (struct bfd *abfd)
Aaron> +{
Aaron> +  try
Aaron> +  {
Aaron> +    deferred_read_symbols (this, abfd);
Aaron> +    return true;
Aaron> +  }
Aaron> +  catch (const gdb_exception &except)
Aaron> +  {
Aaron> +    exception_print (gdb_stderr, except);
Aaron> +    this->flags |= OBJF_READNEVER;

Ok, maybe this answers my readnever question.

Aaron> +void
Aaron> +symbol_file_add_from_index (const gdb_bfd_ref_ptr &bfd,
Aaron> +			    symfile_add_flags symfile_flags,
Aaron> +			    struct objfile *parent)
Aaron> +{
Aaron> +  section_addr_info sap = build_section_addr_info_from_objfile (parent);
Aaron> +
Aaron> +  symbol_file_add_with_addrs
Aaron> +    (bfd, SEPARATE_INDEX_FILENAME, symfile_flags, &sap,
Aaron> +     (parent->flags & (OBJF_REORDERED | OBJF_SHARED | OBJF_READNOW
Aaron> +		      | OBJF_USERLOADED | OBJF_MAINLINE | OBJF_PSYMTABS_READ))
Aaron> +		  | OBJF_NOT_FILENAME, parent, true);
Aaron> +
Aaron> +  objfile *result = parent->separate_debug_objfile;
Aaron> +  init_objfile_sect_indices (result);
Aaron> +
Aaron> +  return;

No need for this.

Aaron> +/* See symtab.h.  */
Aaron> +
Aaron> +void
Aaron> +deferred_read_symbols (struct objfile *objfile, struct bfd *abfd)
Aaron> +{
Aaron> +  const char *obfd_filename;
Aaron> +
Aaron> +  obfd_filename = bfd_get_filename (abfd);
Aaron> +  gdb_bfd_ref_ptr temp (gdb_bfd_open (obfd_filename, gnutarget));
Aaron> +
Aaron> +  if (temp == nullptr)
Aaron> +    error (_("Can't open %s to read symbols."), obfd_filename);
Aaron> +
Aaron> +  if (!bfd_check_format (temp.get (), bfd_object))
Aaron> +    error (_("Can't read symbols from %s: %s."), obfd_filename,
Aaron> +	   bfd_errmsg (bfd_get_error ()));
Aaron> +
Aaron> +  objfile->obfd = std::move (temp);
Aaron> +
Aaron> +  /* Nuke all the state that we will re-read.  */
Aaron> +  objfile->registry_fields.clear_registry ();
Aaron> +  objfile->sections = NULL;
Aaron> +  objfile->sect_index_bss = -1;
Aaron> +  objfile->sect_index_data = -1;
Aaron> +  objfile->sect_index_rodata = -1;
Aaron> +  objfile->sect_index_text = -1;
Aaron> +  objfile->compunit_symtabs = NULL;
Aaron> +  objfile->template_symbols = NULL;
Aaron> +
Aaron> +  objfile_set_sym_fns (objfile, find_sym_fns (objfile->obfd.get ()));
Aaron> +  build_objfile_section_table (objfile);
Aaron> +  (*objfile->sf->sym_init) (objfile);
Aaron> +  init_objfile_sect_indices (objfile);
Aaron> +
Aaron> +  objfile->section_offsets.resize (gdb_bfd_count_sections
Aaron> +				     (objfile->obfd.get ()));
Aaron> +  read_symbols (objfile, 0);
Aaron> +
Aaron> +  objfile->mtime = bfd_get_mtime (objfile->obfd.get ());
Aaron> +  objfile->original_name
Aaron> +    = obstack_strdup (&objfile->objfile_obstack, obfd_filename);

So, gdb already does this kind of thing in reread_symbols, and it's been
a source of bugs.  I wonder if it's possible to just create a new,
non-stub objfile here and splice it in; removing the stub one.

Alternatively, I also don't really understand why this is needed here.
It seems like the reading of the symbols can be done purely in the DWARF
reader.

The idea of the "quick" API is to isolate the core gdb from whatever the
readers do under the hood.  It seems to be that the stub objfile can be
created with the index, and then when the full data is needed, it can
just be read in without involving or notifying any other part of gdb.
This is what's currently done with section data after all, just in this
case it's being downloaded from some server.

One other thing to consider is the sharing of indices.  The DWARF reader
tries pretty hard to share data across inferiors.  However I think this
new code will download the index separately for each inferior.  The full
DWARF will be shared again, due to the BFD cache.

The way this works is that the DWARF reader puts all the shareable data
into an object that is attached to the BFD's registry.  Then when a new
objfile is initialized, it checks the BFD's registry to see if there's
already DWARF info attached.  If so, it's reused.

The index code could maybe attach its own object to the parent BFD to
make this sharing work.

Tom