From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from progateway7-pub.mail.pro1.eigbox.com (gproxy5-pub.mail.unifiedlayer.com [67.222.38.55]) by sourceware.org (Postfix) with ESMTPS id 717FB3858C2C for ; Tue, 7 Mar 2023 20:20:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 717FB3858C2C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=tromey.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tromey.com Received: from cmgw12.mail.unifiedlayer.com (unknown [10.0.90.127]) by progateway7.mail.pro1.eigbox.com (Postfix) with ESMTP id E9770100448F1 for ; Tue, 7 Mar 2023 20:20:20 +0000 (UTC) Received: from box5379.bluehost.com ([162.241.216.53]) by cmsmtp with ESMTP id ZdnIpnMmwbQKPZdnIphwPe; Tue, 07 Mar 2023 20:20:20 +0000 X-Authority-Reason: nr=8 X-Authority-Analysis: v=2.4 cv=Jd15EWGV c=1 sm=1 tr=0 ts=64079c84 a=ApxJNpeYhEAb1aAlGBBbmA==:117 a=ApxJNpeYhEAb1aAlGBBbmA==:17 a=dLZJa+xiwSxG16/P+YVxDGlgEgI=:19 a=k__wU0fu6RkA:10:nop_rcvd_month_year a=Qbun_eYptAEA:10:endurance_base64_authed_username_1 a=CCpqsmhAAAAA:8 a=UMRP8rYKxYbHVOyA-gUA:9 a=ul9cdbp4aOFLsgKbc677:22 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tromey.com; s=default; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date:References :Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=s0+VWAQZMMpZKf8xQ/Pmu3ZKIndOI4c9U3QGpY6cfK0=; b=ID9JGDPQYYIt1eiHMPlVM9Vq4L Kh9Yj71OV3HZIJL49FGq0BYg4OnlA+M9nGIAWpeWFjMaN3kEphQmFKGx98HOhlrpkXScbirI8LIsE VTgyJzWlxDPO2eTKdz2fNUlFz; Received: from 75-166-130-93.hlrn.qwest.net ([75.166.130.93]:36634 helo=murgatroyd) by box5379.bluehost.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1pZdnI-001d4g-L2; Tue, 07 Mar 2023 13:20:20 -0700 From: Tom Tromey To: Aaron Merey via Gdb-patches Cc: Aaron Merey Subject: Re: [PATCH 5/7] gdb/debuginfod: Support on-demand debuginfo downloading References: <20230227194212.348003-1-amerey@redhat.com> <20230227194212.348003-5-amerey@redhat.com> X-Attribution: Tom Date: Tue, 07 Mar 2023 13:20:18 -0700 In-Reply-To: <20230227194212.348003-5-amerey@redhat.com> (Aaron Merey via Gdb-patches's message of "Mon, 27 Feb 2023 14:42:10 -0500") Message-ID: <87mt4oqpod.fsf@tromey.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - box5379.bluehost.com X-AntiAbuse: Original Domain - sourceware.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - tromey.com X-BWhitelist: no X-Source-IP: 75.166.130.93 X-Source-L: No X-Exim-ID: 1pZdnI-001d4g-L2 X-Source: X-Source-Args: X-Source-Dir: X-Source-Sender: 75-166-130-93.hlrn.qwest.net (murgatroyd) [75.166.130.93]:36634 X-Source-Auth: tom+tromey.com X-Email-Count: 1 X-Source-Cap: ZWx5bnJvYmk7ZWx5bnJvYmk7Ym94NTM3OS5ibHVlaG9zdC5jb20= X-Local-Domain: yes X-Spam-Status: No, score=-3026.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: >>>>> "Aaron" == Aaron Merey via Gdb-patches writes: Aaron> At the beginning of a session, gdb may attempt to download debuginfo Aaron> for all shared libraries associated with the process or core file Aaron> being debugged. This can be a waste of time and storage space when much Aaron> of the debuginfo ends up not being used during the session. Thank you for the patch. Aaron> +/* See frame.h. */ Aaron> + Aaron> +void Aaron> +clear_comp_unit (struct objfile *objfile) "comp unit" has a very different meaning in the DWARF code, it's confusing here. Maybe a name like "dwarf2_clear_frame_data" would be better. Though I wonder how this gets populated if there's no frame info. Aaron> + bool do_expand_symtabs_matching I tend to think this isn't needed, see below. Aaron> +bool Aaron> +dwarf2_gdb_index::expand_symtabs_matching Aaron> + (struct objfile *objfile, Aaron> + gdb::function_view file_matcher, Aaron> + const lookup_name_info *lookup_name, Aaron> + gdb::function_view symbol_matcher, Aaron> + gdb::function_view expansion_notify, Aaron> + block_search_flags search_flags, Aaron> + domain_enum domain, Aaron> + enum search_domain kind) Aaron> +{ Aaron> + if (objfile->flags & OBJF_READNEVER) Can this even happen here? If 'readnever' is set, debuginfod simply should never be queried in the first plae. Aaron> + try Aaron> + { Aaron> + return do_expand_symtabs_matching (objfile, file_matcher, lookup_name, Aaron> + symbol_matcher, expansion_notify, Aaron> + search_flags, domain, kind); Aaron> + } Aaron> + catch (gdb_exception e) Aaron> + { Aaron> + if (!objfile->is_separate_index_stub ()) Aaron> + { Aaron> + exception_print (gdb_stderr, e); Aaron> + return false; Aaron> + } I guess the idea here is to try not to download anything -- maybe the search will fail and we can skip the downloading entirely. However, it seems better in this case to integrate it into the inner loop of expand_symtabs_matching. Also other methods of dwarf2_gdb_index can result in CU expansion, and so require all the DWARF. So maybe some other refactoring is needed, like some kind of virtual function that is called before any possible call to dw2_instantiate_symtab along these code paths. I see some of this in read.c but it seems like some are missing. E.g., it seems like calls to dw2_expand_symtabs_matching_one are unaffected, but should be. Aaron> +/* See public.h. */ Aaron> + Aaron> +bool Aaron> +dwarf2_has_separate_index (struct objfile *objfile) Aaron> +{ Aaron> + if (objfile->flags & OBJF_MAINLINE What's the reason for testing this flag? Anyway, gdb style is: (objfile->flags & OBJF_MAINLINE) != 0 Aaron> + return 0; bool Aaron> + /* We found a separate .gdb_index file so a separate debuginfo file Aaron> + should exist, but don't want to read it until we really have to. Aaron> + Create an objfile to own the index information and act as a Aaron> + placeholder for the debuginfo that we have the option of aquiring Aaron> + later. */ Aaron> + gdb_bfd_ref_ptr abfd (gdb_bfd_open (objfile_filename (objfile), gnutarget)); Aaron> + if (abfd == nullptr) Aaron> + return false; Aaron> + Aaron> + dwarf2_per_bfd_objfile_data_key.clear (objfile); Aaron> + dwarf2_objfile_data_key.clear (objfile); This seems suspicious. Aaron> + dwarf2_per_bfd *per_bfd; Aaron> + dwarf2_per_objfile *per_objfile = get_dwarf2_per_objfile (objfile); Aaron> + Aaron> + /* Perform a limited initialization based to dwarf2_has_info and Aaron> + dwarf2_initialize_objfile. */ Aaron> + if (per_objfile == nullptr) Aaron> + { Aaron> + per_bfd = dwarf2_per_bfd_objfile_data_key.get (objfile); Aaron> + if (per_bfd == nullptr) Aaron> + { Aaron> + per_bfd = new dwarf2_per_bfd (objfile->obfd.get (), nullptr, false); Aaron> + dwarf2_per_bfd_objfile_data_key.set (objfile, per_bfd); Aaron> + } Aaron> + per_objfile = dwarf2_objfile_data_key.emplace (objfile, objfile, per_bfd); Aaron> + } Changing the objfile seems wrong. I don't understand why this is needed. Aaron> + struct objfile *stub = objfile->separate_debug_objfile; Aaron> + per_objfile = get_dwarf2_per_objfile (stub); Aaron> + if (per_objfile == nullptr) Aaron> + { Aaron> + per_bfd = dwarf2_per_bfd_objfile_data_key.get (stub); Aaron> + if (per_bfd == nullptr) Aaron> + { Aaron> + per_bfd = new dwarf2_per_bfd (stub->obfd.get (), nullptr, false); Aaron> + dwarf2_per_bfd_objfile_data_key.set (stub, per_bfd); Aaron> + } Aaron> + per_objfile = dwarf2_objfile_data_key.emplace (stub, stub, per_bfd); Aaron> + } This also seems suspect. Shouldn't it have been done when the new stub objfile was created, like by dwarf2_has_info or something like that? Aaron> diff --git a/gdb/dwarf2/section.c b/gdb/dwarf2/section.c Aaron> index c9ef41893ee..ff8ef527c29 100644 Aaron> --- a/gdb/dwarf2/section.c Aaron> +++ b/gdb/dwarf2/section.c Aaron> @@ -54,7 +54,9 @@ dwarf2_section_info::get_bfd_owner () const Aaron> section = get_containing_section (); Aaron> gdb_assert (!section->is_virtual); Aaron> } Aaron> - gdb_assert (section->s.section != nullptr); Aaron> + if (section->s.section == nullptr) Aaron> + throw_error (NOT_FOUND_ERROR, Aaron> + _("Can't find owner of DWARF section.")); I think this can just use error, but when is this needed? Aaron> + /* Return true if this objfile holds .gdb_index data and represents Aaron> + a debuginfo file whose download has been deferred. */ Aaron> + Aaron> + bool is_separate_index_stub () Aaron> + { Aaron> + return strcmp (original_name, SEPARATE_INDEX_FILENAME) == 0; Aaron> + } If this is needed then I think it's better to have a new OBJF_ flag rather than a magic file name. Aaron> +bool Aaron> +objfile::reinit (struct bfd *abfd) Aaron> +{ Aaron> + try Aaron> + { Aaron> + deferred_read_symbols (this, abfd); Aaron> + return true; Aaron> + } Aaron> + catch (const gdb_exception &except) Aaron> + { Aaron> + exception_print (gdb_stderr, except); Aaron> + this->flags |= OBJF_READNEVER; Ok, maybe this answers my readnever question. Aaron> +void Aaron> +symbol_file_add_from_index (const gdb_bfd_ref_ptr &bfd, Aaron> + symfile_add_flags symfile_flags, Aaron> + struct objfile *parent) Aaron> +{ Aaron> + section_addr_info sap = build_section_addr_info_from_objfile (parent); Aaron> + Aaron> + symbol_file_add_with_addrs Aaron> + (bfd, SEPARATE_INDEX_FILENAME, symfile_flags, &sap, Aaron> + (parent->flags & (OBJF_REORDERED | OBJF_SHARED | OBJF_READNOW Aaron> + | OBJF_USERLOADED | OBJF_MAINLINE | OBJF_PSYMTABS_READ)) Aaron> + | OBJF_NOT_FILENAME, parent, true); Aaron> + Aaron> + objfile *result = parent->separate_debug_objfile; Aaron> + init_objfile_sect_indices (result); Aaron> + Aaron> + return; No need for this. Aaron> +/* See symtab.h. */ Aaron> + Aaron> +void Aaron> +deferred_read_symbols (struct objfile *objfile, struct bfd *abfd) Aaron> +{ Aaron> + const char *obfd_filename; Aaron> + Aaron> + obfd_filename = bfd_get_filename (abfd); Aaron> + gdb_bfd_ref_ptr temp (gdb_bfd_open (obfd_filename, gnutarget)); Aaron> + Aaron> + if (temp == nullptr) Aaron> + error (_("Can't open %s to read symbols."), obfd_filename); Aaron> + Aaron> + if (!bfd_check_format (temp.get (), bfd_object)) Aaron> + error (_("Can't read symbols from %s: %s."), obfd_filename, Aaron> + bfd_errmsg (bfd_get_error ())); Aaron> + Aaron> + objfile->obfd = std::move (temp); Aaron> + Aaron> + /* Nuke all the state that we will re-read. */ Aaron> + objfile->registry_fields.clear_registry (); Aaron> + objfile->sections = NULL; Aaron> + objfile->sect_index_bss = -1; Aaron> + objfile->sect_index_data = -1; Aaron> + objfile->sect_index_rodata = -1; Aaron> + objfile->sect_index_text = -1; Aaron> + objfile->compunit_symtabs = NULL; Aaron> + objfile->template_symbols = NULL; Aaron> + Aaron> + objfile_set_sym_fns (objfile, find_sym_fns (objfile->obfd.get ())); Aaron> + build_objfile_section_table (objfile); Aaron> + (*objfile->sf->sym_init) (objfile); Aaron> + init_objfile_sect_indices (objfile); Aaron> + Aaron> + objfile->section_offsets.resize (gdb_bfd_count_sections Aaron> + (objfile->obfd.get ())); Aaron> + read_symbols (objfile, 0); Aaron> + Aaron> + objfile->mtime = bfd_get_mtime (objfile->obfd.get ()); Aaron> + objfile->original_name Aaron> + = obstack_strdup (&objfile->objfile_obstack, obfd_filename); So, gdb already does this kind of thing in reread_symbols, and it's been a source of bugs. I wonder if it's possible to just create a new, non-stub objfile here and splice it in; removing the stub one. Alternatively, I also don't really understand why this is needed here. It seems like the reading of the symbols can be done purely in the DWARF reader. The idea of the "quick" API is to isolate the core gdb from whatever the readers do under the hood. It seems to be that the stub objfile can be created with the index, and then when the full data is needed, it can just be read in without involving or notifying any other part of gdb. This is what's currently done with section data after all, just in this case it's being downloaded from some server. One other thing to consider is the sharing of indices. The DWARF reader tries pretty hard to share data across inferiors. However I think this new code will download the index separately for each inferior. The full DWARF will be shared again, due to the BFD cache. The way this works is that the DWARF reader puts all the shareable data into an object that is attached to the BFD's registry. Then when a new objfile is initialized, it checks the BFD's registry to see if there's already DWARF info attached. If so, it's reused. The index code could maybe attach its own object to the parent BFD to make this sharing work. Tom