From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 9B8633858412 for ; Wed, 13 Sep 2023 07:09:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9B8633858412 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id BA049218E8; Wed, 13 Sep 2023 07:09:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1694588962; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3HOJiXkVkCw+8bpuo17cQhK/aD3xJa8bg3u25a4Vqhc=; b=GrcYCUWeFa4cJkVt29/0VCoPSmZ/6XNnx5vy7ofkGlFqhEOLyYc5b+8//yw7pIAzE8ar8N jBm51l0LgdPPWeRHJu2QGsycdR/jfukRY7+RAHBtBvEiKFPm721RVkf3IlbythHBZmMyZn xMRTgx/cPSds/I5qnzfLLzaP/XcKmiY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1694588962; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3HOJiXkVkCw+8bpuo17cQhK/aD3xJa8bg3u25a4Vqhc=; b=rz6YJShjjtL0T05sEASDZjyfwTcyEszUh67fl1IszpuabBg8axoEry5vSGjrljIKAq2jJh ZmW8dj/KS6rH70Aw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 971DF13582; Wed, 13 Sep 2023 07:09:22 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id PDZZIyJgAWUoewAAMHmgww (envelope-from ); Wed, 13 Sep 2023 07:09:22 +0000 Message-ID: Date: Wed, 13 Sep 2023 09:09:12 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF index Content-Language: en-US To: Matheus Branco Borella , gdb-patches@sourceware.org Cc: tom@tromey.com References: <20230608214012.1561-1-dark.ryu.550@gmail.com> <20230811182131.492-1-dark.ryu.550@gmail.com> From: Tom de Vries In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 8/14/23 09:31, Tom de Vries via Gdb-patches wrote: > On 8/11/23 20:21, Matheus Branco Borella wrote: >> This should hopefully be the final version. Has a more descriptive >> entry in the >> NEWS file, but is otherwise the same as v2. I believe the >> misunderstanding with >> my contributor status should have also been sorted out? >> > > Hi, > > As per Eli's last comment in this thread, that's indeed the case. > >> --- >> This patch adds a new section to the DWARF index containing the name >> and the language of the main function symbol, gathered from >> `cooked_index::get_main`, if available. Currently, for lack of a >> better name, >> this section is called the "shortcut table". The way this name is both >> saved and >> applied upon an index being loaded in mirrors how it is done in >> `cooked_index_functions`, more specifically, the full name of the main >> function >> symbol is saved and `set_objfile_main_name` is used to apply it after >> it is >> loaded. >> >> The main use case for this patch is in improving startup times when >> dealing with >> large binaries. Currently, when an index is used, GDB has to expand >> symtabs >> until it finds out what the language of the main function symbol is. >> For some >> large executables, this may take a considerable amount of time to >> complete, >> slowing down startup. This patch bypasses that operation by having >> both the name >> and language of the main function symbol be provided ahead of time by >> the index. >> >> In my testing (a binary with about 1.8GB worth of DWARF data) this >> change brings >> startup time down from about 34 seconds to about 1.5 seconds. >> > > I've reviewed the patch, and found a few nits. > > There are two spots that look unintentional to me, one adding and one > removing an empty line.  You might want to remove those. > > I found this bit: > ... > +      lang = dwarf_lang_to_enum_language (attr->constant_value (0)); > +      dw_lang = (dwarf_source_language)attr->constant_value (0); > ... > > I would reformulate as: > ... > +      dw_lang = (dwarf_source_language)attr->constant_value (0); > +      lang = dwarf_lang_to_enum_language (dw_lang); > ... > but I don't feel strongly about that. > > I've looked over Tom Tromey's comments earlier in the thread, and I > think they all have been addresses.  Furthermore, Eli already approved > the documentation part, so I'd say: approved, but wait for one week in > case somebody else has other comments. > > Approved-By: Tom de Vries > Hi, any update on this? Thanks, - Tom > Thanks, > - Tom > >> PR symtab/24549 >> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549 >> --- >>   gdb/NEWS                    |  3 ++ >>   gdb/doc/gdb.texinfo         | 23 +++++++++++++-- >>   gdb/dwarf2/index-write.c    | 47 +++++++++++++++++++++++++++---- >>   gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++-- >>   gdb/dwarf2/read.c           | 13 +++++++-- >>   gdb/dwarf2/read.h           | 12 ++++++++ >>   6 files changed, 143 insertions(+), 11 deletions(-) >> >> diff --git a/gdb/NEWS b/gdb/NEWS >> index d97e3c15a8..ac455f39f2 100644 >> --- a/gdb/NEWS >> +++ b/gdb/NEWS >> @@ -3,6 +3,9 @@ >>   *** Changes since GDB 13 >> +* GDB index now contains information about the main function. This >> speeds up >> +  startup when it is being used for some large binaries. >> + >>   * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication >> feature string >>     has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' >> feature >>     string. >> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo >> index d1059e0cb7..3b2fdcd19e 100644 >> --- a/gdb/doc/gdb.texinfo >> +++ b/gdb/doc/gdb.texinfo >> @@ -49093,13 +49093,14 @@ unless otherwise noted: >>   @enumerate >>   @item >> -The version number, currently 8.  Versions 1, 2 and 3 are obsolete. >> +The version number, currently 9.  Versions 1, 2 and 3 are obsolete. >>   Version 4 uses a different hashing function from versions 5 and 6. >>   Version 6 includes symbols for inlined functions, whereas versions 4 >>   and 5 do not.  Version 7 adds attributes to the CU indices in the >>   symbol table.  Version 8 specifies that symbols from DWARF type units >>   (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and >> not the >> -compilation unit (@samp{DW_TAG_comp_unit}) using the type. >> +compilation unit (@samp{DW_TAG_comp_unit}) using the type.  Version 9 >> adds >> +the name and the language of the main function to the index. >>   @value{GDBN} will only read version 4, 5, or 6 indices >>   by specifying @code{set use-deprecated-index-sections on}. >> @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the >> address area. >>   @item >>   The offset, from the start of the file, of the symbol table. >> +@item >> +The offset, from the start of the file, of the shortcut table. >> + >>   @item >>   The offset, from the start of the file, of the constant pool. >>   @end enumerate >> @@ -49196,6 +49200,21 @@ don't currently have a simple description of >> the canonicalization >>   algorithm; if you intend to create new index sections, you must read >>   the code. >> +@item The shortcut table >> +This is a data structure with the following fields: >> + >> +@table @asis >> +@item Language of main >> +A 32-bit little-endian value indicating the language of the main >> function as a >> +@code{DW_LANG_} constant.  This value will be zero if main function >> information >> +is not present. >> + >> +@item Name of main >> +An @code{offset_type} value indicating the offset of the main >> function's name >> +in the constant pool.  This value must be ignored if the value for >> the language >> +of main is zero. >> +@end table >> + >>   @item >>   The constant pool.  This is simply a bunch of bytes.  It is organized >>   so that alignment is correct: CU vectors are stored first, followed by >> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c >> index 62c2cc6ac7..7117a5184b 100644 >> --- a/gdb/dwarf2/index-write.c >> +++ b/gdb/dwarf2/index-write.c >> @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file, >>             const data_buf &types_cu_list, >>             const data_buf &addr_vec, >>             const data_buf &symtab_vec, >> -          const data_buf &constant_pool) >> +          const data_buf &constant_pool, >> +          const data_buf &shortcuts) >>   { >>     data_buf contents; >> -  const offset_type size_of_header = 6 * sizeof (offset_type); >> +  const offset_type size_of_header = 7 * sizeof (offset_type); >>     offset_type total_len = size_of_header; >>     /* The version number.  */ >> -  contents.append_offset (8); >> +  contents.append_offset (9); >>     /* The offset of the CU list from the start of the file.  */ >>     contents.append_offset (total_len); >> @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file, >>     contents.append_offset (total_len); >>     total_len += symtab_vec.size (); >> +  /* The offset of the shortcut table from the start of the file.  */ >> +  contents.append_offset (total_len); >> +  total_len += shortcuts.size (); >> + >>     /* The offset of the constant pool from the start of the file.  */ >>     contents.append_offset (total_len); >>     total_len += constant_pool.size (); >> @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file, >>     types_cu_list.file_write (out_file); >>     addr_vec.file_write (out_file); >>     symtab_vec.file_write (out_file); >> +  shortcuts.file_write (out_file); >>     constant_pool.file_write (out_file); >>     assert_file_size (out_file, total_len); >> @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table, >>       } >>   } >> +/* Write shortcut information. */ >> + >> +static void >> +write_shortcuts_table (cooked_index *table, data_buf& shortcuts, >> +               data_buf& cpool) >> +{ >> +  const auto main_info = table->get_main (); >> +  size_t main_name_offset = 0; >> +  dwarf_source_language dw_lang = (dwarf_source_language)0; >> + >> +  if (main_info != nullptr) >> +    { >> +      dw_lang = main_info->per_cu->dw_lang; >> + >> +      if (dw_lang != 0) >> +    { >> +      auto_obstack obstack; >> +      const auto main_name = main_info->full_name (&obstack, true); >> + >> +      main_name_offset = cpool.size (); >> +      cpool.append_cstr0 (main_name); >> +    } >> +    } >> + >> +  shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang); >> +  shortcuts.append_offset (main_name_offset); >> +} >> + >>   /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE. >>      If OBJFILE has an associated dwz file, write contents of a >> .gdb_index >>      section for that dwz file into DWZ_OUT_FILE.  If OBJFILE does not >> have an >> @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, >> cooked_index *table, >>     write_hash_table (&symtab, symtab_vec, constant_pool); >> +  data_buf shortcuts; >> +  write_shortcuts_table (table, shortcuts, constant_pool); >> + >>     write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec, >> -           symtab_vec, constant_pool); >> +           symtab_vec, constant_pool, shortcuts); >>     if (dwz_out_file != NULL) >> -    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}); >> +    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {}); >>     else >>       gdb_assert (dwz_cu_list.empty ()); >>   } >> diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c >> index 1006386cb2..f09c5ba234 100644 >> --- a/gdb/dwarf2/read-gdb-index.c >> +++ b/gdb/dwarf2/read-gdb-index.c >> @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public >> mapped_index_base >>     /* A pointer to the constant pool.  */ >>     gdb::array_view constant_pool; >> +  /* The shortcut table data. */ >> +  gdb::array_view shortcut_table; >> + >>     /* Return the index into the constant pool of the name of the IDXth >>        symbol in the symbol table.  */ >>     offset_type symbol_name_index (offset_type idx) const >> @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile) >>     mapped_gdb_index *index = >> (gdb::checked_static_cast >>                    (per_objfile->per_bfd->index_table.get ())); >> + >>     gdb_printf (".gdb_index: version %d\n", index->version); >>     gdb_printf ("\n"); >>   } >> @@ -583,7 +587,7 @@ to use the section anyway."), >>     /* Indexes with higher version than the one supported by GDB may >> be no >>        longer backward compatible.  */ >> -  if (version > 8) >> +  if (version > 9) >>       return 0; >>     map->version = version; >> @@ -608,8 +612,17 @@ to use the section anyway."), >>     map->symbol_table >>       = offset_view (gdb::array_view (symbol_table, >>                               symbol_table_end)); >> - >>     ++i; >> + >> +  if (version >= 9) >> +    { >> +      const gdb_byte *shortcut_table = addr + metadata[i]; >> +      const gdb_byte *shortcut_table_end = addr + metadata[i + 1]; >> +      map->shortcut_table >> +    = gdb::array_view (shortcut_table, >> shortcut_table_end); >> +      ++i; >> +    } >> + >>     map->constant_pool = buffer.slice (metadata[i]); >>     if (map->constant_pool.empty () && !map->symbol_table.empty ()) >> @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile >> *per_objfile, >>       = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, >> &mutable_map); >>   } >> +/* Sets the name and language of the main function from the shortcut >> table. */ >> + >> +static void >> +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile, >> +                  mapped_gdb_index *index) >> +{ >> +  const auto expected_size = 4 + sizeof (offset_type); >> +  if (index->shortcut_table.size () < expected_size) >> +    /* The data in the section is not present, is corrupted or is in >> a version >> +     * we don't know about. Regardless, we can't make use of it. */ >> +    return; >> + >> +  auto ptr = index->shortcut_table.data (); >> +  const auto dw_lang = extract_unsigned_integer (ptr, 4, >> BFD_ENDIAN_LITTLE); >> +  if (dw_lang >= DW_LANG_hi_user) >> +    { >> +      complaint (_(".gdb_index shortcut table has invalid main >> language %u"), >> +           (unsigned) dw_lang); >> +      return; >> +    } >> +  if (dw_lang == 0) >> +    { >> +      /* Don't bother if the language for the main symbol was not >> known or if >> +       * there was no main symbol at all when the index was built. */ >> +      return; >> +    } >> +  ptr += 4; >> + >> +  const auto lang = dwarf_lang_to_enum_language (dw_lang); >> +  const auto name_offset = extract_unsigned_integer (ptr, >> +                             sizeof (offset_type), >> +                             BFD_ENDIAN_LITTLE); >> +  const auto name = (const char*) (index->constant_pool.data () + >> name_offset); >> + >> +  set_objfile_main_name (per_objfile->objfile, name, (enum language) >> lang); >> +} >> + >>   /* See read-gdb-index.h.  */ >>   int >> @@ -848,6 +898,8 @@ dwarf2_read_gdb_index >>     create_addrmap_from_gdb_index (per_objfile, map.get ()); >> +  set_main_name_from_gdb_index (per_objfile, map.get ()); >> + >>     per_bfd->index_table = std::move (map); >>     per_bfd->quick_file_names_table = >>       create_quick_file_names_table (per_bfd->all_units.size ()); >> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c >> index 4828409222..89acd94c05 100644 >> --- a/gdb/dwarf2/read.c >> +++ b/gdb/dwarf2/read.c >> @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf) >>       } >>   } >> -static enum language >> +/* Converts DWARF language names to GDB language names. */ >> + >> +enum language >>   dwarf_lang_to_enum_language (unsigned int lang) >>   { >>     enum language language; >> @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, >> struct die_info *comp_unit_die, >>     /* Set the language we're debugging.  */ >>     attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu); >>     enum language lang; >> +  dwarf_source_language dw_lang = (dwarf_source_language)0; >>     if (cu->producer != nullptr >>         && strstr (cu->producer, "IBM XL C for OpenCL") != NULL) >>       { >> @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, >> struct die_info *comp_unit_die, >>        language detection we fall back to the DW_AT_producer >>        string.  */ >>         lang = language_opencl; >> +      dw_lang = DW_LANG_OpenCL; >>       } >>     else if (cu->producer != nullptr >>          && strstr (cu->producer, "GNU Go ") != NULL) >>       { >>         /* Similar hack for Go.  */ >>         lang = language_go; >> +      dw_lang = DW_LANG_Go; >>       } >>     else if (attr != nullptr) >> -    lang = dwarf_lang_to_enum_language (attr->constant_value (0)); >> +    { >> +      lang = dwarf_lang_to_enum_language (attr->constant_value (0)); >> +      dw_lang = (dwarf_source_language)attr->constant_value (0); >> +    } >>     else >>       lang = pretend_language; >> +  cu->per_cu->dw_lang = dw_lang; >>     cu->language_defn = language_def (lang); >>     switch (comp_unit_die->tag) >> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h >> index 37023a2070..6707c400cf 100644 >> --- a/gdb/dwarf2/read.h >> +++ b/gdb/dwarf2/read.h >> @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data >>        functions above.  */ >>     std::vector *imported_symtabs = nullptr; >> +  /* The original DW_LANG_* value of the CU, as provided to us by >> +   * DW_AT_language. It is interesting to keep this value around in >> cases where >> +   * we can't use the values from the language enum, as the mapping >> to them is >> +   * lossy, and, while that is usually fine, things like the index >> have an >> +   * understandable bias towards not exposing internal GDB structures >> to the >> +   * outside world, and so prefer to use DWARF constants in their >> stead. */ >> +  dwarf_source_language dw_lang; >> + >>     /* Return true of IMPORTED_SYMTABS is empty or not yet allocated.  */ >>     bool imported_symtabs_empty () const >>     { >> @@ -755,6 +763,10 @@ struct dwarf2_per_objfile >>                std::unique_ptr> m_dwarf2_cus; >>   }; >> +/* Converts DWARF language names to GDB language names. */ >> + >> +enum language dwarf_lang_to_enum_language (unsigned int lang); >> + >>   /* Get the dwarf2_per_objfile associated to OBJFILE.  */ >>   dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile); >