From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id AB7333858D39 for ; Mon, 14 Aug 2023 07:31:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AB7333858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id CF80F21984; Mon, 14 Aug 2023 07:31:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1691998297; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6dClRJ70QIfYd0mU2Cm3IcXvXwGyg/8rp9qEG8LouPA=; b=QfXXDHaSOywJjtV0A3qj23kmeizigCIgRgAFgTd2H5p76154pyp3MMVESMcJnUUQH0Lfds N1Kv4iCWzEzBBW4KH3YPXW09ikRENCWtIdQUEYoYqmXewDfuzw7lq942aeqA4pN7fqq0Uw AazM+9mDCwJa1k3FBE2AiOKJh0l4UDI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1691998297; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6dClRJ70QIfYd0mU2Cm3IcXvXwGyg/8rp9qEG8LouPA=; b=z1b+/8qQe3qrGQOMK7F/To3RNqNXRSrYvqZgpipWFHPFbq8GsS1xAY+IXF4Flg+iAZK9F7 EsYoB3BRVtu3UbCg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id B3811138EE; Mon, 14 Aug 2023 07:31:37 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id M/6jKlnY2WS5OwAAMHmgww (envelope-from ); Mon, 14 Aug 2023 07:31:37 +0000 Message-ID: Date: Mon, 14 Aug 2023 09:31:31 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF index Content-Language: en-US To: Matheus Branco Borella , gdb-patches@sourceware.org Cc: tom@tromey.com References: <20230608214012.1561-1-dark.ryu.550@gmail.com> <20230811182131.492-1-dark.ryu.550@gmail.com> From: Tom de Vries In-Reply-To: <20230811182131.492-1-dark.ryu.550@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 8/11/23 20:21, Matheus Branco Borella wrote: > This should hopefully be the final version. Has a more descriptive entry in the > NEWS file, but is otherwise the same as v2. I believe the misunderstanding with > my contributor status should have also been sorted out? > Hi, As per Eli's last comment in this thread, that's indeed the case. > --- > This patch adds a new section to the DWARF index containing the name > and the language of the main function symbol, gathered from > `cooked_index::get_main`, if available. Currently, for lack of a better name, > this section is called the "shortcut table". The way this name is both saved and > applied upon an index being loaded in mirrors how it is done in > `cooked_index_functions`, more specifically, the full name of the main function > symbol is saved and `set_objfile_main_name` is used to apply it after it is > loaded. > > The main use case for this patch is in improving startup times when dealing with > large binaries. Currently, when an index is used, GDB has to expand symtabs > until it finds out what the language of the main function symbol is. For some > large executables, this may take a considerable amount of time to complete, > slowing down startup. This patch bypasses that operation by having both the name > and language of the main function symbol be provided ahead of time by the index. > > In my testing (a binary with about 1.8GB worth of DWARF data) this change brings > startup time down from about 34 seconds to about 1.5 seconds. > I've reviewed the patch, and found a few nits. There are two spots that look unintentional to me, one adding and one removing an empty line. You might want to remove those. I found this bit: ... + lang = dwarf_lang_to_enum_language (attr->constant_value (0)); + dw_lang = (dwarf_source_language)attr->constant_value (0); ... I would reformulate as: ... + dw_lang = (dwarf_source_language)attr->constant_value (0); + lang = dwarf_lang_to_enum_language (dw_lang); ... but I don't feel strongly about that. I've looked over Tom Tromey's comments earlier in the thread, and I think they all have been addresses. Furthermore, Eli already approved the documentation part, so I'd say: approved, but wait for one week in case somebody else has other comments. Approved-By: Tom de Vries Thanks, - Tom > PR symtab/24549 > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549 > --- > gdb/NEWS | 3 ++ > gdb/doc/gdb.texinfo | 23 +++++++++++++-- > gdb/dwarf2/index-write.c | 47 +++++++++++++++++++++++++++---- > gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++-- > gdb/dwarf2/read.c | 13 +++++++-- > gdb/dwarf2/read.h | 12 ++++++++ > 6 files changed, 143 insertions(+), 11 deletions(-) > > diff --git a/gdb/NEWS b/gdb/NEWS > index d97e3c15a8..ac455f39f2 100644 > --- a/gdb/NEWS > +++ b/gdb/NEWS > @@ -3,6 +3,9 @@ > > *** Changes since GDB 13 > > +* GDB index now contains information about the main function. This speeds up > + startup when it is being used for some large binaries. > + > * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication feature string > has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' feature > string. > diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo > index d1059e0cb7..3b2fdcd19e 100644 > --- a/gdb/doc/gdb.texinfo > +++ b/gdb/doc/gdb.texinfo > @@ -49093,13 +49093,14 @@ unless otherwise noted: > > @enumerate > @item > -The version number, currently 8. Versions 1, 2 and 3 are obsolete. > +The version number, currently 9. Versions 1, 2 and 3 are obsolete. > Version 4 uses a different hashing function from versions 5 and 6. > Version 6 includes symbols for inlined functions, whereas versions 4 > and 5 do not. Version 7 adds attributes to the CU indices in the > symbol table. Version 8 specifies that symbols from DWARF type units > (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and not the > -compilation unit (@samp{DW_TAG_comp_unit}) using the type. > +compilation unit (@samp{DW_TAG_comp_unit}) using the type. Version 9 adds > +the name and the language of the main function to the index. > > @value{GDBN} will only read version 4, 5, or 6 indices > by specifying @code{set use-deprecated-index-sections on}. > @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the address area. > @item > The offset, from the start of the file, of the symbol table. > > +@item > +The offset, from the start of the file, of the shortcut table. > + > @item > The offset, from the start of the file, of the constant pool. > @end enumerate > @@ -49196,6 +49200,21 @@ don't currently have a simple description of the canonicalization > algorithm; if you intend to create new index sections, you must read > the code. > > +@item The shortcut table > +This is a data structure with the following fields: > + > +@table @asis > +@item Language of main > +A 32-bit little-endian value indicating the language of the main function as a > +@code{DW_LANG_} constant. This value will be zero if main function information > +is not present. > + > +@item Name of main > +An @code{offset_type} value indicating the offset of the main function's name > +in the constant pool. This value must be ignored if the value for the language > +of main is zero. > +@end table > + > @item > The constant pool. This is simply a bunch of bytes. It is organized > so that alignment is correct: CU vectors are stored first, followed by > diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c > index 62c2cc6ac7..7117a5184b 100644 > --- a/gdb/dwarf2/index-write.c > +++ b/gdb/dwarf2/index-write.c > @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file, > const data_buf &types_cu_list, > const data_buf &addr_vec, > const data_buf &symtab_vec, > - const data_buf &constant_pool) > + const data_buf &constant_pool, > + const data_buf &shortcuts) > { > data_buf contents; > - const offset_type size_of_header = 6 * sizeof (offset_type); > + const offset_type size_of_header = 7 * sizeof (offset_type); > offset_type total_len = size_of_header; > > /* The version number. */ > - contents.append_offset (8); > + contents.append_offset (9); > > /* The offset of the CU list from the start of the file. */ > contents.append_offset (total_len); > @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file, > contents.append_offset (total_len); > total_len += symtab_vec.size (); > > + /* The offset of the shortcut table from the start of the file. */ > + contents.append_offset (total_len); > + total_len += shortcuts.size (); > + > /* The offset of the constant pool from the start of the file. */ > contents.append_offset (total_len); > total_len += constant_pool.size (); > @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file, > types_cu_list.file_write (out_file); > addr_vec.file_write (out_file); > symtab_vec.file_write (out_file); > + shortcuts.file_write (out_file); > constant_pool.file_write (out_file); > > assert_file_size (out_file, total_len); > @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table, > } > } > > +/* Write shortcut information. */ > + > +static void > +write_shortcuts_table (cooked_index *table, data_buf& shortcuts, > + data_buf& cpool) > +{ > + const auto main_info = table->get_main (); > + size_t main_name_offset = 0; > + dwarf_source_language dw_lang = (dwarf_source_language)0; > + > + if (main_info != nullptr) > + { > + dw_lang = main_info->per_cu->dw_lang; > + > + if (dw_lang != 0) > + { > + auto_obstack obstack; > + const auto main_name = main_info->full_name (&obstack, true); > + > + main_name_offset = cpool.size (); > + cpool.append_cstr0 (main_name); > + } > + } > + > + shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang); > + shortcuts.append_offset (main_name_offset); > +} > + > /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE. > If OBJFILE has an associated dwz file, write contents of a .gdb_index > section for that dwz file into DWZ_OUT_FILE. If OBJFILE does not have an > @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, cooked_index *table, > > write_hash_table (&symtab, symtab_vec, constant_pool); > > + data_buf shortcuts; > + write_shortcuts_table (table, shortcuts, constant_pool); > + > write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec, > - symtab_vec, constant_pool); > + symtab_vec, constant_pool, shortcuts); > > if (dwz_out_file != NULL) > - write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}); > + write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {}); > else > gdb_assert (dwz_cu_list.empty ()); > } > diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c > index 1006386cb2..f09c5ba234 100644 > --- a/gdb/dwarf2/read-gdb-index.c > +++ b/gdb/dwarf2/read-gdb-index.c > @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public mapped_index_base > /* A pointer to the constant pool. */ > gdb::array_view constant_pool; > > + /* The shortcut table data. */ > + gdb::array_view shortcut_table; > + > /* Return the index into the constant pool of the name of the IDXth > symbol in the symbol table. */ > offset_type symbol_name_index (offset_type idx) const > @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile) > > mapped_gdb_index *index = (gdb::checked_static_cast > (per_objfile->per_bfd->index_table.get ())); > + > gdb_printf (".gdb_index: version %d\n", index->version); > gdb_printf ("\n"); > } > @@ -583,7 +587,7 @@ to use the section anyway."), > > /* Indexes with higher version than the one supported by GDB may be no > longer backward compatible. */ > - if (version > 8) > + if (version > 9) > return 0; > > map->version = version; > @@ -608,8 +612,17 @@ to use the section anyway."), > map->symbol_table > = offset_view (gdb::array_view (symbol_table, > symbol_table_end)); > - > ++i; > + > + if (version >= 9) > + { > + const gdb_byte *shortcut_table = addr + metadata[i]; > + const gdb_byte *shortcut_table_end = addr + metadata[i + 1]; > + map->shortcut_table > + = gdb::array_view (shortcut_table, shortcut_table_end); > + ++i; > + } > + > map->constant_pool = buffer.slice (metadata[i]); > > if (map->constant_pool.empty () && !map->symbol_table.empty ()) > @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile *per_objfile, > = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, &mutable_map); > } > > +/* Sets the name and language of the main function from the shortcut table. */ > + > +static void > +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile, > + mapped_gdb_index *index) > +{ > + const auto expected_size = 4 + sizeof (offset_type); > + if (index->shortcut_table.size () < expected_size) > + /* The data in the section is not present, is corrupted or is in a version > + * we don't know about. Regardless, we can't make use of it. */ > + return; > + > + auto ptr = index->shortcut_table.data (); > + const auto dw_lang = extract_unsigned_integer (ptr, 4, BFD_ENDIAN_LITTLE); > + if (dw_lang >= DW_LANG_hi_user) > + { > + complaint (_(".gdb_index shortcut table has invalid main language %u"), > + (unsigned) dw_lang); > + return; > + } > + if (dw_lang == 0) > + { > + /* Don't bother if the language for the main symbol was not known or if > + * there was no main symbol at all when the index was built. */ > + return; > + } > + ptr += 4; > + > + const auto lang = dwarf_lang_to_enum_language (dw_lang); > + const auto name_offset = extract_unsigned_integer (ptr, > + sizeof (offset_type), > + BFD_ENDIAN_LITTLE); > + const auto name = (const char*) (index->constant_pool.data () + name_offset); > + > + set_objfile_main_name (per_objfile->objfile, name, (enum language) lang); > +} > + > /* See read-gdb-index.h. */ > > int > @@ -848,6 +898,8 @@ dwarf2_read_gdb_index > > create_addrmap_from_gdb_index (per_objfile, map.get ()); > > + set_main_name_from_gdb_index (per_objfile, map.get ()); > + > per_bfd->index_table = std::move (map); > per_bfd->quick_file_names_table = > create_quick_file_names_table (per_bfd->all_units.size ()); > diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c > index 4828409222..89acd94c05 100644 > --- a/gdb/dwarf2/read.c > +++ b/gdb/dwarf2/read.c > @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf) > } > } > > -static enum language > +/* Converts DWARF language names to GDB language names. */ > + > +enum language > dwarf_lang_to_enum_language (unsigned int lang) > { > enum language language; > @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die, > /* Set the language we're debugging. */ > attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu); > enum language lang; > + dwarf_source_language dw_lang = (dwarf_source_language)0; > if (cu->producer != nullptr > && strstr (cu->producer, "IBM XL C for OpenCL") != NULL) > { > @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die, > language detection we fall back to the DW_AT_producer > string. */ > lang = language_opencl; > + dw_lang = DW_LANG_OpenCL; > } > else if (cu->producer != nullptr > && strstr (cu->producer, "GNU Go ") != NULL) > { > /* Similar hack for Go. */ > lang = language_go; > + dw_lang = DW_LANG_Go; > } > else if (attr != nullptr) > - lang = dwarf_lang_to_enum_language (attr->constant_value (0)); > + { > + lang = dwarf_lang_to_enum_language (attr->constant_value (0)); > + dw_lang = (dwarf_source_language)attr->constant_value (0); > + } > else > lang = pretend_language; > > + cu->per_cu->dw_lang = dw_lang; > cu->language_defn = language_def (lang); > > switch (comp_unit_die->tag) > diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h > index 37023a2070..6707c400cf 100644 > --- a/gdb/dwarf2/read.h > +++ b/gdb/dwarf2/read.h > @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data > functions above. */ > std::vector *imported_symtabs = nullptr; > > + /* The original DW_LANG_* value of the CU, as provided to us by > + * DW_AT_language. It is interesting to keep this value around in cases where > + * we can't use the values from the language enum, as the mapping to them is > + * lossy, and, while that is usually fine, things like the index have an > + * understandable bias towards not exposing internal GDB structures to the > + * outside world, and so prefer to use DWARF constants in their stead. */ > + dwarf_source_language dw_lang; > + > /* Return true of IMPORTED_SYMTABS is empty or not yet allocated. */ > bool imported_symtabs_empty () const > { > @@ -755,6 +763,10 @@ struct dwarf2_per_objfile > std::unique_ptr> m_dwarf2_cus; > }; > > +/* Converts DWARF language names to GDB language names. */ > + > +enum language dwarf_lang_to_enum_language (unsigned int lang); > + > /* Get the dwarf2_per_objfile associated to OBJFILE. */ > > dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile);