From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by sourceware.org (Postfix) with ESMTPS id 04A383857B93 for ; Fri, 7 Jul 2023 15:01:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 04A383857B93 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1b8b32c1dd1so11169995ad.0 for ; Fri, 07 Jul 2023 08:01:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1688742060; x=1691334060; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CqDlm+x9h5gWAhSb8Z2eTzLRlOv+quzEP5Z1iH1NbyA=; b=HF7VMXcpi0Px7oWlNkYFqYiwhspqGsVz5dw4pWGtoK7JRGg+0ouqxWe8S3sCTHYQWI FBNI0MFEQizioSilo8P9+HON4Ksfzk2Y4sat4Xaj6mmeT3ScpMRjFFJUZL4sJ89ayiry PuMzab6xZmWmtuFIVSuJzbabydvDCmVwwfGAoCRprROHshNV3bdclhRYB5LsGb2HTSsn I04kaQZFNPGGfUvyeCYQ4faibiAj9+D8tFgfqovqoqwmLbBTXLYZrcN5KdZ+uZ7MbJYw LZjlqD6E7uGuXbOekzaEfrRfF2zWhWeC9uau64nx9FwxHhIm2f77WWSWAFOQb+Ov1FMC vI8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688742060; x=1691334060; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CqDlm+x9h5gWAhSb8Z2eTzLRlOv+quzEP5Z1iH1NbyA=; b=dqN0B4TsXWiwYA8/WpANaFQrQaXoV5DBbMPSz9QemkSWN3RJ+3Q7aF0WwaENglnTrR ymABugKvotkwdPQ4ypH5/Msx8Br9Phhabb2n+mlXBINSwacJ/RpYrjKo8krMvmzLGUVo 77Y09Nc7m+h5fGJpYbs2P9BDIWAlNSeOzGV2FEWhfKF/xVkxc5IUuNIH5wOxl21JRe3U SX/qt1GKR0IS/pAPJeuIPmpAcEYjmUEzQMcRRQ1ZBBfmFQeWwJIutcT79SdmqPOcttyt rnNhpxe2BzYnsr/n5ZNW77EbA8G4+UQ+TmIdRmCIAYfYLHAbTX1tWLWYjh+Ocwm+S259 u4aA== X-Gm-Message-State: ABy/qLZhSylxE4g77SdZgsr3KBs7wki2aV4T/Bmelc17n/KLMclFiEN+ BDikMBXAeKo5Ey1nXejKyX0dM5mX5MuJ5g== X-Google-Smtp-Source: APBJJlHcDF8mmfqsarstQWhcmjLHX6mEegm23XWI4FXSeBjCxBft9TRetlLoJXanvtoHSjToTss8HA== X-Received: by 2002:a17:902:e84a:b0:1b9:ca58:34df with SMTP id t10-20020a170902e84a00b001b9ca5834dfmr490984plg.63.1688742059659; Fri, 07 Jul 2023 08:00:59 -0700 (PDT) Received: from localhost.localdomain ([177.200.77.243]) by smtp.gmail.com with ESMTPSA id v6-20020a170902b7c600b001ab1b7bae5asm3373001plz.184.2023.07.07.08.00.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Jul 2023 08:00:59 -0700 (PDT) From: Matheus Branco Borella To: eliz@gnu.org Cc: gdb-patches@sourceware.org, Matheus Branco Borella Subject: [PATCH] Add name_of_main and language_of_main to the DWARF index Date: Fri, 7 Jul 2023 12:00:22 -0300 Message-Id: <20230707150021.618-1-dark.ryu.550@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <83bkgwrxnh.fsf@gnu.org> References: <83bkgwrxnh.fsf@gnu.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Eli Zaretskii wrote: > Your assignment is not on file yet, AFAICT. Was the paperwork > completed, i.e. did you get a copy of the assignment signed by you and > by the FSF? If not, you need to wait some more. Huh, that's weird. I do have the copy, signed by both parties. Maybe it just hasn't been filed yet? If you want, I could forward it to you. > Thanks. The documentation parts are OK, but please fix the text to > leave two spaces between sentences, not one. Alright, I've fixed that. Is there anything else I'm missing? --- This patch adds a new section to the DWARF index containing the name and the language of the main function symbol, gathered from `cooked_index::get_main`, if available. Currently, for lack of a better name, this section is called the "shortcut table". The way this name is both saved and applied upon an index being loaded in mirrors how it is done in `cooked_index_functions`, more specifically, the full name of the main function symbol is saved and `set_objfile_main_name` is used to apply it after it is loaded. The main use case for this patch is in improving startup times when dealing with large binaries. Currently, when an index is used, GDB has to expand symtabs until it finds out what the language of the main function symbol is. For some large executables, this may take a considerable amount of time to complete, slowing down startup. This patch bypasses that operation by having both the name and language of the main function symbol be provided ahead of time by the index. In my testing (a binary with about 1.8GB worth of DWARF data) this change brings startup time down from about 34 seconds to about 1.5 seconds. --- gdb/NEWS | 2 ++ gdb/doc/gdb.texinfo | 23 +++++++++++++-- gdb/dwarf2/index-write.c | 47 +++++++++++++++++++++++++++---- gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++-- gdb/dwarf2/read.c | 13 +++++++-- gdb/dwarf2/read.h | 12 ++++++++ 6 files changed, 142 insertions(+), 11 deletions(-) diff --git a/gdb/NEWS b/gdb/NEWS index d97e3c15a8..2d940d1f79 100644 --- a/gdb/NEWS +++ b/gdb/NEWS @@ -3,6 +3,8 @@ *** Changes since GDB 13 +* DWARF index now contains information about the main function. + * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication feature string has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' feature string. diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo index d1059e0cb7..4c58ea5709 100644 --- a/gdb/doc/gdb.texinfo +++ b/gdb/doc/gdb.texinfo @@ -49093,13 +49093,14 @@ unless otherwise noted: @enumerate @item -The version number, currently 8. Versions 1, 2 and 3 are obsolete. +The version number, currently 9. Versions 1, 2 and 3 are obsolete. Version 4 uses a different hashing function from versions 5 and 6. Version 6 includes symbols for inlined functions, whereas versions 4 and 5 do not. Version 7 adds attributes to the CU indices in the symbol table. Version 8 specifies that symbols from DWARF type units (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and not the -compilation unit (@samp{DW_TAG_comp_unit}) using the type. +compilation unit (@samp{DW_TAG_comp_unit}) using the type. Version 9 adds +the name and the language of the main function to the index. @value{GDBN} will only read version 4, 5, or 6 indices by specifying @code{set use-deprecated-index-sections on}. @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the address area. @item The offset, from the start of the file, of the symbol table. +@item +The offset, from the start of the file, of the shortcut table. + @item The offset, from the start of the file, of the constant pool. @end enumerate @@ -49196,6 +49200,21 @@ don't currently have a simple description of the canonicalization algorithm; if you intend to create new index sections, you must read the code. +@item +The shortcut table. This is a data structure with the following fields: + +@table @asis +@item +A 32-bit little-endian value indicating the language of the main function as a +@code{DW_LANG_} constant. This value will be zero if main function information +is not present. + +@item +An @code{offset_type} value indicating the offset of the main function's name +in the constant pool. This value must be ignored if the value for the language +of main is zero. +@ end table + @item The constant pool. This is simply a bunch of bytes. It is organized so that alignment is correct: CU vectors are stored first, followed by diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c index 62c2cc6ac7..ee6eaa7b87 100644 --- a/gdb/dwarf2/index-write.c +++ b/gdb/dwarf2/index-write.c @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file, const data_buf &types_cu_list, const data_buf &addr_vec, const data_buf &symtab_vec, - const data_buf &constant_pool) + const data_buf &constant_pool, + const data_buf &shortcuts) { data_buf contents; - const offset_type size_of_header = 6 * sizeof (offset_type); + const offset_type size_of_header = 7 * sizeof (offset_type); offset_type total_len = size_of_header; /* The version number. */ - contents.append_offset (8); + contents.append_offset (9); /* The offset of the CU list from the start of the file. */ contents.append_offset (total_len); @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file, contents.append_offset (total_len); total_len += symtab_vec.size (); + /* The offset of the shortcut table from the start of the file. */ + contents.append_offset (total_len); + total_len += shortcuts.size (); + /* The offset of the constant pool from the start of the file. */ contents.append_offset (total_len); total_len += constant_pool.size (); @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file, types_cu_list.file_write (out_file); addr_vec.file_write (out_file); symtab_vec.file_write (out_file); + shortcuts.file_write (out_file); constant_pool.file_write (out_file); assert_file_size (out_file, total_len); @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table, } } +/* Write shortcut information. */ + +static void +write_shortcuts_table (cooked_index *table, data_buf& shortcuts, + data_buf& cpool) +{ + const auto main_info = table->get_main (); + size_t main_name_offset = 0; + dwarf_source_language dw_lang = (dwarf_source_language)0; + + if (main_info != nullptr) + { + dw_lang = main_info->per_cu->dw_lang; + + if (dw_lang != 0) + { + auto_obstack obstack; + const auto main_name = main_info->full_name (&obstack, true); + + main_name_offset = cpool.size (); + cpool.append_cstr0 (main_name); + } + } + + shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang); + shortcuts.append_offset (main_name_offset); +} + /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE. If OBJFILE has an associated dwz file, write contents of a .gdb_index section for that dwz file into DWZ_OUT_FILE. If OBJFILE does not have an @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, cooked_index *table, write_hash_table (&symtab, symtab_vec, constant_pool); + data_buf shortcuts; + write_shortcuts_table (table, shortcuts, constant_pool); + write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec, - symtab_vec, constant_pool); + symtab_vec, constant_pool, shortcuts); if (dwz_out_file != NULL) - write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}); + write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {}); else gdb_assert (dwz_cu_list.empty ()); } diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c index 1006386cb2..534c5a7fd7 100644 --- a/gdb/dwarf2/read-gdb-index.c +++ b/gdb/dwarf2/read-gdb-index.c @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public mapped_index_base /* A pointer to the constant pool. */ gdb::array_view constant_pool; + /* The shortcut table data. */ + gdb::array_view shortcut_table; + /* Return the index into the constant pool of the name of the IDXth symbol in the symbol table. */ offset_type symbol_name_index (offset_type idx) const @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile) mapped_gdb_index *index = (gdb::checked_static_cast (per_objfile->per_bfd->index_table.get ())); + gdb_printf (".gdb_index: version %d\n", index->version); gdb_printf ("\n"); } @@ -583,7 +587,7 @@ to use the section anyway."), /* Indexes with higher version than the one supported by GDB may be no longer backward compatible. */ - if (version > 8) + if (version > 9) return 0; map->version = version; @@ -608,8 +612,17 @@ to use the section anyway."), map->symbol_table = offset_view (gdb::array_view (symbol_table, symbol_table_end)); - ++i; + + if (version >= 9) + { + const gdb_byte *shortcut_table = addr + metadata[i]; + const gdb_byte *shortcut_table_end = addr + metadata[i + 1]; + map->shortcut_table + = gdb::array_view (shortcut_table, shortcut_table_end); + ++i; + } + map->constant_pool = buffer.slice (metadata[i]); if (map->constant_pool.empty () && !map->symbol_table.empty ()) @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile *per_objfile, = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, &mutable_map); } +/* Sets the name and language of the main function from the shortcut table. */ + +static void +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile, + mapped_gdb_index *index) +{ + const auto expected_size = 4 + sizeof (offset_type); + if (index->shortcut_table.size () < expected_size) + /* The data in the section is not present, is corrupted or is in a version + * we don't know about. Regardless, we can't make use of it. */ + return; + + auto ptr = index->shortcut_table.data (); + const auto dw_lang = extract_unsigned_integer (ptr, 4, BFD_ENDIAN_LITTLE); + if (dw_lang >= DW_LANG_hi_user) + { + complaint (_(".gdb_index shortcut table has invalid main language %u"), + (unsigned) dw_lang); + return; + } + if (dw_lang == 0) + { + /* Don't bother if the language for the main symbol was not known or if + * there was no main symbol at all when the index was built. */ + return; + } + ptr += 4; + + const auto lang = dwarf_lang_to_enum_language (dw_lang); + const auto name_offset = extract_unsigned_integer (ptr, + sizeof (offset_type), + BFD_ENDIAN_LITTLE); + const auto name = (const char*) (index->constant_pool.data () + name_offset); + + set_objfile_main_name (per_objfile->objfile, name, (enum language) lang); +} + /* See read-gdb-index.h. */ int @@ -848,6 +898,8 @@ dwarf2_read_gdb_index create_addrmap_from_gdb_index (per_objfile, map.get ()); + set_main_name_from_gdb_index (per_objfile, map.get ()); + per_bfd->index_table = std::move (map); per_bfd->quick_file_names_table = create_quick_file_names_table (per_bfd->all_units.size ()); diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index 4828409222..89acd94c05 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf) } } -static enum language +/* Converts DWARF language names to GDB language names. */ + +enum language dwarf_lang_to_enum_language (unsigned int lang) { enum language language; @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die, /* Set the language we're debugging. */ attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu); enum language lang; + dwarf_source_language dw_lang = (dwarf_source_language)0; if (cu->producer != nullptr && strstr (cu->producer, "IBM XL C for OpenCL") != NULL) { @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die, language detection we fall back to the DW_AT_producer string. */ lang = language_opencl; + dw_lang = DW_LANG_OpenCL; } else if (cu->producer != nullptr && strstr (cu->producer, "GNU Go ") != NULL) { /* Similar hack for Go. */ lang = language_go; + dw_lang = DW_LANG_Go; } else if (attr != nullptr) - lang = dwarf_lang_to_enum_language (attr->constant_value (0)); + { + lang = dwarf_lang_to_enum_language (attr->constant_value (0)); + dw_lang = (dwarf_source_language)attr->constant_value (0); + } else lang = pretend_language; + cu->per_cu->dw_lang = dw_lang; cu->language_defn = language_def (lang); switch (comp_unit_die->tag) diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h index 37023a2070..1235f62bfc 100644 --- a/gdb/dwarf2/read.h +++ b/gdb/dwarf2/read.h @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data functions above. */ std::vector *imported_symtabs = nullptr; + /* The original DW_LANG_* value of the CU, as provided to us by + * DW_AT_language. It is interesting to keep this value around in cases where + * we can't use the values from the language enum, as the mapping to them is + * lossy, and, while that is usually fine, things like the index have an + * understandable bias towards not exposing internal GDB structures to the + * outside world, and so prefer to use DWARF constants in their stead. */ + dwarf_source_language dw_lang; + /* Return true of IMPORTED_SYMTABS is empty or not yet allocated. */ bool imported_symtabs_empty () const { @@ -755,6 +763,10 @@ struct dwarf2_per_objfile std::unique_ptr> m_dwarf2_cus; }; +/* Converts DWARF language names to GDB language names. */ + +enum language dwarf_lang_to_enum_language (unsigned int lang); + /* Get the dwarf2_per_objfile associated to OBJFILE. */ dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile); -- 2.40.1