From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by sourceware.org (Postfix) with ESMTPS id 4F42E3858D3C for ; Mon, 25 Sep 2023 18:48:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4F42E3858D3C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pl1-x629.google.com with SMTP id d9443c01a7336-1c60cec8041so19426985ad.3 for ; Mon, 25 Sep 2023 11:48:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1695667690; x=1696272490; darn=sourceware.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=M/JWpy6TF2jHPxfPykqctjlgnPxPJcUZO3ZyLsh0PBE=; b=cjokkZw9I8iJ6ojwNTfZNe/ZstEmmCtBQZOxUlHkE3+Ii1ITZG+9VWES/oX/k0mVD8 0h+WWtvoFgY6BI9/0IiUwuSWeAmPj/QjEE/bnDIJpCIzZpQ7Sy+utH9TZdlyp2BJ73Tz XedCxToLCMY7GKbPCj0dztjVWzbJyLfmfQ5/1vSL3kUhc5VHzIuRdNflhWX+Povc1224 H2KRtDtpzVoqiDGgHbPVg1rztifol6Ayg25/kM2W+f6Kro+NF3iknIUJV4naVaWvpG0f 6PTzWu7ZL8YiIZSWxLuTV+5nWO1DdbdwtwcN2NUaKjqfVbAEYnrDdu5uVIfE+WzkZYOM Q33A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695667690; x=1696272490; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=M/JWpy6TF2jHPxfPykqctjlgnPxPJcUZO3ZyLsh0PBE=; b=kY/F//pL+nutu7baSXncyqzdn6G3dMaE73YdctbHVgFZn2jSt5NOzXeW1kIhdBcJgL PcWxfVnP+tV6vik8SpSrEZtOLF6pU7KLq33Sq0Jn9FkDLc+CunrmYFlzCxy0w6NuKaXt vpJSYApbJaK/VquU+ey5PVYBrgzRduHkmXB6Rt48+H9OIGrTIqsoiqJD/hWsNtTmS7TH 1hBki/ezqw1+AqMmYJjyZMbzpk5L2pREYQHuAbZAmsZOFrAlg7lYosaTQNW7wup7Q4Sa ehzvnJVVMokxT9LN9sXTPyT2i5i7a4p9HL8SA/x/L55NH7h9rGGciQBqlP/nt2ab7o6W yErg== X-Gm-Message-State: AOJu0Yzo+baPEp08ZWOdTb1404TCWNaN5pFVDTPKNi6jlvjhvQvwwp+s Wp/0n7wGbF2YLnRzgTzBFforgcIG2tc= X-Google-Smtp-Source: AGHT+IG9+kyqt0OyUKanTR1IaligfqeYp/LRX2Hw4qh/LtGmd5w0mZJ22nLU2ALxqzgzcfULoNYu5w== X-Received: by 2002:a17:902:b7cb:b0:1bd:ca80:6fe6 with SMTP id v11-20020a170902b7cb00b001bdca806fe6mr4678659plz.41.1695667690033; Mon, 25 Sep 2023 11:48:10 -0700 (PDT) Received: from smtpclient.apple ([2804:10f8:a425:9e00:c126:6869:9ed2:515a]) by smtp.gmail.com with ESMTPSA id iw12-20020a170903044c00b001bbf7fd354csm9140741plb.213.2023.09.25.11.48.08 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Sep 2023 11:48:09 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.100.2.1.2\)) Subject: Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF index From: "Matheus Branco Borella (DarkRyu550)" In-Reply-To: Date: Mon, 25 Sep 2023 15:47:56 -0300 Cc: gdb-patches@sourceware.org Content-Transfer-Encoding: quoted-printable Message-Id: <978D09EE-3B5C-4C29-9EF7-672FD45EE063@gmail.com> References: <20230608214012.1561-1-dark.ryu.550@gmail.com> <20230811182131.492-1-dark.ryu.550@gmail.com> To: Tom de Vries X-Mailer: Apple Mail (2.3774.100.2.1.2) X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: The patch should be mostly complete by this point, no? I think I've = addressed all of the concerns that were raised. > On Sep 13, 2023, at 4:09=E2=80=AFAM, Tom de Vries = wrote: >=20 > On 8/14/23 09:31, Tom de Vries via Gdb-patches wrote: >> On 8/11/23 20:21, Matheus Branco Borella wrote: >>> This should hopefully be the final version. Has a more descriptive = entry in the >>> NEWS file, but is otherwise the same as v2. I believe the = misunderstanding with >>> my contributor status should have also been sorted out? >>>=20 >> Hi, >> As per Eli's last comment in this thread, that's indeed the case. >>> --- >>> This patch adds a new section to the DWARF index containing the name >>> and the language of the main function symbol, gathered from >>> `cooked_index::get_main`, if available. Currently, for lack of a = better name, >>> this section is called the "shortcut table". The way this name is = both saved and >>> applied upon an index being loaded in mirrors how it is done in >>> `cooked_index_functions`, more specifically, the full name of the = main function >>> symbol is saved and `set_objfile_main_name` is used to apply it = after it is >>> loaded. >>>=20 >>> The main use case for this patch is in improving startup times when = dealing with >>> large binaries. Currently, when an index is used, GDB has to expand = symtabs >>> until it finds out what the language of the main function symbol is. = For some >>> large executables, this may take a considerable amount of time to = complete, >>> slowing down startup. This patch bypasses that operation by having = both the name >>> and language of the main function symbol be provided ahead of time = by the index. >>>=20 >>> In my testing (a binary with about 1.8GB worth of DWARF data) this = change brings >>> startup time down from about 34 seconds to about 1.5 seconds. >>>=20 >> I've reviewed the patch, and found a few nits. >> There are two spots that look unintentional to me, one adding and one >> removing an empty line. You might want to remove those. >> I found this bit: >> ... >> + lang =3D dwarf_lang_to_enum_language (attr->constant_value = (0)); >> + dw_lang =3D (dwarf_source_language)attr->constant_value (0); >> ... >> I would reformulate as: >> ... >> + dw_lang =3D (dwarf_source_language)attr->constant_value (0); >> + lang =3D dwarf_lang_to_enum_language (dw_lang); >> ... >> but I don't feel strongly about that. >> I've looked over Tom Tromey's comments earlier in the thread, and I = think they all have been addresses. Furthermore, Eli already approved = the documentation part, so I'd say: approved, but wait for one week in = case somebody else has other comments. >> Approved-By: Tom de Vries >=20 > Hi, >=20 > any update on this? >=20 > Thanks, > - Tom >=20 >> Thanks, >> - Tom >>> PR symtab/24549 >>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=3D24549 >>> --- >>> gdb/NEWS | 3 ++ >>> gdb/doc/gdb.texinfo | 23 +++++++++++++-- >>> gdb/dwarf2/index-write.c | 47 +++++++++++++++++++++++++++---- >>> gdb/dwarf2/read-gdb-index.c | 56 = +++++++++++++++++++++++++++++++++++-- >>> gdb/dwarf2/read.c | 13 +++++++-- >>> gdb/dwarf2/read.h | 12 ++++++++ >>> 6 files changed, 143 insertions(+), 11 deletions(-) >>>=20 >>> diff --git a/gdb/NEWS b/gdb/NEWS >>> index d97e3c15a8..ac455f39f2 100644 >>> --- a/gdb/NEWS >>> +++ b/gdb/NEWS >>> @@ -3,6 +3,9 @@ >>> *** Changes since GDB 13 >>> +* GDB index now contains information about the main function. This = speeds up >>> + startup when it is being used for some large binaries. >>> + >>> * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication = feature string >>> has been deprecated in favor of the = 'org.gnu.gdb.aarch64.pauth_v2' feature >>> string. >>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo >>> index d1059e0cb7..3b2fdcd19e 100644 >>> --- a/gdb/doc/gdb.texinfo >>> +++ b/gdb/doc/gdb.texinfo >>> @@ -49093,13 +49093,14 @@ unless otherwise noted: >>> @enumerate >>> @item >>> -The version number, currently 8. Versions 1, 2 and 3 are obsolete. >>> +The version number, currently 9. Versions 1, 2 and 3 are obsolete. >>> Version 4 uses a different hashing function from versions 5 and 6. >>> Version 6 includes symbols for inlined functions, whereas versions = 4 >>> and 5 do not. Version 7 adds attributes to the CU indices in the >>> symbol table. Version 8 specifies that symbols from DWARF type = units >>> (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table = and not the >>> -compilation unit (@samp{DW_TAG_comp_unit}) using the type. >>> +compilation unit (@samp{DW_TAG_comp_unit}) using the type. Version = 9 adds >>> +the name and the language of the main function to the index. >>> @value{GDBN} will only read version 4, 5, or 6 indices >>> by specifying @code{set use-deprecated-index-sections on}. >>> @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of = the address area. >>> @item >>> The offset, from the start of the file, of the symbol table. >>> +@item >>> +The offset, from the start of the file, of the shortcut table. >>> + >>> @item >>> The offset, from the start of the file, of the constant pool. >>> @end enumerate >>> @@ -49196,6 +49200,21 @@ don't currently have a simple description = of the canonicalization >>> algorithm; if you intend to create new index sections, you must = read >>> the code. >>> +@item The shortcut table >>> +This is a data structure with the following fields: >>> + >>> +@table @asis >>> +@item Language of main >>> +A 32-bit little-endian value indicating the language of the main = function as a >>> +@code{DW_LANG_} constant. This value will be zero if main function = information >>> +is not present. >>> + >>> +@item Name of main >>> +An @code{offset_type} value indicating the offset of the main = function's name >>> +in the constant pool. This value must be ignored if the value for = the language >>> +of main is zero. >>> +@end table >>> + >>> @item >>> The constant pool. This is simply a bunch of bytes. It is = organized >>> so that alignment is correct: CU vectors are stored first, = followed by >>> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c >>> index 62c2cc6ac7..7117a5184b 100644 >>> --- a/gdb/dwarf2/index-write.c >>> +++ b/gdb/dwarf2/index-write.c >>> @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file, >>> const data_buf &types_cu_list, >>> const data_buf &addr_vec, >>> const data_buf &symtab_vec, >>> - const data_buf &constant_pool) >>> + const data_buf &constant_pool, >>> + const data_buf &shortcuts) >>> { >>> data_buf contents; >>> - const offset_type size_of_header =3D 6 * sizeof (offset_type); >>> + const offset_type size_of_header =3D 7 * sizeof (offset_type); >>> offset_type total_len =3D size_of_header; >>> /* The version number. */ >>> - contents.append_offset (8); >>> + contents.append_offset (9); >>> /* The offset of the CU list from the start of the file. */ >>> contents.append_offset (total_len); >>> @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file, >>> contents.append_offset (total_len); >>> total_len +=3D symtab_vec.size (); >>> + /* The offset of the shortcut table from the start of the file. = */ >>> + contents.append_offset (total_len); >>> + total_len +=3D shortcuts.size (); >>> + >>> /* The offset of the constant pool from the start of the file. = */ >>> contents.append_offset (total_len); >>> total_len +=3D constant_pool.size (); >>> @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file, >>> types_cu_list.file_write (out_file); >>> addr_vec.file_write (out_file); >>> symtab_vec.file_write (out_file); >>> + shortcuts.file_write (out_file); >>> constant_pool.file_write (out_file); >>> assert_file_size (out_file, total_len); >>> @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table, >>> } >>> } >>> +/* Write shortcut information. */ >>> + >>> +static void >>> +write_shortcuts_table (cooked_index *table, data_buf& shortcuts, >>> + data_buf& cpool) >>> +{ >>> + const auto main_info =3D table->get_main (); >>> + size_t main_name_offset =3D 0; >>> + dwarf_source_language dw_lang =3D (dwarf_source_language)0; >>> + >>> + if (main_info !=3D nullptr) >>> + { >>> + dw_lang =3D main_info->per_cu->dw_lang; >>> + >>> + if (dw_lang !=3D 0) >>> + { >>> + auto_obstack obstack; >>> + const auto main_name =3D main_info->full_name (&obstack, = true); >>> + >>> + main_name_offset =3D cpool.size (); >>> + cpool.append_cstr0 (main_name); >>> + } >>> + } >>> + >>> + shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang); >>> + shortcuts.append_offset (main_name_offset); >>> +} >>> + >>> /* Write contents of a .gdb_index section for OBJFILE into = OUT_FILE. >>> If OBJFILE has an associated dwz file, write contents of a = .gdb_index >>> section for that dwz file into DWZ_OUT_FILE. If OBJFILE does = not have an >>> @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, = cooked_index *table, >>> write_hash_table (&symtab, symtab_vec, constant_pool); >>> + data_buf shortcuts; >>> + write_shortcuts_table (table, shortcuts, constant_pool); >>> + >>> write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, = addr_vec, >>> - symtab_vec, constant_pool); >>> + symtab_vec, constant_pool, shortcuts); >>> if (dwz_out_file !=3D NULL) >>> - write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}); >>> + write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, = {}); >>> else >>> gdb_assert (dwz_cu_list.empty ()); >>> } >>> diff --git a/gdb/dwarf2/read-gdb-index.c = b/gdb/dwarf2/read-gdb-index.c >>> index 1006386cb2..f09c5ba234 100644 >>> --- a/gdb/dwarf2/read-gdb-index.c >>> +++ b/gdb/dwarf2/read-gdb-index.c >>> @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public = mapped_index_base >>> /* A pointer to the constant pool. */ >>> gdb::array_view constant_pool; >>> + /* The shortcut table data. */ >>> + gdb::array_view shortcut_table; >>> + >>> /* Return the index into the constant pool of the name of the = IDXth >>> symbol in the symbol table. */ >>> offset_type symbol_name_index (offset_type idx) const >>> @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile) >>> mapped_gdb_index *index =3D = (gdb::checked_static_cast >>> (per_objfile->per_bfd->index_table.get ())); >>> + >>> gdb_printf (".gdb_index: version %d\n", index->version); >>> gdb_printf ("\n"); >>> } >>> @@ -583,7 +587,7 @@ to use the section anyway."), >>> /* Indexes with higher version than the one supported by GDB may = be no >>> longer backward compatible. */ >>> - if (version > 8) >>> + if (version > 9) >>> return 0; >>> map->version =3D version; >>> @@ -608,8 +612,17 @@ to use the section anyway."), >>> map->symbol_table >>> =3D offset_view (gdb::array_view = (symbol_table, >>> symbol_table_end)); >>> - >>> ++i; >>> + >>> + if (version >=3D 9) >>> + { >>> + const gdb_byte *shortcut_table =3D addr + metadata[i]; >>> + const gdb_byte *shortcut_table_end =3D addr + metadata[i + = 1]; >>> + map->shortcut_table >>> + =3D gdb::array_view (shortcut_table, = shortcut_table_end); >>> + ++i; >>> + } >>> + >>> map->constant_pool =3D buffer.slice (metadata[i]); >>> if (map->constant_pool.empty () && !map->symbol_table.empty ()) >>> @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index = (dwarf2_per_objfile *per_objfile, >>> =3D new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, = &mutable_map); >>> } >>> +/* Sets the name and language of the main function from the = shortcut table. */ >>> + >>> +static void >>> +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile, >>> + mapped_gdb_index *index) >>> +{ >>> + const auto expected_size =3D 4 + sizeof (offset_type); >>> + if (index->shortcut_table.size () < expected_size) >>> + /* The data in the section is not present, is corrupted or is = in a version >>> + * we don't know about. Regardless, we can't make use of it. */ >>> + return; >>> + >>> + auto ptr =3D index->shortcut_table.data (); >>> + const auto dw_lang =3D extract_unsigned_integer (ptr, 4, = BFD_ENDIAN_LITTLE); >>> + if (dw_lang >=3D DW_LANG_hi_user) >>> + { >>> + complaint (_(".gdb_index shortcut table has invalid main = language %u"), >>> + (unsigned) dw_lang); >>> + return; >>> + } >>> + if (dw_lang =3D=3D 0) >>> + { >>> + /* Don't bother if the language for the main symbol was not = known or if >>> + * there was no main symbol at all when the index was built. = */ >>> + return; >>> + } >>> + ptr +=3D 4; >>> + >>> + const auto lang =3D dwarf_lang_to_enum_language (dw_lang); >>> + const auto name_offset =3D extract_unsigned_integer (ptr, >>> + sizeof (offset_type), >>> + BFD_ENDIAN_LITTLE); >>> + const auto name =3D (const char*) (index->constant_pool.data () + = name_offset); >>> + >>> + set_objfile_main_name (per_objfile->objfile, name, (enum = language) lang); >>> +} >>> + >>> /* See read-gdb-index.h. */ >>> int >>> @@ -848,6 +898,8 @@ dwarf2_read_gdb_index >>> create_addrmap_from_gdb_index (per_objfile, map.get ()); >>> + set_main_name_from_gdb_index (per_objfile, map.get ()); >>> + >>> per_bfd->index_table =3D std::move (map); >>> per_bfd->quick_file_names_table =3D >>> create_quick_file_names_table (per_bfd->all_units.size ()); >>> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c >>> index 4828409222..89acd94c05 100644 >>> --- a/gdb/dwarf2/read.c >>> +++ b/gdb/dwarf2/read.c >>> @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf) >>> } >>> } >>> -static enum language >>> +/* Converts DWARF language names to GDB language names. */ >>> + >>> +enum language >>> dwarf_lang_to_enum_language (unsigned int lang) >>> { >>> enum language language; >>> @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, = struct die_info *comp_unit_die, >>> /* Set the language we're debugging. */ >>> attr =3D dwarf2_attr (comp_unit_die, DW_AT_language, cu); >>> enum language lang; >>> + dwarf_source_language dw_lang =3D (dwarf_source_language)0; >>> if (cu->producer !=3D nullptr >>> && strstr (cu->producer, "IBM XL C for OpenCL") !=3D NULL) >>> { >>> @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu = *cu, struct die_info *comp_unit_die, >>> language detection we fall back to the DW_AT_producer >>> string. */ >>> lang =3D language_opencl; >>> + dw_lang =3D DW_LANG_OpenCL; >>> } >>> else if (cu->producer !=3D nullptr >>> && strstr (cu->producer, "GNU Go ") !=3D NULL) >>> { >>> /* Similar hack for Go. */ >>> lang =3D language_go; >>> + dw_lang =3D DW_LANG_Go; >>> } >>> else if (attr !=3D nullptr) >>> - lang =3D dwarf_lang_to_enum_language (attr->constant_value = (0)); >>> + { >>> + lang =3D dwarf_lang_to_enum_language (attr->constant_value = (0)); >>> + dw_lang =3D (dwarf_source_language)attr->constant_value (0); >>> + } >>> else >>> lang =3D pretend_language; >>> + cu->per_cu->dw_lang =3D dw_lang; >>> cu->language_defn =3D language_def (lang); >>> switch (comp_unit_die->tag) >>> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h >>> index 37023a2070..6707c400cf 100644 >>> --- a/gdb/dwarf2/read.h >>> +++ b/gdb/dwarf2/read.h >>> @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data >>> functions above. */ >>> std::vector *imported_symtabs =3D = nullptr; >>> + /* The original DW_LANG_* value of the CU, as provided to us by >>> + * DW_AT_language. It is interesting to keep this value around in = cases where >>> + * we can't use the values from the language enum, as the mapping = to them is >>> + * lossy, and, while that is usually fine, things like the index = have an >>> + * understandable bias towards not exposing internal GDB = structures to the >>> + * outside world, and so prefer to use DWARF constants in their = stead. */ >>> + dwarf_source_language dw_lang; >>> + >>> /* Return true of IMPORTED_SYMTABS is empty or not yet = allocated. */ >>> bool imported_symtabs_empty () const >>> { >>> @@ -755,6 +763,10 @@ struct dwarf2_per_objfile >>> std::unique_ptr> m_dwarf2_cus; >>> }; >>> +/* Converts DWARF language names to GDB language names. */ >>> + >>> +enum language dwarf_lang_to_enum_language (unsigned int lang); >>> + >>> /* Get the dwarf2_per_objfile associated to OBJFILE. */ >>> dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile = *objfile);