[PATCH 1/2] [gdb/symtab] Add name_of_main and language_of_main to the DWARF index

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Tom de Vries <tdevries@suse.de>
To: gdb-patches@sourceware.org
Subject: [PATCH 1/2] [gdb/symtab] Add name_of_main and language_of_main to the DWARF index
Date: Fri,  6 Oct 2023 20:31:03 +0200	[thread overview]
Message-ID: <20231006183104.27689-2-tdevries@suse.de> (raw)
In-Reply-To: <20231006183104.27689-1-tdevries@suse.de>

From: Matheus Branco Borella <dark.ryu.550@gmail.com>

This patch adds a new section to the DWARF index containing the name
and the language of the main function symbol, gathered from
`cooked_index::get_main`, if available. Currently, for lack of a better name,
this section is called the "shortcut table". The way this name is both saved and
applied upon an index being loaded in mirrors how it is done in
`cooked_index_functions`, more specifically, the full name of the main function
symbol is saved and `set_objfile_main_name` is used to apply it after it is
loaded.

The main use case for this patch is in improving startup times when dealing with
large binaries. Currently, when an index is used, GDB has to expand symtabs
until it finds out what the language of the main function symbol is. For some
large executables, this may take a considerable amount of time to complete,
slowing down startup. This patch bypasses that operation by having both the name
and language of the main function symbol be provided ahead of time by the index.

In my testing (a binary with about 1.8GB worth of DWARF data) this change brings
startup time down from about 34 seconds to about 1.5 seconds.

When testing the patch with target board cc-with-gdb-index, test-case
gdb.fortran/nested-funcs-2.exp starts failing, but this is due to a
pre-existing issue, filed as PR symtab/30946.

Tested on x86_64-linux, with target board unix and cc-with-gdb-index.

PR symtab/24549
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549
---
 gdb/NEWS                    |  3 +++
 gdb/doc/gdb.texinfo         | 23 ++++++++++++++--
 gdb/dwarf2/index-write.c    | 54 +++++++++++++++++++++++++++++++------
 gdb/dwarf2/read-gdb-index.c | 54 ++++++++++++++++++++++++++++++++++++-
 gdb/dwarf2/read.c           | 13 +++++++--
 gdb/dwarf2/read.h           | 12 +++++++++
 6 files changed, 146 insertions(+), 13 deletions(-)

diff --git a/gdb/NEWS b/gdb/NEWS
index 2f6378f9c7a..20f36d278bd 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -21,6 +21,9 @@
   styling according to the spec.  See https://no-color.org/.
   Styling can be re-enabled with "set style enabled on".
 
+* GDB index now contains information about the main function. This speeds up
+  startup when it is being used for some large binaries.
+
 * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication feature string
   has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' feature
   string.
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 4932e49b758..db1a82ec838 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -49663,13 +49663,14 @@ unless otherwise noted:
 
 @enumerate
 @item
-The version number, currently 8.  Versions 1, 2 and 3 are obsolete.
+The version number, currently 9.  Versions 1, 2 and 3 are obsolete.
 Version 4 uses a different hashing function from versions 5 and 6.
 Version 6 includes symbols for inlined functions, whereas versions 4
 and 5 do not.  Version 7 adds attributes to the CU indices in the
 symbol table.  Version 8 specifies that symbols from DWARF type units
 (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and not the
-compilation unit (@samp{DW_TAG_comp_unit}) using the type.
+compilation unit (@samp{DW_TAG_comp_unit}) using the type.  Version 9 adds
+the name and the language of the main function to the index.
 
 @value{GDBN} will only read version 4, 5, or 6 indices
 by specifying @code{set use-deprecated-index-sections on}.
@@ -49690,6 +49691,9 @@ The offset, from the start of the file, of the address area.
 @item
 The offset, from the start of the file, of the symbol table.
 
+@item
+The offset, from the start of the file, of the shortcut table.
+
 @item
 The offset, from the start of the file, of the constant pool.
 @end enumerate
@@ -49766,6 +49770,21 @@ don't currently have a simple description of the canonicalization
 algorithm; if you intend to create new index sections, you must read
 the code.
 
+@item The shortcut table
+This is a data structure with the following fields:
+
+@table @asis
+@item Language of main
+An @code{offset_type} value indicating the language of the main function as a
+@code{DW_LANG_} constant.  This value will be zero if main function information
+is not present.
+
+@item Name of main
+An @code{offset_type} value indicating the offset of the main function's name
+in the constant pool.  This value must be ignored if the value for the language
+of main is zero.
+@end table
+
 @item
 The constant pool.  This is simply a bunch of bytes.  It is organized
 so that alignment is correct: CU vectors are stored first, followed by
diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
index 3acff266ab3..6ea4217fb22 100644
--- a/gdb/dwarf2/index-write.c
+++ b/gdb/dwarf2/index-write.c
@@ -1079,14 +1079,15 @@ write_gdbindex_1 (FILE *out_file,
 		  const data_buf &types_cu_list,
 		  const data_buf &addr_vec,
 		  const data_buf &symtab_vec,
-		  const data_buf &constant_pool)
+		  const data_buf &constant_pool,
+		  const data_buf &shortcuts)
 {
   data_buf contents;
-  const offset_type size_of_header = 6 * sizeof (offset_type);
+  const offset_type size_of_header = 7 * sizeof (offset_type);
   uint64_t total_len = size_of_header;
 
   /* The version number.  */
-  contents.append_offset (8);
+  contents.append_offset (9);
 
   /* The offset of the CU list from the start of the file.  */
   contents.append_offset (total_len);
@@ -1104,6 +1105,10 @@ write_gdbindex_1 (FILE *out_file,
   contents.append_offset (total_len);
   total_len += symtab_vec.size ();
 
+  /* The offset of the shortcut table from the start of the file.  */
+  contents.append_offset (total_len);
+  total_len += shortcuts.size ();
+
   /* The offset of the constant pool from the start of the file.  */
   contents.append_offset (total_len);
   total_len += constant_pool.size ();
@@ -1125,6 +1130,7 @@ write_gdbindex_1 (FILE *out_file,
   types_cu_list.file_write (out_file);
   addr_vec.file_write (out_file);
   symtab_vec.file_write (out_file);
+  shortcuts.file_write (out_file);
   constant_pool.file_write (out_file);
 
   assert_file_size (out_file, total_len);
@@ -1187,6 +1193,34 @@ write_cooked_index (cooked_index *table,
     }
 }
 
+/* Write shortcut information. */
+
+static void
+write_shortcuts_table (cooked_index *table, data_buf& shortcuts,
+		       data_buf& cpool)
+{
+  const auto main_info = table->get_main ();
+  size_t main_name_offset = 0;
+  dwarf_source_language dw_lang = (dwarf_source_language)0;
+
+  if (main_info != nullptr)
+    {
+      dw_lang = main_info->per_cu->dw_lang;
+
+      if (dw_lang != 0)
+	{
+	  auto_obstack obstack;
+	  const auto main_name = main_info->full_name (&obstack, true);
+
+	  main_name_offset = cpool.size ();
+	  cpool.append_cstr0 (main_name);
+	}
+    }
+
+  shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang);
+  shortcuts.append_offset (main_name_offset);
+}
+
 /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE.
    If OBJFILE has an associated dwz file, write contents of a .gdb_index
    section for that dwz file into DWZ_OUT_FILE.  If OBJFILE does not have an
@@ -1263,11 +1297,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, cooked_index *table,
 
   write_hash_table (&symtab, symtab_vec, constant_pool);
 
+  data_buf shortcuts;
+  write_shortcuts_table (table, shortcuts, constant_pool);
+
   write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec,
-		   symtab_vec, constant_pool);
+		   symtab_vec, constant_pool, shortcuts);
 
   if (dwz_out_file != NULL)
-    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {});
+    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {});
   else
     gdb_assert (dwz_cu_list.empty ());
 }
@@ -1573,8 +1610,9 @@ gdb_index ()
   pretend_data_buf addr_vec;
   pretend_data_buf symtab_vec;
   pretend_data_buf constant_pool;
+  pretend_data_buf short_cuts;
 
-  const size_t size_of_header = 6 * sizeof (offset_type);
+  const size_t size_of_header = 7 * sizeof (offset_type);
 
   /* Test that an overly large index will throw an error.  */
   symtab_vec.set_pretend_size (~(offset_type)0 - size_of_header);
@@ -1584,7 +1622,7 @@ gdb_index ()
   try
     {
       write_gdbindex_1 (nullptr, cu_list, types_cu_list, addr_vec,
-			symtab_vec, constant_pool);
+			symtab_vec, constant_pool, short_cuts);
     }
   catch (const gdb_exception_error &e)
     {
@@ -1604,7 +1642,7 @@ gdb_index ()
   try
     {
       write_gdbindex_1 (nullptr, cu_list, types_cu_list, addr_vec,
-			symtab_vec, constant_pool);
+			symtab_vec, constant_pool, short_cuts);
     }
   catch (const gdb_exception_error &e)
     {
diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c
index 9bfc5302b0e..b96eaa96e23 100644
--- a/gdb/dwarf2/read-gdb-index.c
+++ b/gdb/dwarf2/read-gdb-index.c
@@ -88,6 +88,9 @@ struct mapped_gdb_index final : public mapped_index_base
   /* A pointer to the constant pool.  */
   gdb::array_view<const gdb_byte> constant_pool;
 
+  /* The shortcut table data. */
+  gdb::array_view<const gdb_byte> shortcut_table;
+
   /* Return the index into the constant pool of the name of the IDXth
      symbol in the symbol table.  */
   offset_type symbol_name_index (offset_type idx) const
@@ -583,7 +586,7 @@ to use the section anyway."),
 
   /* Indexes with higher version than the one supported by GDB may be no
      longer backward compatible.  */
-  if (version > 8)
+  if (version > 9)
     return 0;
 
   map->version = version;
@@ -610,6 +613,16 @@ to use the section anyway."),
 						    symbol_table_end));
 
   ++i;
+
+  if (version >= 9)
+    {
+      const gdb_byte *shortcut_table = addr + metadata[i];
+      const gdb_byte *shortcut_table_end = addr + metadata[i + 1];
+      map->shortcut_table
+	= gdb::array_view<const gdb_byte> (shortcut_table, shortcut_table_end);
+      ++i;
+    }
+
   map->constant_pool = buffer.slice (metadata[i]);
 
   if (map->constant_pool.empty () && !map->symbol_table.empty ())
@@ -758,6 +771,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile *per_objfile,
     = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, &mutable_map);
 }
 
+/* Sets the name and language of the main function from the shortcut table. */
+
+static void
+set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile,
+			      mapped_gdb_index *index)
+{
+  const auto expected_size = 4 + sizeof (offset_type);
+  if (index->shortcut_table.size () < expected_size)
+    /* The data in the section is not present, is corrupted or is in a version
+     * we don't know about. Regardless, we can't make use of it. */
+    return;
+
+  auto ptr = index->shortcut_table.data ();
+  const auto dw_lang = extract_unsigned_integer (ptr, 4, BFD_ENDIAN_LITTLE);
+  if (dw_lang >= DW_LANG_hi_user)
+    {
+      complaint (_(".gdb_index shortcut table has invalid main language %u"),
+		   (unsigned) dw_lang);
+      return;
+    }
+  if (dw_lang == 0)
+    {
+      /* Don't bother if the language for the main symbol was not known or if
+       * there was no main symbol at all when the index was built. */
+      return;
+    }
+  ptr += 4;
+
+  const auto lang = dwarf_lang_to_enum_language (dw_lang);
+  const auto name_offset = extract_unsigned_integer (ptr,
+						     sizeof (offset_type),
+						     BFD_ENDIAN_LITTLE);
+  const auto name = (const char*) (index->constant_pool.data () + name_offset);
+
+  set_objfile_main_name (per_objfile->objfile, name, (enum language) lang);
+}
+
 /* See read-gdb-index.h.  */
 
 int
@@ -843,6 +893,8 @@ dwarf2_read_gdb_index
 
   create_addrmap_from_gdb_index (per_objfile, map.get ());
 
+  set_main_name_from_gdb_index (per_objfile, map.get ());
+
   per_bfd->index_table = std::move (map);
   per_bfd->quick_file_names_table =
     create_quick_file_names_table (per_bfd->all_units.size ());
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 5bbc8e24cf9..d4aec19d31d 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -17796,7 +17796,9 @@ leb128_size (const gdb_byte *buf)
     }
 }
 
-static enum language
+/* Converts DWARF language names to GDB language names. */
+
+enum language
 dwarf_lang_to_enum_language (unsigned int lang)
 {
   enum language language;
@@ -21725,6 +21727,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die,
   /* Set the language we're debugging.  */
   attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu);
   enum language lang;
+  dwarf_source_language dw_lang = (dwarf_source_language)0;
   if (cu->producer != nullptr
       && strstr (cu->producer, "IBM XL C for OpenCL") != NULL)
     {
@@ -21733,18 +21736,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die,
 	 language detection we fall back to the DW_AT_producer
 	 string.  */
       lang = language_opencl;
+      dw_lang = DW_LANG_OpenCL;
     }
   else if (cu->producer != nullptr
 	   && strstr (cu->producer, "GNU Go ") != NULL)
     {
       /* Similar hack for Go.  */
       lang = language_go;
+      dw_lang = DW_LANG_Go;
     }
   else if (attr != nullptr)
-    lang = dwarf_lang_to_enum_language (attr->constant_value (0));
+    {
+      lang = dwarf_lang_to_enum_language (attr->constant_value (0));
+      dw_lang = (dwarf_source_language)attr->constant_value (0);
+    }
   else
     lang = pretend_language;
 
+  cu->per_cu->dw_lang = dw_lang;
   cu->language_defn = language_def (lang);
 
   switch (comp_unit_die->tag)
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index 9dfc435e861..1d9432c5c11 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -245,6 +245,14 @@ struct dwarf2_per_cu_data
      functions above.  */
   std::vector <dwarf2_per_cu_data *> *imported_symtabs = nullptr;
 
+  /* The original DW_LANG_* value of the CU, as provided to us by
+   * DW_AT_language. It is interesting to keep this value around in cases where
+   * we can't use the values from the language enum, as the mapping to them is
+   * lossy, and, while that is usually fine, things like the index have an
+   * understandable bias towards not exposing internal GDB structures to the
+   * outside world, and so prefer to use DWARF constants in their stead. */
+  dwarf_source_language dw_lang;
+
   /* Return true of IMPORTED_SYMTABS is empty or not yet allocated.  */
   bool imported_symtabs_empty () const
   {
@@ -764,6 +772,10 @@ struct dwarf2_per_objfile
 		     std::unique_ptr<dwarf2_cu>> m_dwarf2_cus;
 };
 
+/* Converts DWARF language names to GDB language names. */
+
+enum language dwarf_lang_to_enum_language (unsigned int lang);
+
 /* Get the dwarf2_per_objfile associated to OBJFILE.  */
 
 dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile);
-- 
2.35.3

next prev parent reply	other threads:[~2023-10-06 18:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-06 18:31 [PATCH 0/2] " Tom de Vries
2023-10-06 18:31 ` Tom de Vries [this message]
2023-10-10 19:19   ` [PATCH 1/2] [gdb/symtab] " Tom Tromey
2023-10-10 19:22     ` Tom Tromey
2023-10-11 15:37     ` Tom de Vries
2023-10-23 16:26       ` Aaron Merey
2023-10-24  7:43         ` Tom de Vries
2023-10-24  8:04           ` Tom de Vries
2023-10-24  8:20           ` Tom de Vries
2023-10-24 22:04           ` Aaron Merey
2023-10-06 18:31 ` [PATCH 2/2] [readelf] Handle .gdb_index section version 9 Tom de Vries
2023-10-06 18:33 ` [PATCH 0/2] Add name_of_main and language_of_main to the DWARF index Tom de Vries

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231006183104.27689-2-tdevries@suse.de \
    --to=tdevries@suse.de \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).