Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF index

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: "Matheus Branco Borella (DarkRyu550)" <dark.ryu.550@gmail.com>
To: Tom de Vries <tdevries@suse.de>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF index
Date: Mon, 25 Sep 2023 15:47:56 -0300	[thread overview]
Message-ID: <978D09EE-3B5C-4C29-9EF7-672FD45EE063@gmail.com> (raw)
In-Reply-To: <bd53743b-553b-4aed-8dd5-6fe060e91705@suse.de>

The patch should be mostly complete by this point, no? I think I've addressed all of the concerns that were raised.

> On Sep 13, 2023, at 4:09 AM, Tom de Vries <tdevries@suse.de> wrote:
> 
> On 8/14/23 09:31, Tom de Vries via Gdb-patches wrote:
>> On 8/11/23 20:21, Matheus Branco Borella wrote:
>>> This should hopefully be the final version. Has a more descriptive entry in the
>>> NEWS file, but is otherwise the same as v2. I believe the misunderstanding with
>>> my contributor status should have also been sorted out?
>>> 
>> Hi,
>> As per Eli's last comment in this thread, that's indeed the case.
>>> ---
>>> This patch adds a new section to the DWARF index containing the name
>>> and the language of the main function symbol, gathered from
>>> `cooked_index::get_main`, if available. Currently, for lack of a better name,
>>> this section is called the "shortcut table". The way this name is both saved and
>>> applied upon an index being loaded in mirrors how it is done in
>>> `cooked_index_functions`, more specifically, the full name of the main function
>>> symbol is saved and `set_objfile_main_name` is used to apply it after it is
>>> loaded.
>>> 
>>> The main use case for this patch is in improving startup times when dealing with
>>> large binaries. Currently, when an index is used, GDB has to expand symtabs
>>> until it finds out what the language of the main function symbol is. For some
>>> large executables, this may take a considerable amount of time to complete,
>>> slowing down startup. This patch bypasses that operation by having both the name
>>> and language of the main function symbol be provided ahead of time by the index.
>>> 
>>> In my testing (a binary with about 1.8GB worth of DWARF data) this change brings
>>> startup time down from about 34 seconds to about 1.5 seconds.
>>> 
>> I've reviewed the patch, and found a few nits.
>> There are two spots that look unintentional to me, one adding and one
>> removing an empty line.  You might want to remove those.
>> I found this bit:
>> ...
>> +      lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>> +      dw_lang = (dwarf_source_language)attr->constant_value (0);
>> ...
>> I would reformulate as:
>> ...
>> +      dw_lang = (dwarf_source_language)attr->constant_value (0);
>> +      lang = dwarf_lang_to_enum_language (dw_lang);
>> ...
>> but I don't feel strongly about that.
>> I've looked over Tom Tromey's comments earlier in the thread, and I think they all have been addresses.  Furthermore, Eli already approved the documentation part, so I'd say: approved, but wait for one week in case somebody else has other comments.
>> Approved-By: Tom de Vries <tdevries@suse.de>
> 
> Hi,
> 
> any update on this?
> 
> Thanks,
> - Tom
> 
>> Thanks,
>> - Tom
>>> PR symtab/24549
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549
>>> ---
>>>   gdb/NEWS                    |  3 ++
>>>   gdb/doc/gdb.texinfo         | 23 +++++++++++++--
>>>   gdb/dwarf2/index-write.c    | 47 +++++++++++++++++++++++++++----
>>>   gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++--
>>>   gdb/dwarf2/read.c           | 13 +++++++--
>>>   gdb/dwarf2/read.h           | 12 ++++++++
>>>   6 files changed, 143 insertions(+), 11 deletions(-)
>>> 
>>> diff --git a/gdb/NEWS b/gdb/NEWS
>>> index d97e3c15a8..ac455f39f2 100644
>>> --- a/gdb/NEWS
>>> +++ b/gdb/NEWS
>>> @@ -3,6 +3,9 @@
>>>   *** Changes since GDB 13
>>> +* GDB index now contains information about the main function. This speeds up
>>> +  startup when it is being used for some large binaries.
>>> +
>>>   * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication feature string
>>>     has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' feature
>>>     string.
>>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
>>> index d1059e0cb7..3b2fdcd19e 100644
>>> --- a/gdb/doc/gdb.texinfo
>>> +++ b/gdb/doc/gdb.texinfo
>>> @@ -49093,13 +49093,14 @@ unless otherwise noted:
>>>   @enumerate
>>>   @item
>>> -The version number, currently 8.  Versions 1, 2 and 3 are obsolete.
>>> +The version number, currently 9.  Versions 1, 2 and 3 are obsolete.
>>>   Version 4 uses a different hashing function from versions 5 and 6.
>>>   Version 6 includes symbols for inlined functions, whereas versions 4
>>>   and 5 do not.  Version 7 adds attributes to the CU indices in the
>>>   symbol table.  Version 8 specifies that symbols from DWARF type units
>>>   (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and not the
>>> -compilation unit (@samp{DW_TAG_comp_unit}) using the type.
>>> +compilation unit (@samp{DW_TAG_comp_unit}) using the type.  Version 9 adds
>>> +the name and the language of the main function to the index.
>>>   @value{GDBN} will only read version 4, 5, or 6 indices
>>>   by specifying @code{set use-deprecated-index-sections on}.
>>> @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the address area.
>>>   @item
>>>   The offset, from the start of the file, of the symbol table.
>>> +@item
>>> +The offset, from the start of the file, of the shortcut table.
>>> +
>>>   @item
>>>   The offset, from the start of the file, of the constant pool.
>>>   @end enumerate
>>> @@ -49196,6 +49200,21 @@ don't currently have a simple description of the canonicalization
>>>   algorithm; if you intend to create new index sections, you must read
>>>   the code.
>>> +@item The shortcut table
>>> +This is a data structure with the following fields:
>>> +
>>> +@table @asis
>>> +@item Language of main
>>> +A 32-bit little-endian value indicating the language of the main function as a
>>> +@code{DW_LANG_} constant.  This value will be zero if main function information
>>> +is not present.
>>> +
>>> +@item Name of main
>>> +An @code{offset_type} value indicating the offset of the main function's name
>>> +in the constant pool.  This value must be ignored if the value for the language
>>> +of main is zero.
>>> +@end table
>>> +
>>>   @item
>>>   The constant pool.  This is simply a bunch of bytes.  It is organized
>>>   so that alignment is correct: CU vectors are stored first, followed by
>>> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
>>> index 62c2cc6ac7..7117a5184b 100644
>>> --- a/gdb/dwarf2/index-write.c
>>> +++ b/gdb/dwarf2/index-write.c
>>> @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file,
>>>             const data_buf &types_cu_list,
>>>             const data_buf &addr_vec,
>>>             const data_buf &symtab_vec,
>>> -          const data_buf &constant_pool)
>>> +          const data_buf &constant_pool,
>>> +          const data_buf &shortcuts)
>>>   {
>>>     data_buf contents;
>>> -  const offset_type size_of_header = 6 * sizeof (offset_type);
>>> +  const offset_type size_of_header = 7 * sizeof (offset_type);
>>>     offset_type total_len = size_of_header;
>>>     /* The version number.  */
>>> -  contents.append_offset (8);
>>> +  contents.append_offset (9);
>>>     /* The offset of the CU list from the start of the file.  */
>>>     contents.append_offset (total_len);
>>> @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file,
>>>     contents.append_offset (total_len);
>>>     total_len += symtab_vec.size ();
>>> +  /* The offset of the shortcut table from the start of the file.  */
>>> +  contents.append_offset (total_len);
>>> +  total_len += shortcuts.size ();
>>> +
>>>     /* The offset of the constant pool from the start of the file.  */
>>>     contents.append_offset (total_len);
>>>     total_len += constant_pool.size ();
>>> @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file,
>>>     types_cu_list.file_write (out_file);
>>>     addr_vec.file_write (out_file);
>>>     symtab_vec.file_write (out_file);
>>> +  shortcuts.file_write (out_file);
>>>     constant_pool.file_write (out_file);
>>>     assert_file_size (out_file, total_len);
>>> @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table,
>>>       }
>>>   }
>>> +/* Write shortcut information. */
>>> +
>>> +static void
>>> +write_shortcuts_table (cooked_index *table, data_buf& shortcuts,
>>> +               data_buf& cpool)
>>> +{
>>> +  const auto main_info = table->get_main ();
>>> +  size_t main_name_offset = 0;
>>> +  dwarf_source_language dw_lang = (dwarf_source_language)0;
>>> +
>>> +  if (main_info != nullptr)
>>> +    {
>>> +      dw_lang = main_info->per_cu->dw_lang;
>>> +
>>> +      if (dw_lang != 0)
>>> +    {
>>> +      auto_obstack obstack;
>>> +      const auto main_name = main_info->full_name (&obstack, true);
>>> +
>>> +      main_name_offset = cpool.size ();
>>> +      cpool.append_cstr0 (main_name);
>>> +    }
>>> +    }
>>> +
>>> +  shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang);
>>> +  shortcuts.append_offset (main_name_offset);
>>> +}
>>> +
>>>   /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE.
>>>      If OBJFILE has an associated dwz file, write contents of a .gdb_index
>>>      section for that dwz file into DWZ_OUT_FILE.  If OBJFILE does not have an
>>> @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, cooked_index *table,
>>>     write_hash_table (&symtab, symtab_vec, constant_pool);
>>> +  data_buf shortcuts;
>>> +  write_shortcuts_table (table, shortcuts, constant_pool);
>>> +
>>>     write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec,
>>> -           symtab_vec, constant_pool);
>>> +           symtab_vec, constant_pool, shortcuts);
>>>     if (dwz_out_file != NULL)
>>> -    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {});
>>> +    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {});
>>>     else
>>>       gdb_assert (dwz_cu_list.empty ());
>>>   }
>>> diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c
>>> index 1006386cb2..f09c5ba234 100644
>>> --- a/gdb/dwarf2/read-gdb-index.c
>>> +++ b/gdb/dwarf2/read-gdb-index.c
>>> @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public mapped_index_base
>>>     /* A pointer to the constant pool.  */
>>>     gdb::array_view<const gdb_byte> constant_pool;
>>> +  /* The shortcut table data. */
>>> +  gdb::array_view<const gdb_byte> shortcut_table;
>>> +
>>>     /* Return the index into the constant pool of the name of the IDXth
>>>        symbol in the symbol table.  */
>>>     offset_type symbol_name_index (offset_type idx) const
>>> @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile)
>>>     mapped_gdb_index *index = (gdb::checked_static_cast<mapped_gdb_index *>
>>>                    (per_objfile->per_bfd->index_table.get ()));
>>> +
>>>     gdb_printf (".gdb_index: version %d\n", index->version);
>>>     gdb_printf ("\n");
>>>   }
>>> @@ -583,7 +587,7 @@ to use the section anyway."),
>>>     /* Indexes with higher version than the one supported by GDB may be no
>>>        longer backward compatible.  */
>>> -  if (version > 8)
>>> +  if (version > 9)
>>>       return 0;
>>>     map->version = version;
>>> @@ -608,8 +612,17 @@ to use the section anyway."),
>>>     map->symbol_table
>>>       = offset_view (gdb::array_view<const gdb_byte> (symbol_table,
>>>                               symbol_table_end));
>>> -
>>>     ++i;
>>> +
>>> +  if (version >= 9)
>>> +    {
>>> +      const gdb_byte *shortcut_table = addr + metadata[i];
>>> +      const gdb_byte *shortcut_table_end = addr + metadata[i + 1];
>>> +      map->shortcut_table
>>> +    = gdb::array_view<const gdb_byte> (shortcut_table, shortcut_table_end);
>>> +      ++i;
>>> +    }
>>> +
>>>     map->constant_pool = buffer.slice (metadata[i]);
>>>     if (map->constant_pool.empty () && !map->symbol_table.empty ())
>>> @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile *per_objfile,
>>>       = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, &mutable_map);
>>>   }
>>> +/* Sets the name and language of the main function from the shortcut table. */
>>> +
>>> +static void
>>> +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile,
>>> +                  mapped_gdb_index *index)
>>> +{
>>> +  const auto expected_size = 4 + sizeof (offset_type);
>>> +  if (index->shortcut_table.size () < expected_size)
>>> +    /* The data in the section is not present, is corrupted or is in a version
>>> +     * we don't know about. Regardless, we can't make use of it. */
>>> +    return;
>>> +
>>> +  auto ptr = index->shortcut_table.data ();
>>> +  const auto dw_lang = extract_unsigned_integer (ptr, 4, BFD_ENDIAN_LITTLE);
>>> +  if (dw_lang >= DW_LANG_hi_user)
>>> +    {
>>> +      complaint (_(".gdb_index shortcut table has invalid main language %u"),
>>> +           (unsigned) dw_lang);
>>> +      return;
>>> +    }
>>> +  if (dw_lang == 0)
>>> +    {
>>> +      /* Don't bother if the language for the main symbol was not known or if
>>> +       * there was no main symbol at all when the index was built. */
>>> +      return;
>>> +    }
>>> +  ptr += 4;
>>> +
>>> +  const auto lang = dwarf_lang_to_enum_language (dw_lang);
>>> +  const auto name_offset = extract_unsigned_integer (ptr,
>>> +                             sizeof (offset_type),
>>> +                             BFD_ENDIAN_LITTLE);
>>> +  const auto name = (const char*) (index->constant_pool.data () + name_offset);
>>> +
>>> +  set_objfile_main_name (per_objfile->objfile, name, (enum language) lang);
>>> +}
>>> +
>>>   /* See read-gdb-index.h.  */
>>>   int
>>> @@ -848,6 +898,8 @@ dwarf2_read_gdb_index
>>>     create_addrmap_from_gdb_index (per_objfile, map.get ());
>>> +  set_main_name_from_gdb_index (per_objfile, map.get ());
>>> +
>>>     per_bfd->index_table = std::move (map);
>>>     per_bfd->quick_file_names_table =
>>>       create_quick_file_names_table (per_bfd->all_units.size ());
>>> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
>>> index 4828409222..89acd94c05 100644
>>> --- a/gdb/dwarf2/read.c
>>> +++ b/gdb/dwarf2/read.c
>>> @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf)
>>>       }
>>>   }
>>> -static enum language
>>> +/* Converts DWARF language names to GDB language names. */
>>> +
>>> +enum language
>>>   dwarf_lang_to_enum_language (unsigned int lang)
>>>   {
>>>     enum language language;
>>> @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die,
>>>     /* Set the language we're debugging.  */
>>>     attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu);
>>>     enum language lang;
>>> +  dwarf_source_language dw_lang = (dwarf_source_language)0;
>>>     if (cu->producer != nullptr
>>>         && strstr (cu->producer, "IBM XL C for OpenCL") != NULL)
>>>       {
>>> @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die,
>>>        language detection we fall back to the DW_AT_producer
>>>        string.  */
>>>         lang = language_opencl;
>>> +      dw_lang = DW_LANG_OpenCL;
>>>       }
>>>     else if (cu->producer != nullptr
>>>          && strstr (cu->producer, "GNU Go ") != NULL)
>>>       {
>>>         /* Similar hack for Go.  */
>>>         lang = language_go;
>>> +      dw_lang = DW_LANG_Go;
>>>       }
>>>     else if (attr != nullptr)
>>> -    lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>>> +    {
>>> +      lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>>> +      dw_lang = (dwarf_source_language)attr->constant_value (0);
>>> +    }
>>>     else
>>>       lang = pretend_language;
>>> +  cu->per_cu->dw_lang = dw_lang;
>>>     cu->language_defn = language_def (lang);
>>>     switch (comp_unit_die->tag)
>>> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
>>> index 37023a2070..6707c400cf 100644
>>> --- a/gdb/dwarf2/read.h
>>> +++ b/gdb/dwarf2/read.h
>>> @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data
>>>        functions above.  */
>>>     std::vector <dwarf2_per_cu_data *> *imported_symtabs = nullptr;
>>> +  /* The original DW_LANG_* value of the CU, as provided to us by
>>> +   * DW_AT_language. It is interesting to keep this value around in cases where
>>> +   * we can't use the values from the language enum, as the mapping to them is
>>> +   * lossy, and, while that is usually fine, things like the index have an
>>> +   * understandable bias towards not exposing internal GDB structures to the
>>> +   * outside world, and so prefer to use DWARF constants in their stead. */
>>> +  dwarf_source_language dw_lang;
>>> +
>>>     /* Return true of IMPORTED_SYMTABS is empty or not yet allocated.  */
>>>     bool imported_symtabs_empty () const
>>>     {
>>> @@ -755,6 +763,10 @@ struct dwarf2_per_objfile
>>>                std::unique_ptr<dwarf2_cu>> m_dwarf2_cus;
>>>   };
>>> +/* Converts DWARF language names to GDB language names. */
>>> +
>>> +enum language dwarf_lang_to_enum_language (unsigned int lang);
>>> +
>>>   /* Get the dwarf2_per_objfile associated to OBJFILE.  */
>>>   dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile);

next prev parent reply	other threads:[~2023-09-25 18:48 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-08 21:40 [PATCH] " Matheus Branco Borella
2023-06-09 16:56 ` Tom Tromey
2023-06-30 20:36   ` Matheus Branco Borella
2023-07-01  5:47     ` Eli Zaretskii
2023-07-07 15:00       ` Matheus Branco Borella
2023-07-07 18:00         ` Eli Zaretskii
2023-08-04 20:55           ` Tom de Vries
2023-08-05  5:36             ` Eli Zaretskii
2023-08-03  7:12         ` Tom de Vries
2023-08-03  7:29         ` Tom de Vries
2023-08-04 18:09           ` [PATCH v2] " Matheus Branco Borella
2023-08-11 18:21 ` [PATCH v3] " Matheus Branco Borella
2023-08-14  7:31   ` Tom de Vries
2023-08-14  7:36     ` Tom de Vries
2023-09-13  7:09     ` Tom de Vries
2023-09-25 18:47       ` Matheus Branco Borella (DarkRyu550) [this message]
2023-09-26 14:07         ` Tom de Vries
2023-10-04 22:30           ` Tom de Vries

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=978D09EE-3B5C-4C29-9EF7-672FD45EE063@gmail.com \
    --to=dark.ryu.550@gmail.com \
    --cc=gdb-patches@sourceware.org \
    --cc=tdevries@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).