From: Tom de Vries <tdevries@suse.de>
To: Matheus Branco Borella <dark.ryu.550@gmail.com>,
gdb-patches@sourceware.org
Cc: tom@tromey.com
Subject: Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF index
Date: Wed, 13 Sep 2023 09:09:12 +0200 [thread overview]
Message-ID: <bd53743b-553b-4aed-8dd5-6fe060e91705@suse.de> (raw)
In-Reply-To: <ad5e6d5c-2520-0458-1308-6ee1f7044389@suse.de>
On 8/14/23 09:31, Tom de Vries via Gdb-patches wrote:
> On 8/11/23 20:21, Matheus Branco Borella wrote:
>> This should hopefully be the final version. Has a more descriptive
>> entry in the
>> NEWS file, but is otherwise the same as v2. I believe the
>> misunderstanding with
>> my contributor status should have also been sorted out?
>>
>
> Hi,
>
> As per Eli's last comment in this thread, that's indeed the case.
>
>> ---
>> This patch adds a new section to the DWARF index containing the name
>> and the language of the main function symbol, gathered from
>> `cooked_index::get_main`, if available. Currently, for lack of a
>> better name,
>> this section is called the "shortcut table". The way this name is both
>> saved and
>> applied upon an index being loaded in mirrors how it is done in
>> `cooked_index_functions`, more specifically, the full name of the main
>> function
>> symbol is saved and `set_objfile_main_name` is used to apply it after
>> it is
>> loaded.
>>
>> The main use case for this patch is in improving startup times when
>> dealing with
>> large binaries. Currently, when an index is used, GDB has to expand
>> symtabs
>> until it finds out what the language of the main function symbol is.
>> For some
>> large executables, this may take a considerable amount of time to
>> complete,
>> slowing down startup. This patch bypasses that operation by having
>> both the name
>> and language of the main function symbol be provided ahead of time by
>> the index.
>>
>> In my testing (a binary with about 1.8GB worth of DWARF data) this
>> change brings
>> startup time down from about 34 seconds to about 1.5 seconds.
>>
>
> I've reviewed the patch, and found a few nits.
>
> There are two spots that look unintentional to me, one adding and one
> removing an empty line. You might want to remove those.
>
> I found this bit:
> ...
> + lang = dwarf_lang_to_enum_language (attr->constant_value (0));
> + dw_lang = (dwarf_source_language)attr->constant_value (0);
> ...
>
> I would reformulate as:
> ...
> + dw_lang = (dwarf_source_language)attr->constant_value (0);
> + lang = dwarf_lang_to_enum_language (dw_lang);
> ...
> but I don't feel strongly about that.
>
> I've looked over Tom Tromey's comments earlier in the thread, and I
> think they all have been addresses. Furthermore, Eli already approved
> the documentation part, so I'd say: approved, but wait for one week in
> case somebody else has other comments.
>
> Approved-By: Tom de Vries <tdevries@suse.de>
>
Hi,
any update on this?
Thanks,
- Tom
> Thanks,
> - Tom
>
>> PR symtab/24549
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549
>> ---
>> gdb/NEWS | 3 ++
>> gdb/doc/gdb.texinfo | 23 +++++++++++++--
>> gdb/dwarf2/index-write.c | 47 +++++++++++++++++++++++++++----
>> gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++--
>> gdb/dwarf2/read.c | 13 +++++++--
>> gdb/dwarf2/read.h | 12 ++++++++
>> 6 files changed, 143 insertions(+), 11 deletions(-)
>>
>> diff --git a/gdb/NEWS b/gdb/NEWS
>> index d97e3c15a8..ac455f39f2 100644
>> --- a/gdb/NEWS
>> +++ b/gdb/NEWS
>> @@ -3,6 +3,9 @@
>> *** Changes since GDB 13
>> +* GDB index now contains information about the main function. This
>> speeds up
>> + startup when it is being used for some large binaries.
>> +
>> * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication
>> feature string
>> has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2'
>> feature
>> string.
>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
>> index d1059e0cb7..3b2fdcd19e 100644
>> --- a/gdb/doc/gdb.texinfo
>> +++ b/gdb/doc/gdb.texinfo
>> @@ -49093,13 +49093,14 @@ unless otherwise noted:
>> @enumerate
>> @item
>> -The version number, currently 8. Versions 1, 2 and 3 are obsolete.
>> +The version number, currently 9. Versions 1, 2 and 3 are obsolete.
>> Version 4 uses a different hashing function from versions 5 and 6.
>> Version 6 includes symbols for inlined functions, whereas versions 4
>> and 5 do not. Version 7 adds attributes to the CU indices in the
>> symbol table. Version 8 specifies that symbols from DWARF type units
>> (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and
>> not the
>> -compilation unit (@samp{DW_TAG_comp_unit}) using the type.
>> +compilation unit (@samp{DW_TAG_comp_unit}) using the type. Version 9
>> adds
>> +the name and the language of the main function to the index.
>> @value{GDBN} will only read version 4, 5, or 6 indices
>> by specifying @code{set use-deprecated-index-sections on}.
>> @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the
>> address area.
>> @item
>> The offset, from the start of the file, of the symbol table.
>> +@item
>> +The offset, from the start of the file, of the shortcut table.
>> +
>> @item
>> The offset, from the start of the file, of the constant pool.
>> @end enumerate
>> @@ -49196,6 +49200,21 @@ don't currently have a simple description of
>> the canonicalization
>> algorithm; if you intend to create new index sections, you must read
>> the code.
>> +@item The shortcut table
>> +This is a data structure with the following fields:
>> +
>> +@table @asis
>> +@item Language of main
>> +A 32-bit little-endian value indicating the language of the main
>> function as a
>> +@code{DW_LANG_} constant. This value will be zero if main function
>> information
>> +is not present.
>> +
>> +@item Name of main
>> +An @code{offset_type} value indicating the offset of the main
>> function's name
>> +in the constant pool. This value must be ignored if the value for
>> the language
>> +of main is zero.
>> +@end table
>> +
>> @item
>> The constant pool. This is simply a bunch of bytes. It is organized
>> so that alignment is correct: CU vectors are stored first, followed by
>> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
>> index 62c2cc6ac7..7117a5184b 100644
>> --- a/gdb/dwarf2/index-write.c
>> +++ b/gdb/dwarf2/index-write.c
>> @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file,
>> const data_buf &types_cu_list,
>> const data_buf &addr_vec,
>> const data_buf &symtab_vec,
>> - const data_buf &constant_pool)
>> + const data_buf &constant_pool,
>> + const data_buf &shortcuts)
>> {
>> data_buf contents;
>> - const offset_type size_of_header = 6 * sizeof (offset_type);
>> + const offset_type size_of_header = 7 * sizeof (offset_type);
>> offset_type total_len = size_of_header;
>> /* The version number. */
>> - contents.append_offset (8);
>> + contents.append_offset (9);
>> /* The offset of the CU list from the start of the file. */
>> contents.append_offset (total_len);
>> @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file,
>> contents.append_offset (total_len);
>> total_len += symtab_vec.size ();
>> + /* The offset of the shortcut table from the start of the file. */
>> + contents.append_offset (total_len);
>> + total_len += shortcuts.size ();
>> +
>> /* The offset of the constant pool from the start of the file. */
>> contents.append_offset (total_len);
>> total_len += constant_pool.size ();
>> @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file,
>> types_cu_list.file_write (out_file);
>> addr_vec.file_write (out_file);
>> symtab_vec.file_write (out_file);
>> + shortcuts.file_write (out_file);
>> constant_pool.file_write (out_file);
>> assert_file_size (out_file, total_len);
>> @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table,
>> }
>> }
>> +/* Write shortcut information. */
>> +
>> +static void
>> +write_shortcuts_table (cooked_index *table, data_buf& shortcuts,
>> + data_buf& cpool)
>> +{
>> + const auto main_info = table->get_main ();
>> + size_t main_name_offset = 0;
>> + dwarf_source_language dw_lang = (dwarf_source_language)0;
>> +
>> + if (main_info != nullptr)
>> + {
>> + dw_lang = main_info->per_cu->dw_lang;
>> +
>> + if (dw_lang != 0)
>> + {
>> + auto_obstack obstack;
>> + const auto main_name = main_info->full_name (&obstack, true);
>> +
>> + main_name_offset = cpool.size ();
>> + cpool.append_cstr0 (main_name);
>> + }
>> + }
>> +
>> + shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang);
>> + shortcuts.append_offset (main_name_offset);
>> +}
>> +
>> /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE.
>> If OBJFILE has an associated dwz file, write contents of a
>> .gdb_index
>> section for that dwz file into DWZ_OUT_FILE. If OBJFILE does not
>> have an
>> @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd,
>> cooked_index *table,
>> write_hash_table (&symtab, symtab_vec, constant_pool);
>> + data_buf shortcuts;
>> + write_shortcuts_table (table, shortcuts, constant_pool);
>> +
>> write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec,
>> - symtab_vec, constant_pool);
>> + symtab_vec, constant_pool, shortcuts);
>> if (dwz_out_file != NULL)
>> - write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {});
>> + write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {});
>> else
>> gdb_assert (dwz_cu_list.empty ());
>> }
>> diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c
>> index 1006386cb2..f09c5ba234 100644
>> --- a/gdb/dwarf2/read-gdb-index.c
>> +++ b/gdb/dwarf2/read-gdb-index.c
>> @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public
>> mapped_index_base
>> /* A pointer to the constant pool. */
>> gdb::array_view<const gdb_byte> constant_pool;
>> + /* The shortcut table data. */
>> + gdb::array_view<const gdb_byte> shortcut_table;
>> +
>> /* Return the index into the constant pool of the name of the IDXth
>> symbol in the symbol table. */
>> offset_type symbol_name_index (offset_type idx) const
>> @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile)
>> mapped_gdb_index *index =
>> (gdb::checked_static_cast<mapped_gdb_index *>
>> (per_objfile->per_bfd->index_table.get ()));
>> +
>> gdb_printf (".gdb_index: version %d\n", index->version);
>> gdb_printf ("\n");
>> }
>> @@ -583,7 +587,7 @@ to use the section anyway."),
>> /* Indexes with higher version than the one supported by GDB may
>> be no
>> longer backward compatible. */
>> - if (version > 8)
>> + if (version > 9)
>> return 0;
>> map->version = version;
>> @@ -608,8 +612,17 @@ to use the section anyway."),
>> map->symbol_table
>> = offset_view (gdb::array_view<const gdb_byte> (symbol_table,
>> symbol_table_end));
>> -
>> ++i;
>> +
>> + if (version >= 9)
>> + {
>> + const gdb_byte *shortcut_table = addr + metadata[i];
>> + const gdb_byte *shortcut_table_end = addr + metadata[i + 1];
>> + map->shortcut_table
>> + = gdb::array_view<const gdb_byte> (shortcut_table,
>> shortcut_table_end);
>> + ++i;
>> + }
>> +
>> map->constant_pool = buffer.slice (metadata[i]);
>> if (map->constant_pool.empty () && !map->symbol_table.empty ())
>> @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile
>> *per_objfile,
>> = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack,
>> &mutable_map);
>> }
>> +/* Sets the name and language of the main function from the shortcut
>> table. */
>> +
>> +static void
>> +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile,
>> + mapped_gdb_index *index)
>> +{
>> + const auto expected_size = 4 + sizeof (offset_type);
>> + if (index->shortcut_table.size () < expected_size)
>> + /* The data in the section is not present, is corrupted or is in
>> a version
>> + * we don't know about. Regardless, we can't make use of it. */
>> + return;
>> +
>> + auto ptr = index->shortcut_table.data ();
>> + const auto dw_lang = extract_unsigned_integer (ptr, 4,
>> BFD_ENDIAN_LITTLE);
>> + if (dw_lang >= DW_LANG_hi_user)
>> + {
>> + complaint (_(".gdb_index shortcut table has invalid main
>> language %u"),
>> + (unsigned) dw_lang);
>> + return;
>> + }
>> + if (dw_lang == 0)
>> + {
>> + /* Don't bother if the language for the main symbol was not
>> known or if
>> + * there was no main symbol at all when the index was built. */
>> + return;
>> + }
>> + ptr += 4;
>> +
>> + const auto lang = dwarf_lang_to_enum_language (dw_lang);
>> + const auto name_offset = extract_unsigned_integer (ptr,
>> + sizeof (offset_type),
>> + BFD_ENDIAN_LITTLE);
>> + const auto name = (const char*) (index->constant_pool.data () +
>> name_offset);
>> +
>> + set_objfile_main_name (per_objfile->objfile, name, (enum language)
>> lang);
>> +}
>> +
>> /* See read-gdb-index.h. */
>> int
>> @@ -848,6 +898,8 @@ dwarf2_read_gdb_index
>> create_addrmap_from_gdb_index (per_objfile, map.get ());
>> + set_main_name_from_gdb_index (per_objfile, map.get ());
>> +
>> per_bfd->index_table = std::move (map);
>> per_bfd->quick_file_names_table =
>> create_quick_file_names_table (per_bfd->all_units.size ());
>> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
>> index 4828409222..89acd94c05 100644
>> --- a/gdb/dwarf2/read.c
>> +++ b/gdb/dwarf2/read.c
>> @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf)
>> }
>> }
>> -static enum language
>> +/* Converts DWARF language names to GDB language names. */
>> +
>> +enum language
>> dwarf_lang_to_enum_language (unsigned int lang)
>> {
>> enum language language;
>> @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu,
>> struct die_info *comp_unit_die,
>> /* Set the language we're debugging. */
>> attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu);
>> enum language lang;
>> + dwarf_source_language dw_lang = (dwarf_source_language)0;
>> if (cu->producer != nullptr
>> && strstr (cu->producer, "IBM XL C for OpenCL") != NULL)
>> {
>> @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu,
>> struct die_info *comp_unit_die,
>> language detection we fall back to the DW_AT_producer
>> string. */
>> lang = language_opencl;
>> + dw_lang = DW_LANG_OpenCL;
>> }
>> else if (cu->producer != nullptr
>> && strstr (cu->producer, "GNU Go ") != NULL)
>> {
>> /* Similar hack for Go. */
>> lang = language_go;
>> + dw_lang = DW_LANG_Go;
>> }
>> else if (attr != nullptr)
>> - lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>> + {
>> + lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>> + dw_lang = (dwarf_source_language)attr->constant_value (0);
>> + }
>> else
>> lang = pretend_language;
>> + cu->per_cu->dw_lang = dw_lang;
>> cu->language_defn = language_def (lang);
>> switch (comp_unit_die->tag)
>> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
>> index 37023a2070..6707c400cf 100644
>> --- a/gdb/dwarf2/read.h
>> +++ b/gdb/dwarf2/read.h
>> @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data
>> functions above. */
>> std::vector <dwarf2_per_cu_data *> *imported_symtabs = nullptr;
>> + /* The original DW_LANG_* value of the CU, as provided to us by
>> + * DW_AT_language. It is interesting to keep this value around in
>> cases where
>> + * we can't use the values from the language enum, as the mapping
>> to them is
>> + * lossy, and, while that is usually fine, things like the index
>> have an
>> + * understandable bias towards not exposing internal GDB structures
>> to the
>> + * outside world, and so prefer to use DWARF constants in their
>> stead. */
>> + dwarf_source_language dw_lang;
>> +
>> /* Return true of IMPORTED_SYMTABS is empty or not yet allocated. */
>> bool imported_symtabs_empty () const
>> {
>> @@ -755,6 +763,10 @@ struct dwarf2_per_objfile
>> std::unique_ptr<dwarf2_cu>> m_dwarf2_cus;
>> };
>> +/* Converts DWARF language names to GDB language names. */
>> +
>> +enum language dwarf_lang_to_enum_language (unsigned int lang);
>> +
>> /* Get the dwarf2_per_objfile associated to OBJFILE. */
>> dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile);
>
next prev parent reply other threads:[~2023-09-13 7:09 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-08 21:40 [PATCH] " Matheus Branco Borella
2023-06-09 16:56 ` Tom Tromey
2023-06-30 20:36 ` Matheus Branco Borella
2023-07-01 5:47 ` Eli Zaretskii
2023-07-07 15:00 ` Matheus Branco Borella
2023-07-07 18:00 ` Eli Zaretskii
2023-08-04 20:55 ` Tom de Vries
2023-08-05 5:36 ` Eli Zaretskii
2023-08-03 7:12 ` Tom de Vries
2023-08-03 7:29 ` Tom de Vries
2023-08-04 18:09 ` [PATCH v2] " Matheus Branco Borella
2023-08-11 18:21 ` [PATCH v3] " Matheus Branco Borella
2023-08-14 7:31 ` Tom de Vries
2023-08-14 7:36 ` Tom de Vries
2023-09-13 7:09 ` Tom de Vries [this message]
2023-09-25 18:47 ` Matheus Branco Borella (DarkRyu550)
2023-09-26 14:07 ` Tom de Vries
2023-10-04 22:30 ` Tom de Vries
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bd53743b-553b-4aed-8dd5-6fe060e91705@suse.de \
--to=tdevries@suse.de \
--cc=dark.ryu.550@gmail.com \
--cc=gdb-patches@sourceware.org \
--cc=tom@tromey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).