From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=vDdf=E5=suse.de=tdevries@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
	by sourceware.org (Postfix) with ESMTPS id 9B8633858412
	for <gdb-patches@sourceware.org>; Wed, 13 Sep 2023 07:09:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9B8633858412
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by smtp-out1.suse.de (Postfix) with ESMTPS id BA049218E8;
	Wed, 13 Sep 2023 07:09:22 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1694588962; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=3HOJiXkVkCw+8bpuo17cQhK/aD3xJa8bg3u25a4Vqhc=;
	b=GrcYCUWeFa4cJkVt29/0VCoPSmZ/6XNnx5vy7ofkGlFqhEOLyYc5b+8//yw7pIAzE8ar8N
	jBm51l0LgdPPWeRHJu2QGsycdR/jfukRY7+RAHBtBvEiKFPm721RVkf3IlbythHBZmMyZn
	xMRTgx/cPSds/I5qnzfLLzaP/XcKmiY=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1694588962;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=3HOJiXkVkCw+8bpuo17cQhK/aD3xJa8bg3u25a4Vqhc=;
	b=rz6YJShjjtL0T05sEASDZjyfwTcyEszUh67fl1IszpuabBg8axoEry5vSGjrljIKAq2jJh
	ZmW8dj/KS6rH70Aw==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 971DF13582;
	Wed, 13 Sep 2023 07:09:22 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
	by imap2.suse-dmz.suse.de with ESMTPSA
	id PDZZIyJgAWUoewAAMHmgww
	(envelope-from <tdevries@suse.de>); Wed, 13 Sep 2023 07:09:22 +0000
Message-ID: <bd53743b-553b-4aed-8dd5-6fe060e91705@suse.de>
Date: Wed, 13 Sep 2023 09:09:12 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v3] Add name_of_main and language_of_main to the DWARF
 index
Content-Language: en-US
To: Matheus Branco Borella <dark.ryu.550@gmail.com>,
 gdb-patches@sourceware.org
Cc: tom@tromey.com
References: <20230608214012.1561-1-dark.ryu.550@gmail.com>
 <20230811182131.492-1-dark.ryu.550@gmail.com>
 <ad5e6d5c-2520-0458-1308-6ee1f7044389@suse.de>
From: Tom de Vries <tdevries@suse.de>
In-Reply-To: <ad5e6d5c-2520-0458-1308-6ee1f7044389@suse.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

On 8/14/23 09:31, Tom de Vries via Gdb-patches wrote:
> On 8/11/23 20:21, Matheus Branco Borella wrote:
>> This should hopefully be the final version. Has a more descriptive 
>> entry in the
>> NEWS file, but is otherwise the same as v2. I believe the 
>> misunderstanding with
>> my contributor status should have also been sorted out?
>>
> 
> Hi,
> 
> As per Eli's last comment in this thread, that's indeed the case.
> 
>> ---
>> This patch adds a new section to the DWARF index containing the name
>> and the language of the main function symbol, gathered from
>> `cooked_index::get_main`, if available. Currently, for lack of a 
>> better name,
>> this section is called the "shortcut table". The way this name is both 
>> saved and
>> applied upon an index being loaded in mirrors how it is done in
>> `cooked_index_functions`, more specifically, the full name of the main 
>> function
>> symbol is saved and `set_objfile_main_name` is used to apply it after 
>> it is
>> loaded.
>>
>> The main use case for this patch is in improving startup times when 
>> dealing with
>> large binaries. Currently, when an index is used, GDB has to expand 
>> symtabs
>> until it finds out what the language of the main function symbol is. 
>> For some
>> large executables, this may take a considerable amount of time to 
>> complete,
>> slowing down startup. This patch bypasses that operation by having 
>> both the name
>> and language of the main function symbol be provided ahead of time by 
>> the index.
>>
>> In my testing (a binary with about 1.8GB worth of DWARF data) this 
>> change brings
>> startup time down from about 34 seconds to about 1.5 seconds.
>>
> 
> I've reviewed the patch, and found a few nits.
> 
> There are two spots that look unintentional to me, one adding and one
> removing an empty line.  You might want to remove those.
> 
> I found this bit:
> ...
> +      lang = dwarf_lang_to_enum_language (attr->constant_value (0));
> +      dw_lang = (dwarf_source_language)attr->constant_value (0);
> ...
> 
> I would reformulate as:
> ...
> +      dw_lang = (dwarf_source_language)attr->constant_value (0);
> +      lang = dwarf_lang_to_enum_language (dw_lang);
> ...
> but I don't feel strongly about that.
> 
> I've looked over Tom Tromey's comments earlier in the thread, and I 
> think they all have been addresses.  Furthermore, Eli already approved 
> the documentation part, so I'd say: approved, but wait for one week in 
> case somebody else has other comments.
> 
> Approved-By: Tom de Vries <tdevries@suse.de>
> 

Hi,

any update on this?

Thanks,
- Tom

> Thanks,
> - Tom
> 
>> PR symtab/24549
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549
>> ---
>>   gdb/NEWS                    |  3 ++
>>   gdb/doc/gdb.texinfo         | 23 +++++++++++++--
>>   gdb/dwarf2/index-write.c    | 47 +++++++++++++++++++++++++++----
>>   gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++--
>>   gdb/dwarf2/read.c           | 13 +++++++--
>>   gdb/dwarf2/read.h           | 12 ++++++++
>>   6 files changed, 143 insertions(+), 11 deletions(-)
>>
>> diff --git a/gdb/NEWS b/gdb/NEWS
>> index d97e3c15a8..ac455f39f2 100644
>> --- a/gdb/NEWS
>> +++ b/gdb/NEWS
>> @@ -3,6 +3,9 @@
>>   *** Changes since GDB 13
>> +* GDB index now contains information about the main function. This 
>> speeds up
>> +  startup when it is being used for some large binaries.
>> +
>>   * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication 
>> feature string
>>     has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' 
>> feature
>>     string.
>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
>> index d1059e0cb7..3b2fdcd19e 100644
>> --- a/gdb/doc/gdb.texinfo
>> +++ b/gdb/doc/gdb.texinfo
>> @@ -49093,13 +49093,14 @@ unless otherwise noted:
>>   @enumerate
>>   @item
>> -The version number, currently 8.  Versions 1, 2 and 3 are obsolete.
>> +The version number, currently 9.  Versions 1, 2 and 3 are obsolete.
>>   Version 4 uses a different hashing function from versions 5 and 6.
>>   Version 6 includes symbols for inlined functions, whereas versions 4
>>   and 5 do not.  Version 7 adds attributes to the CU indices in the
>>   symbol table.  Version 8 specifies that symbols from DWARF type units
>>   (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and 
>> not the
>> -compilation unit (@samp{DW_TAG_comp_unit}) using the type.
>> +compilation unit (@samp{DW_TAG_comp_unit}) using the type.  Version 9 
>> adds
>> +the name and the language of the main function to the index.
>>   @value{GDBN} will only read version 4, 5, or 6 indices
>>   by specifying @code{set use-deprecated-index-sections on}.
>> @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the 
>> address area.
>>   @item
>>   The offset, from the start of the file, of the symbol table.
>> +@item
>> +The offset, from the start of the file, of the shortcut table.
>> +
>>   @item
>>   The offset, from the start of the file, of the constant pool.
>>   @end enumerate
>> @@ -49196,6 +49200,21 @@ don't currently have a simple description of 
>> the canonicalization
>>   algorithm; if you intend to create new index sections, you must read
>>   the code.
>> +@item The shortcut table
>> +This is a data structure with the following fields:
>> +
>> +@table @asis
>> +@item Language of main
>> +A 32-bit little-endian value indicating the language of the main 
>> function as a
>> +@code{DW_LANG_} constant.  This value will be zero if main function 
>> information
>> +is not present.
>> +
>> +@item Name of main
>> +An @code{offset_type} value indicating the offset of the main 
>> function's name
>> +in the constant pool.  This value must be ignored if the value for 
>> the language
>> +of main is zero.
>> +@end table
>> +
>>   @item
>>   The constant pool.  This is simply a bunch of bytes.  It is organized
>>   so that alignment is correct: CU vectors are stored first, followed by
>> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
>> index 62c2cc6ac7..7117a5184b 100644
>> --- a/gdb/dwarf2/index-write.c
>> +++ b/gdb/dwarf2/index-write.c
>> @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file,
>>             const data_buf &types_cu_list,
>>             const data_buf &addr_vec,
>>             const data_buf &symtab_vec,
>> -          const data_buf &constant_pool)
>> +          const data_buf &constant_pool,
>> +          const data_buf &shortcuts)
>>   {
>>     data_buf contents;
>> -  const offset_type size_of_header = 6 * sizeof (offset_type);
>> +  const offset_type size_of_header = 7 * sizeof (offset_type);
>>     offset_type total_len = size_of_header;
>>     /* The version number.  */
>> -  contents.append_offset (8);
>> +  contents.append_offset (9);
>>     /* The offset of the CU list from the start of the file.  */
>>     contents.append_offset (total_len);
>> @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file,
>>     contents.append_offset (total_len);
>>     total_len += symtab_vec.size ();
>> +  /* The offset of the shortcut table from the start of the file.  */
>> +  contents.append_offset (total_len);
>> +  total_len += shortcuts.size ();
>> +
>>     /* The offset of the constant pool from the start of the file.  */
>>     contents.append_offset (total_len);
>>     total_len += constant_pool.size ();
>> @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file,
>>     types_cu_list.file_write (out_file);
>>     addr_vec.file_write (out_file);
>>     symtab_vec.file_write (out_file);
>> +  shortcuts.file_write (out_file);
>>     constant_pool.file_write (out_file);
>>     assert_file_size (out_file, total_len);
>> @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table,
>>       }
>>   }
>> +/* Write shortcut information. */
>> +
>> +static void
>> +write_shortcuts_table (cooked_index *table, data_buf& shortcuts,
>> +               data_buf& cpool)
>> +{
>> +  const auto main_info = table->get_main ();
>> +  size_t main_name_offset = 0;
>> +  dwarf_source_language dw_lang = (dwarf_source_language)0;
>> +
>> +  if (main_info != nullptr)
>> +    {
>> +      dw_lang = main_info->per_cu->dw_lang;
>> +
>> +      if (dw_lang != 0)
>> +    {
>> +      auto_obstack obstack;
>> +      const auto main_name = main_info->full_name (&obstack, true);
>> +
>> +      main_name_offset = cpool.size ();
>> +      cpool.append_cstr0 (main_name);
>> +    }
>> +    }
>> +
>> +  shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang);
>> +  shortcuts.append_offset (main_name_offset);
>> +}
>> +
>>   /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE.
>>      If OBJFILE has an associated dwz file, write contents of a 
>> .gdb_index
>>      section for that dwz file into DWZ_OUT_FILE.  If OBJFILE does not 
>> have an
>> @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, 
>> cooked_index *table,
>>     write_hash_table (&symtab, symtab_vec, constant_pool);
>> +  data_buf shortcuts;
>> +  write_shortcuts_table (table, shortcuts, constant_pool);
>> +
>>     write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec,
>> -           symtab_vec, constant_pool);
>> +           symtab_vec, constant_pool, shortcuts);
>>     if (dwz_out_file != NULL)
>> -    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {});
>> +    write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {});
>>     else
>>       gdb_assert (dwz_cu_list.empty ());
>>   }
>> diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c
>> index 1006386cb2..f09c5ba234 100644
>> --- a/gdb/dwarf2/read-gdb-index.c
>> +++ b/gdb/dwarf2/read-gdb-index.c
>> @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public 
>> mapped_index_base
>>     /* A pointer to the constant pool.  */
>>     gdb::array_view<const gdb_byte> constant_pool;
>> +  /* The shortcut table data. */
>> +  gdb::array_view<const gdb_byte> shortcut_table;
>> +
>>     /* Return the index into the constant pool of the name of the IDXth
>>        symbol in the symbol table.  */
>>     offset_type symbol_name_index (offset_type idx) const
>> @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile)
>>     mapped_gdb_index *index = 
>> (gdb::checked_static_cast<mapped_gdb_index *>
>>                    (per_objfile->per_bfd->index_table.get ()));
>> +
>>     gdb_printf (".gdb_index: version %d\n", index->version);
>>     gdb_printf ("\n");
>>   }
>> @@ -583,7 +587,7 @@ to use the section anyway."),
>>     /* Indexes with higher version than the one supported by GDB may 
>> be no
>>        longer backward compatible.  */
>> -  if (version > 8)
>> +  if (version > 9)
>>       return 0;
>>     map->version = version;
>> @@ -608,8 +612,17 @@ to use the section anyway."),
>>     map->symbol_table
>>       = offset_view (gdb::array_view<const gdb_byte> (symbol_table,
>>                               symbol_table_end));
>> -
>>     ++i;
>> +
>> +  if (version >= 9)
>> +    {
>> +      const gdb_byte *shortcut_table = addr + metadata[i];
>> +      const gdb_byte *shortcut_table_end = addr + metadata[i + 1];
>> +      map->shortcut_table
>> +    = gdb::array_view<const gdb_byte> (shortcut_table, 
>> shortcut_table_end);
>> +      ++i;
>> +    }
>> +
>>     map->constant_pool = buffer.slice (metadata[i]);
>>     if (map->constant_pool.empty () && !map->symbol_table.empty ())
>> @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile 
>> *per_objfile,
>>       = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, 
>> &mutable_map);
>>   }
>> +/* Sets the name and language of the main function from the shortcut 
>> table. */
>> +
>> +static void
>> +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile,
>> +                  mapped_gdb_index *index)
>> +{
>> +  const auto expected_size = 4 + sizeof (offset_type);
>> +  if (index->shortcut_table.size () < expected_size)
>> +    /* The data in the section is not present, is corrupted or is in 
>> a version
>> +     * we don't know about. Regardless, we can't make use of it. */
>> +    return;
>> +
>> +  auto ptr = index->shortcut_table.data ();
>> +  const auto dw_lang = extract_unsigned_integer (ptr, 4, 
>> BFD_ENDIAN_LITTLE);
>> +  if (dw_lang >= DW_LANG_hi_user)
>> +    {
>> +      complaint (_(".gdb_index shortcut table has invalid main 
>> language %u"),
>> +           (unsigned) dw_lang);
>> +      return;
>> +    }
>> +  if (dw_lang == 0)
>> +    {
>> +      /* Don't bother if the language for the main symbol was not 
>> known or if
>> +       * there was no main symbol at all when the index was built. */
>> +      return;
>> +    }
>> +  ptr += 4;
>> +
>> +  const auto lang = dwarf_lang_to_enum_language (dw_lang);
>> +  const auto name_offset = extract_unsigned_integer (ptr,
>> +                             sizeof (offset_type),
>> +                             BFD_ENDIAN_LITTLE);
>> +  const auto name = (const char*) (index->constant_pool.data () + 
>> name_offset);
>> +
>> +  set_objfile_main_name (per_objfile->objfile, name, (enum language) 
>> lang);
>> +}
>> +
>>   /* See read-gdb-index.h.  */
>>   int
>> @@ -848,6 +898,8 @@ dwarf2_read_gdb_index
>>     create_addrmap_from_gdb_index (per_objfile, map.get ());
>> +  set_main_name_from_gdb_index (per_objfile, map.get ());
>> +
>>     per_bfd->index_table = std::move (map);
>>     per_bfd->quick_file_names_table =
>>       create_quick_file_names_table (per_bfd->all_units.size ());
>> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
>> index 4828409222..89acd94c05 100644
>> --- a/gdb/dwarf2/read.c
>> +++ b/gdb/dwarf2/read.c
>> @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf)
>>       }
>>   }
>> -static enum language
>> +/* Converts DWARF language names to GDB language names. */
>> +
>> +enum language
>>   dwarf_lang_to_enum_language (unsigned int lang)
>>   {
>>     enum language language;
>> @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, 
>> struct die_info *comp_unit_die,
>>     /* Set the language we're debugging.  */
>>     attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu);
>>     enum language lang;
>> +  dwarf_source_language dw_lang = (dwarf_source_language)0;
>>     if (cu->producer != nullptr
>>         && strstr (cu->producer, "IBM XL C for OpenCL") != NULL)
>>       {
>> @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, 
>> struct die_info *comp_unit_die,
>>        language detection we fall back to the DW_AT_producer
>>        string.  */
>>         lang = language_opencl;
>> +      dw_lang = DW_LANG_OpenCL;
>>       }
>>     else if (cu->producer != nullptr
>>          && strstr (cu->producer, "GNU Go ") != NULL)
>>       {
>>         /* Similar hack for Go.  */
>>         lang = language_go;
>> +      dw_lang = DW_LANG_Go;
>>       }
>>     else if (attr != nullptr)
>> -    lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>> +    {
>> +      lang = dwarf_lang_to_enum_language (attr->constant_value (0));
>> +      dw_lang = (dwarf_source_language)attr->constant_value (0);
>> +    }
>>     else
>>       lang = pretend_language;
>> +  cu->per_cu->dw_lang = dw_lang;
>>     cu->language_defn = language_def (lang);
>>     switch (comp_unit_die->tag)
>> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
>> index 37023a2070..6707c400cf 100644
>> --- a/gdb/dwarf2/read.h
>> +++ b/gdb/dwarf2/read.h
>> @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data
>>        functions above.  */
>>     std::vector <dwarf2_per_cu_data *> *imported_symtabs = nullptr;
>> +  /* The original DW_LANG_* value of the CU, as provided to us by
>> +   * DW_AT_language. It is interesting to keep this value around in 
>> cases where
>> +   * we can't use the values from the language enum, as the mapping 
>> to them is
>> +   * lossy, and, while that is usually fine, things like the index 
>> have an
>> +   * understandable bias towards not exposing internal GDB structures 
>> to the
>> +   * outside world, and so prefer to use DWARF constants in their 
>> stead. */
>> +  dwarf_source_language dw_lang;
>> +
>>     /* Return true of IMPORTED_SYMTABS is empty or not yet allocated.  */
>>     bool imported_symtabs_empty () const
>>     {
>> @@ -755,6 +763,10 @@ struct dwarf2_per_objfile
>>                std::unique_ptr<dwarf2_cu>> m_dwarf2_cus;
>>   };
>> +/* Converts DWARF language names to GDB language names. */
>> +
>> +enum language dwarf_lang_to_enum_language (unsigned int lang);
>> +
>>   /* Get the dwarf2_per_objfile associated to OBJFILE.  */
>>   dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile);
>