[PATCH 0/2] Fix .gdb_index with Ada

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Fix .gdb_index with Ada
@ 2022-09-22 20:20 Tom Tromey
  2022-09-22 20:20 ` [PATCH 1/2] Improve Ada support in .gdb_index Tom Tromey
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Tom Tromey @ 2022-09-22 20:20 UTC (permalink / raw)
  To: gdb-patches

This series improves .gdb_index support for Ada, fixing a regression
introduced by the DWARF reader rewrite.

I regression tested this using the cc-with-gdb-index target board on
x86-64 Fedora 34.  There were no regressions, only improvements.

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] Improve Ada support in .gdb_index
  2022-09-22 20:20 [PATCH 0/2] Fix .gdb_index with Ada Tom Tromey
@ 2022-09-22 20:20 ` Tom Tromey
  2022-09-22 20:20 ` [PATCH 2/2] Change .gdb_index de-duplication implementation Tom Tromey
  2022-09-28  2:00 ` [PATCH 0/2] Fix .gdb_index with Ada Tom de Vries
  2 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2022-09-22 20:20 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

The cooked index work changed how .gdb_index is constructed, and in
the process broke .gdb_index support.  This is PR symtab/29179.

This patch partially fixes the problem.  It arranges for Ada names to
be encoded in the form expected by the index code.  In particular,
linkage names for Ada are emitted, including the "main" name; names
are Ada-encoded; and names are no longer case-folded, something that
prevented operator names from round-tripping correctly.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29179
---
 gdb/ada-lang.c           |  8 +++++---
 gdb/ada-lang.h           |  2 +-
 gdb/dwarf2/index-write.c | 42 ++++++++++++++++++++++++++++++++--------
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index d539c50602c..88248b5c7a2 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -1146,12 +1146,14 @@ ada_fold_name (gdb::string_view name, bool throw_on_error = false)
   return fold_storage.c_str ();
 }
 
-/* The "encoded" form of DECODED, according to GNAT conventions.  */
+/* The "encoded" form of DECODED, according to GNAT conventions.  If
+   FOLD is true (the default), case-fold any ordinary symbol.  Symbols
+   with <...> quoting are not folded in any case.  */
 
 std::string
-ada_encode (const char *decoded)
+ada_encode (const char *decoded, bool fold)
 {
-  if (decoded[0] != '<')
+  if (fold && decoded[0] != '<')
     decoded = ada_fold_name (decoded);
   return ada_encode_1 (decoded, true);
 }
diff --git a/gdb/ada-lang.h b/gdb/ada-lang.h
index 0dcdb56c198..cc8db6b5723 100644
--- a/gdb/ada-lang.h
+++ b/gdb/ada-lang.h
@@ -316,7 +316,7 @@ extern struct type *ada_get_base_type (struct type *);
 
 extern struct type *ada_check_typedef (struct type *);
 
-extern std::string ada_encode (const char *);
+extern std::string ada_encode (const char *, bool fold = true);
 
 extern const char *ada_enum_name (const char *);
 
diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
index 4cc0ee53070..6940a0ce3be 100644
--- a/gdb/dwarf2/index-write.c
+++ b/gdb/dwarf2/index-write.c
@@ -1112,21 +1112,47 @@ write_cooked_index (cooked_index_vector *table,
   htab_up var_names (htab_create_alloc (10, htab_hash_string, htab_eq_string,
 					nullptr, xcalloc, xfree));
 
+  const char *main_for_ada = main_name ();
+
   for (const cooked_index_entry *entry : table->all_entries ())
     {
-      /* GDB never put linkage names into .gdb_index.  The theory here
-	 is that a linkage name will normally be in the minimal
-	 symbols anyway, so including it in the index is usually
-	 redundant -- and the cases where it would not be redundant
-	 are rare and not worth supporting.  */
-      if ((entry->flags & IS_LINKAGE) != 0)
-	continue;
-
       const auto it = cu_index_htab.find (entry->per_cu);
       gdb_assert (it != cu_index_htab.cend ());
 
       const char *name = entry->full_name (&symtab->m_string_obstack);
 
+      if (entry->per_cu->lang () == language_ada)
+	{
+	  /* We want to ensure that the Ada main function's name
+	     appears verbatim in the index.  However, this name will
+	     be of the form "_ada_mumble", and will be rewritten by
+	     ada_decode.  So, recognize it specially here and add it
+	     to the index by hand.  */
+	  if (entry->tag == DW_TAG_subprogram
+	      && strcmp (main_for_ada, name) == 0)
+	    {
+	      /* Leave it alone.  */
+	    }
+	  else
+	    {
+	      /* In order for the index to work when read back into
+		 gdb, it has to use the encoded name, with any
+		 suffixes stripped.  */
+	      std::string encoded = ada_encode (name, false);
+	      name = obstack_strdup (&symtab->m_string_obstack,
+				     encoded.c_str ());
+	    }
+	}
+      else if ((entry->flags & IS_LINKAGE) != 0)
+	{
+	  /* GDB never put linkage names into .gdb_index.  The theory
+	     here is that a linkage name will normally be in the
+	     minimal symbols anyway, so including it in the index is
+	     usually redundant -- and the cases where it would not be
+	     redundant are rare and not worth supporting.  */
+	  continue;
+	}
+
       gdb_index_symbol_kind kind;
       if (entry->tag == DW_TAG_subprogram)
 	kind = GDB_INDEX_SYMBOL_KIND_FUNCTION;
-- 
2.34.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] Change .gdb_index de-duplication implementation
  2022-09-22 20:20 [PATCH 0/2] Fix .gdb_index with Ada Tom Tromey
  2022-09-22 20:20 ` [PATCH 1/2] Improve Ada support in .gdb_index Tom Tromey
@ 2022-09-22 20:20 ` Tom Tromey
  2022-09-28  2:00 ` [PATCH 0/2] Fix .gdb_index with Ada Tom de Vries
  2 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2022-09-22 20:20 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

While investigating PR symtab/29179, I found that one Ada test failed
because, although a certain symbol was present in the index, with the
new DWARF reader it pointed to a different CU than was chosen by
earlier versions of gdb.

This patch changes how symbol de-duplication is done, deferring the
process until the entire symbol table has been constructed.  This way,
it's possible to always choose the lower-numbered CU among duplicates,
which is how gdb (implicitly) previously worked.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29179
---
 gdb/dwarf2/index-write.c | 77 +++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 36 deletions(-)

diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
index 6940a0ce3be..ae05946e790 100644
--- a/gdb/dwarf2/index-write.c
+++ b/gdb/dwarf2/index-write.c
@@ -176,6 +176,10 @@ struct symtab_index_entry
   /* A sorted vector of the indices of all the CUs that hold an object
      of this name.  */
   std::vector<offset_type> cu_indices;
+
+  /* Minimize CU_INDICES, sorting them and removing duplicates as
+     appropriate.  */
+  void minimize ();
 };
 
 /* The symbol table.  This is a power-of-2-sized hash table.  */
@@ -186,6 +190,13 @@ struct mapped_symtab
     data.resize (1024);
   }
 
+  /* Minimize each entry in the symbol table, removing duplicates.  */
+  void minimize ()
+  {
+    for (symtab_index_entry &item : data)
+      item.minimize ();
+  }
+
   offset_type n_elements = 0;
   std::vector<symtab_index_entry> data;
 
@@ -271,21 +282,36 @@ add_index_entry (struct mapped_symtab *symtab, const char *name,
   slot.cu_indices.push_back (cu_index_and_attrs);
 }
 
-/* Sort and remove duplicates of all symbols' cu_indices lists.  */
+/* See symtab_index_entry.  */
 
-static void
-uniquify_cu_indices (struct mapped_symtab *symtab)
+void
+symtab_index_entry::minimize ()
 {
-  for (auto &entry : symtab->data)
+  if (name == nullptr || cu_indices.empty ())
+    return;
+
+  std::sort (cu_indices.begin (), cu_indices.end ());
+  auto from = std::unique (cu_indices.begin (), cu_indices.end ());
+  cu_indices.erase (from, cu_indices.end ());
+
+  /* We don't want to enter a variable or type more than once, so
+     remove any such duplicates from the list as well.  When doing
+     this, we want to keep the entry from the first CU -- but this is
+     implicit due to the sort.  This choice is done because it's
+     similar to what gdb historically did for partial symbols.  */
+  std::unordered_set<offset_type> seen;
+  from = std::remove_if (cu_indices.begin (), cu_indices.end (),
+			 [&] (offset_type val)
     {
-      if (entry.name != NULL && !entry.cu_indices.empty ())
-	{
-	  auto &cu_indices = entry.cu_indices;
-	  std::sort (cu_indices.begin (), cu_indices.end ());
-	  auto from = std::unique (cu_indices.begin (), cu_indices.end ());
-	  cu_indices.erase (from, cu_indices.end ());
-	}
-    }
+      gdb_index_symbol_kind kind = GDB_INDEX_SYMBOL_KIND_VALUE (val);
+      if (kind != GDB_INDEX_SYMBOL_KIND_TYPE
+	  && kind != GDB_INDEX_SYMBOL_KIND_VARIABLE)
+	return false;
+
+      val &= ~GDB_INDEX_CU_MASK;
+      return !seen.insert (val).second;
+    });
+  cu_indices.erase (from, cu_indices.end ());
 }
 
 /* A form of 'const char *' suitable for container keys.  Only the
@@ -1103,15 +1129,6 @@ write_cooked_index (cooked_index_vector *table,
 		    const cu_index_map &cu_index_htab,
 		    struct mapped_symtab *symtab)
 {
-  /* We track type names and only enter a given type once.  */
-  htab_up type_names (htab_create_alloc (10, htab_hash_string, htab_eq_string,
-					 nullptr, xcalloc, xfree));
-  /* Same with variable names.  However, if a type and variable share
-     a name, we want both, which is why there are two hash tables
-     here.  */
-  htab_up var_names (htab_create_alloc (10, htab_hash_string, htab_eq_string,
-					nullptr, xcalloc, xfree));
-
   const char *main_for_ada = main_name ();
 
   for (const cooked_index_entry *entry : table->all_entries ())
@@ -1159,24 +1176,12 @@ write_cooked_index (cooked_index_vector *table,
       else if (entry->tag == DW_TAG_variable
 	       || entry->tag == DW_TAG_constant
 	       || entry->tag == DW_TAG_enumerator)
-	{
-	  kind = GDB_INDEX_SYMBOL_KIND_VARIABLE;
-	  void **slot = htab_find_slot (var_names.get (), name, INSERT);
-	  if (*slot != nullptr)
-	    continue;
-	  *slot = (void *) name;
-	}
+	kind = GDB_INDEX_SYMBOL_KIND_VARIABLE;
       else if (entry->tag == DW_TAG_module
 	       || entry->tag == DW_TAG_common_block)
 	kind = GDB_INDEX_SYMBOL_KIND_OTHER;
       else
-	{
-	  kind = GDB_INDEX_SYMBOL_KIND_TYPE;
-	  void **slot = htab_find_slot (type_names.get (), name, INSERT);
-	  if (*slot != nullptr)
-	    continue;
-	  *slot = (void *) name;
-	}
+	kind = GDB_INDEX_SYMBOL_KIND_TYPE;
 
       add_index_entry (symtab, name, (entry->flags & IS_STATIC) != 0,
 		       kind, it->second);
@@ -1254,7 +1259,7 @@ write_gdbindex (dwarf2_per_objfile *per_objfile,
 
   /* Now that we've processed all symbols we can shrink their cu_indices
      lists.  */
-  uniquify_cu_indices (&symtab);
+  symtab.minimize ();
 
   data_buf symtab_vec, constant_pool;
   if (symtab.n_elements == 0)
-- 
2.34.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix .gdb_index with Ada
  2022-09-22 20:20 [PATCH 0/2] Fix .gdb_index with Ada Tom Tromey
  2022-09-22 20:20 ` [PATCH 1/2] Improve Ada support in .gdb_index Tom Tromey
  2022-09-22 20:20 ` [PATCH 2/2] Change .gdb_index de-duplication implementation Tom Tromey
@ 2022-09-28  2:00 ` Tom de Vries
  2022-10-11 19:59   ` Tom Tromey
  2 siblings, 1 reply; 8+ messages in thread
From: Tom de Vries @ 2022-09-28  2:00 UTC (permalink / raw)
  To: Tom Tromey, gdb-patches

On 9/22/22 22:20, Tom Tromey via Gdb-patches wrote:
> This series improves .gdb_index support for Ada, fixing a regression
> introduced by the DWARF reader rewrite.
> 
> I regression tested this using the cc-with-gdb-index target board on
> x86-64 Fedora 34.  There were no regressions, only improvements.
> 

Hi,

I've tested this series with target board cc-with-gdb-index.

The only FAILs I found were:
...
$ grep ^FAIL: gdb.sum
FAIL: gdb.base/c-linkage-name.exp: print symada__cS before partial 
symtab expansion
FAIL: gdb.dwarf2/dw2-zero-range.exp: ranges_sect=ranges: Zero address 
complaint - relocated - psymtab
FAIL: gdb.dwarf2/dw2-zero-range.exp: ranges_sect=ranges: Zero address 
complaint - unrelocated - psymtab
FAIL: gdb.dwarf2/dw2-zero-range.exp: ranges_sect=rnglists: Zero address 
complaint - relocated - psymtab
FAIL: gdb.dwarf2/dw2-zero-range.exp: ranges_sect=rnglists: Zero address 
complaint - unrelocated - psymtab
FAIL: gdb.python/py-symbol.exp: print (len (gdb.lookup_static_symbols 
('rr')))
...

The FAILs from gdb.python/py-symbol.exp and 
gdb.dwarf2/dw2-zero-range.exp are know, they also fail with target board 
cc-with-debug-names.

The gdb.base/c-linkage-name.exp FAIL does look relevant to this series, 
and probably was regressed by the same offending commit.

Doing:
...
diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
index ae05946e790..0268371ec2e 100644
--- a/gdb/dwarf2/index-write.c
+++ b/gdb/dwarf2/index-write.c
@@ -1167,7 +1167,6 @@ write_cooked_index (cooked_index_vector *table,
              minimal symbols anyway, so including it in the index is
              usually redundant -- and the cases where it would not be
              redundant are rare and not worth supporting.  */
-         continue;
         }

        gdb_index_symbol_kind kind;
...
fixes the FAIL, so is this one of the "rare and not worth supporting" 
cases you're referring to?

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix .gdb_index with Ada
  2022-09-28  2:00 ` [PATCH 0/2] Fix .gdb_index with Ada Tom de Vries
@ 2022-10-11 19:59   ` Tom Tromey
  2022-10-13 20:40     ` Tom Tromey
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Tromey @ 2022-10-11 19:59 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Tom Tromey, gdb-patches

>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Tom> The gdb.base/c-linkage-name.exp FAIL does look relevant to this
Tom> series, and probably was regressed by the same offending commit.

I see this was failing before my series -- but it works in gdb 12.
So, it's one of the regressions the series was intended to fix :(

Tom> Doing:
Tom> ...
Tom> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
Tom> index ae05946e790..0268371ec2e 100644
Tom> --- a/gdb/dwarf2/index-write.c
Tom> +++ b/gdb/dwarf2/index-write.c
Tom> @@ -1167,7 +1167,6 @@ write_cooked_index (cooked_index_vector *table,
Tom>              minimal symbols anyway, so including it in the index is
Tom>              usually redundant -- and the cases where it would not be
Tom>              redundant are rare and not worth supporting.  */
Tom> -         continue;
Tom>         }

Tom>        gdb_index_symbol_kind kind;
Tom> ...
Tom> fixes the FAIL, so is this one of the "rare and not worth supporting"
Tom> cases you're referring to?

I thought that was necessary to avoid redundancy in the index, but I see
now it isn't, or at least not in that way.  I'm looking again at why the
new indices are larger in general.

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix .gdb_index with Ada
  2022-10-11 19:59   ` Tom Tromey
@ 2022-10-13 20:40     ` Tom Tromey
  2022-10-13 21:44       ` Tom de Vries
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Tromey @ 2022-10-13 20:40 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Tom de Vries, gdb-patches

Tom> [ ... patch ... ]
Tom> fixes the FAIL, so is this one of the "rare and not worth supporting"
Tom> cases you're referring to?

> I thought that was necessary to avoid redundancy in the index, but I see
> now it isn't, or at least not in that way.  I'm looking again at why the
> new indices are larger in general.

I looked into this more.

Older versions of gdb don't add C++ symbols to the index, so when I
diff'd the indexes I saw a lot of "_Z" additions.  Locally I've changed
this code to skip linkage names for C++ only.

I compared the symbols from old and new indexes.  In every case (except
the one below) I checked, the new gdb seemed more correct.  In
particular it added inlined functions to the index, and it used the
correct name for "enum class" enumerator constants.

I did find out that the new index included entries for the linkage names
of classes.  This isn't generally useful, and they have weird names like
"6mumble", so I also have a patch to drop these entries from the cooked
index entirely.

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix .gdb_index with Ada
  2022-10-13 20:40     ` Tom Tromey
@ 2022-10-13 21:44       ` Tom de Vries
  2022-10-14 13:24         ` Tom Tromey
  0 siblings, 1 reply; 8+ messages in thread
From: Tom de Vries @ 2022-10-13 21:44 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

On 10/13/22 22:40, Tom Tromey wrote:
> Tom> [ ... patch ... ]
> Tom> fixes the FAIL, so is this one of the "rare and not worth supporting"
> Tom> cases you're referring to?
> 
>> I thought that was necessary to avoid redundancy in the index, but I see
>> now it isn't, or at least not in that way.  I'm looking again at why the
>> new indices are larger in general.
> 
> I looked into this more.
> 
> Older versions of gdb don't add C++ symbols to the index, so when I
> diff'd the indexes I saw a lot of "_Z" additions.  Locally I've changed
> this code to skip linkage names for C++ only.
> 

Ah, does that then fix the c-linkage-name.exp regression?

Thanks,
- Tom


> I compared the symbols from old and new indexes.  In every case (except
> the one below) I checked, the new gdb seemed more correct.  In
> particular it added inlined functions to the index, and it used the
> correct name for "enum class" enumerator constants.
> 
> I did find out that the new index included entries for the linkage names
> of classes.  This isn't generally useful, and they have weird names like
> "6mumble", so I also have a patch to drop these entries from the cooked
> index entirely.
> 
> Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix .gdb_index with Ada
  2022-10-13 21:44       ` Tom de Vries
@ 2022-10-14 13:24         ` Tom Tromey
  0 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2022-10-14 13:24 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Tom Tromey, gdb-patches

>> I looked into this more.
>> Older versions of gdb don't add C++ symbols to the index, so when I
>> diff'd the indexes I saw a lot of "_Z" additions.  Locally I've changed
>> this code to skip linkage names for C++ only.
>> 

Tom> Ah, does that then fix the c-linkage-name.exp regression?

Yeah.  I'm testing the updated series now, I hope to send it today.

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-14 13:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22 20:20 [PATCH 0/2] Fix .gdb_index with Ada Tom Tromey
2022-09-22 20:20 ` [PATCH 1/2] Improve Ada support in .gdb_index Tom Tromey
2022-09-22 20:20 ` [PATCH 2/2] Change .gdb_index de-duplication implementation Tom Tromey
2022-09-28  2:00 ` [PATCH 0/2] Fix .gdb_index with Ada Tom de Vries
2022-10-11 19:59   ` Tom Tromey
2022-10-13 20:40     ` Tom Tromey
2022-10-13 21:44       ` Tom de Vries
2022-10-14 13:24         ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).