[PATCH] gdb: remove some caching from the dwarf reader

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] gdb: remove some caching from the dwarf reader
@ 2021-04-19 13:39 Andrew Burgess
  2021-04-22 16:36 ` Tom Tromey
  0 siblings, 1 reply; 2+ messages in thread
From: Andrew Burgess @ 2021-04-19 13:39 UTC (permalink / raw)
  To: gdb-patches

While working on some changes to 'info sources' I ran into a situation
where I was seeing the same source files reported twice in the output
of the 'info sources' command when using either .gdb_index or the
.debug_name index.

I traced the problem back to some caching in
dwarf2_base_index_functions::map_symbol_filenames; when called GDB
caches the set of filenames, but, filesnames are not removed as the
index entries are expanded into full symtabs.  As a result we can end
up seeing filenames reported both from a full symtab _and_ from
a (stale) previously cached index entry.

Now, obviously, when seeing a problem like this the "correct" fix is
to remove the stale entries from the cache, however, I ran a few
experiments to see why this wasn't really hitting us anywhere, and, as
far as I can tell, ::map_symbol_filenames is only called from three
places:

  1. The mi command -file-list-exec-source-files,
  2. The 'info sources' command, and
  3. Filename completion

However, the result of this "bug" is that we will see duplicate
filenames, and readline's completion mechanism already removes
duplicates, so for case #3 we will never see any problems.

Cases #1 and #2 are basically the same, and in each case, to see a
problem we need to ensure we craft the test in a particular way, start
up ensuring we have some unexpected symtabs, then run one of the
commands to populate the cache, then expand one of the symtabs, and
list the sources again.  At this point you'll see duplicate entries in
the results.  Hardly surprising we haven't randomly hit this situation
in testing.

So, considering that use cases #1 and #2 are certainly not "high
performance" code (i.e. I don't think these justify the need for
caching) this leaves use case #3.  Does this use justify the need for
caching?  Well the psymbol_functions::map_symbol_filenames function
doesn't seem to do any extra caching, and within
dwarf2_base_index_functions::map_symbol_filenames, the only expensive
bit appears to be the call to dw2_get_file_names, and this already
does its own caching via this_cu->v.quick->file_names.

The upshot of all this analysis was that I'm not convinced the need
for the additional caching is justified, and so, I propose that to fix
the bug in GDB, I just remove the extra caching (for now).

If we later find that the caching _was_ useful, then we can
reintroduce it, but add it back such that it doesn't reintroduce this
bug.

As I was changing dwarf2_base_index_functions::map_symbol_filenames I
replaced the use of htab_up with std::unordered_set.

Tested using target_boards cc-with-debug-names and dwarf4-gdb-index.

gdb/ChangeLog:

	* dwarf2/read.c: Add 'unordered_set' include.
	(dwarf2_base_index_functions::map_symbol_filenames): Replace
	'visited' hash table with 'qfn_cache' unordered_set.  Remove use
	of per_Bfd->filenames_cache cache, and use function local
	filenames_cache instead.  Reindent.
	* dwarf2/read.h (struct dwarf2_per_bfd) <filenames_cache>: Delete.

gdb/testsuite/ChangeLog:

	* gdb.base/info_sources.exp: Add new tests.
---
 gdb/ChangeLog                           |  9 ++++
 gdb/dwarf2/read.c                       | 71 ++++++++++---------------
 gdb/dwarf2/read.h                       |  4 --
 gdb/testsuite/ChangeLog                 |  4 ++
 gdb/testsuite/gdb.base/info_sources.exp | 17 +++---
 5 files changed, 53 insertions(+), 52 deletions(-)

diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 40aeb298c14..306971ef1b7 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -88,6 +88,7 @@
 #include "rust-lang.h"
 #include "gdbsupport/pathstuff.h"
 #include "count-one-bits.h"
+#include <unordered_set>
 
 /* When == 1, print basic high level tracing messages.
    When > 1, be more verbose.
@@ -4713,58 +4714,44 @@ dwarf2_base_index_functions::map_symbol_filenames
 {
   dwarf2_per_objfile *per_objfile = get_dwarf2_per_objfile (objfile);
 
-  if (!per_objfile->per_bfd->filenames_cache)
-    {
-      per_objfile->per_bfd->filenames_cache.emplace ();
-
-      htab_up visited (htab_create_alloc (10,
-					  htab_hash_pointer, htab_eq_pointer,
-					  NULL, xcalloc, xfree));
+  /* Use caches to ensure we only call FUN once for each filename.  */
+  filename_seen_cache filenames_cache;
+  std::unordered_set<quick_file_names *> qfn_cache;
 
-      /* The rule is CUs specify all the files, including those used
-	 by any TU, so there's no need to scan TUs here.  We can
-	 ignore file names coming from already-expanded CUs.  */
+  /* The rule is CUs specify all the files, including those used by any TU,
+     so there's no need to scan TUs here.  We can ignore file names coming
+     from already-expanded CUs.  It is possible that an expanded CU might
+     reuse the file names data from a currently unexpanded CU, in this
+     case we don't want to report the files from the unexpanded CU.  */
 
-      for (dwarf2_per_cu_data *per_cu : per_objfile->per_bfd->all_comp_units)
+  for (dwarf2_per_cu_data *per_cu : per_objfile->per_bfd->all_comp_units)
+    {
+      if (per_objfile->symtab_set_p (per_cu))
 	{
-	  if (per_objfile->symtab_set_p (per_cu))
-	    {
-	      void **slot = htab_find_slot (visited.get (),
-					    per_cu->v.quick->file_names,
-					    INSERT);
-
-	      *slot = per_cu->v.quick->file_names;
-	    }
+	  if (per_cu->v.quick->file_names != nullptr)
+	    qfn_cache.insert (per_cu->v.quick->file_names);
 	}
+    }
 
-      for (dwarf2_per_cu_data *per_cu : per_objfile->per_bfd->all_comp_units)
-	{
-	  /* We only need to look at symtabs not already expanded.  */
-	  if (per_objfile->symtab_set_p (per_cu))
-	    continue;
-
-	  quick_file_names *file_data
-	    = dw2_get_file_names (per_cu, per_objfile);
-	  if (file_data == NULL)
-	    continue;
+  for (dwarf2_per_cu_data *per_cu : per_objfile->per_bfd->all_comp_units)
+    {
+      /* We only need to look at symtabs not already expanded.  */
+      if (per_objfile->symtab_set_p (per_cu))
+	continue;
 
-	  void **slot = htab_find_slot (visited.get (), file_data, INSERT);
-	  if (*slot)
-	    {
-	      /* Already visited.  */
-	      continue;
-	    }
-	  *slot = file_data;
+      quick_file_names *file_data = dw2_get_file_names (per_cu, per_objfile);
+      if (file_data == nullptr
+	  || qfn_cache.find (file_data) != qfn_cache.end ())
+	continue;
 
-	  for (int j = 0; j < file_data->num_file_names; ++j)
-	    {
-	      const char *filename = file_data->file_names[j];
-	      per_objfile->per_bfd->filenames_cache->seen (filename);
-	    }
+      for (int j = 0; j < file_data->num_file_names; ++j)
+	{
+	  const char *filename = file_data->file_names[j];
+	  filenames_cache.seen (filename);
 	}
     }
 
-  per_objfile->per_bfd->filenames_cache->traverse ([&] (const char *filename)
+  filenames_cache.traverse ([&] (const char *filename)
     {
       gdb::unique_xmalloc_ptr<char> this_real_name;
 
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index 416d8eae959..b33dc1b5964 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -235,10 +235,6 @@ struct dwarf2_per_bfd
   /* The CUs we recently read.  */
   std::vector<dwarf2_per_cu_data *> just_read_cus;
 
-  /* Table containing all filenames.  This is an optional because the
-     table is lazily constructed on first access.  */
-  gdb::optional<filename_seen_cache> filenames_cache;
-
   /* If we loaded the index from an external file, this contains the
      resources associated to the open file, memory mapping, etc.  */
   std::unique_ptr<index_cache_resource> index_cache_res;
diff --git a/gdb/testsuite/gdb.base/info_sources.exp b/gdb/testsuite/gdb.base/info_sources.exp
index 81edb3da8d7..a4f7d1966e6 100644
--- a/gdb/testsuite/gdb.base/info_sources.exp
+++ b/gdb/testsuite/gdb.base/info_sources.exp
@@ -28,18 +28,18 @@ if {[prepare_for_testing $testfile.exp $testfile \
 # in the output.  Similarly, EXPECT_SEEN_INFO_SOURCES_BASE indicates that the source file
 # info_sources_base.c must be seen in the output.
 proc test_info_sources {args expect_seen_info_sources expect_seen_info_sources_base} {
-    global gdb_prompt
+    global gdb_prompt srcfile srcfile2
 
     set seen_info_sources 0
     set seen_info_sources_base 0
     set cmd [concat "info sources " $args]
     gdb_test_multiple $cmd $cmd {
-	-re "^\[^,\]*info_sources.c(, |\[\r\n\]+)" {
-	    set seen_info_sources 1
+	-re "^\[^,\]*${srcfile}(, |\[\r\n\]+)" {
+	    incr seen_info_sources
 	    exp_continue
 	}
-	-re "^\[^,\]*info_sources_base.c(, |\[\r\n\]+)" {
-	    set seen_info_sources_base 1
+	-re "^\[^,\]*${srcfile2}(, |\[\r\n\]+)" {
+	    incr seen_info_sources_base 1
 	    exp_continue
 	}
 	-re ", " {
@@ -60,8 +60,13 @@ if ![runto_main] {
     untested $testfile.exp
     return -1
 }
-gdb_test "break some_other_func" ""
 
+# List both files with no regexp:
+with_test_prefix "in main" {
+    test_info_sources "" 1 1
+}
+
+gdb_test "break some_other_func" ""
 gdb_test "continue"
 
 # List both files with no regexp:
-- 
2.25.4


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] gdb: remove some caching from the dwarf reader
  2021-04-19 13:39 [PATCH] gdb: remove some caching from the dwarf reader Andrew Burgess
@ 2021-04-22 16:36 ` Tom Tromey
  0 siblings, 0 replies; 2+ messages in thread
From: Tom Tromey @ 2021-04-22 16:36 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:

Andrew> So, considering that use cases #1 and #2 are certainly not "high
Andrew> performance" code (i.e. I don't think these justify the need for
Andrew> caching) this leaves use case #3.  Does this use justify the need for
Andrew> caching?  Well the psymbol_functions::map_symbol_filenames function
Andrew> doesn't seem to do any extra caching, and within
Andrew> dwarf2_base_index_functions::map_symbol_filenames, the only expensive
Andrew> bit appears to be the call to dw2_get_file_names, and this already
Andrew> does its own caching via this_cu->v.quick->file_names.

This makes sense to me.

Andrew> gdb/ChangeLog:

Andrew> 	* dwarf2/read.c: Add 'unordered_set' include.
Andrew> 	(dwarf2_base_index_functions::map_symbol_filenames): Replace
Andrew> 	'visited' hash table with 'qfn_cache' unordered_set.  Remove use
Andrew> 	of per_Bfd->filenames_cache cache, and use function local
Andrew> 	filenames_cache instead.  Reindent.
Andrew> 	* dwarf2/read.h (struct dwarf2_per_bfd) <filenames_cache>: Delete.

Looks good, thank you.

Tom

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-04-22 16:36 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19 13:39 [PATCH] gdb: remove some caching from the dwarf reader Andrew Burgess
2021-04-22 16:36 ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).