On 7/20/20 9:41 PM, Simon Marchi via Gdb-patches wrote:
> One regcache object is created for each stopped thread and is stored in
> the regcache::regcaches linked list.  Looking up a regcache for a given
> thread is therefore in O(number of threads).  Stopping all threads then
> becomes O((number of threads) ^ 2).  It becomes noticeable when
> debugging thousands of threads, which is typical with GPU targets.  This
> patch replaces the linked list with an std::unordered_multimap, indexed
> by (target, ptid).
> 
> I originally designed it using an std::unordered_map with (target, ptid,
> arch) as the key, because that's how lookups are done (in
> get_thread_arch_aspace_regcache).  However, the registers_changed_ptid
> function, also somewhat on the hot path (it is used when resuming
> threads), needs to delete all regcaches associated to a given (target,
> ptid) tuple.  Using (target, ptid) as a key allows to do this more
> efficiently (see exception below).  If the key of the map was (target,
> ptid, arch), we'd have to walk all items of the map.
> 
> The lookup (in get_thread_arch_aspace_regcache), walks over all existing
> regcaches belonging to this (target, ptid), looking to find the one with
> the right arch.  This is ok, as there will be very few regcaches for a
> given key (typically one).  Lookups become faster when the number of
> threads grows, compared to the linked list.  With a small number of
> threads, it will probably be a bit slower to do a map lookup than to
> walk a few linked list nodes, but I don't think it will be noticeable in
> practice.
> 
> The function registers_changed_ptid deletes all regcaches related to a
> given (target, ptid).  We must now handle the different cases
> separately:
> 
> - NULL target and minus_one_ptid: we delete all the entries
> - NULL target and non-minus_one_ptid: invalid (checked by assert)
> - non-NULL target and non-minus_one_ptid: we delete all the entries
>   associated to that tuple, this is done efficiently
> - a non-NULL target and minus_one_ptid: we delete all the entries
>   associated to that target, whatever the ptid.  This is the slightly
>   annoying case, as we can't easily look up all items having this target
>   in their key.  I implemented it by walking the list, which is not
>   ideal.

Given the last case, did you consider using a two-level map instead?

  using ptid_regcache_map 
    = std::unordered_multimap<ptid_t, regcache *>;

  using target_ptid_regcache_map 
    = std::unordered_map<process_stratum_target *, ptid_regcache_map>;

  static target_ptid_regcache_map regcaches;

The result for registers_changed_ptid would be:

 - NULL target and minus_one_ptid: we delete all the REGCACHES entries,

 - NULL target and non-minus_one_ptid: invalid (checked by assert)

 - non-NULL target and non-minus_one_ptid: we lookup the target
   in REGCACHES, and then delete all the entries in that map
   associated to that ptid.

 - a non-NULL target and minus_one_ptid: we delete the map entry
   in REGCACHES associated to that target.

Looking up a regcache does two map lookups instead of one, but
that is 2x amortized O(1), so should be a constant hit.

I gave this a try, to see how feasible it would be.  See attached
patches on top of yours (the first two address comments I'll make below).
I've also put these in the users/palves/regcache-map branch on
sourceware.org.

I did not run performance tests, though.  It may be that in
registers_changed_ptid, it would be more efficient to
not clear the first level map entry for a given target (i.e.,
destroy the second level map object completely), but instead
clear the second level map entry.  I.e., avoid destroying the
unordered_multimap for a given target, only to recreate it soon
enough.  But I kept it simple in case that isn't noticeable.

A similar two-level data structure would be to put an instance of:

  using ptid_regcache_map 
    = std::unordered_multimap<ptid_t, regcache *>;

in process_stratum_target directly.

IOW, target-connection.c:process_targets has a list of open targets,
which could be used to walk over all targets in the
"NULL target and minus_one_ptid" scenario.

> 
> The function regcache_thread_ptid_changed is called when a thread
> changes ptid.  It is implemented efficiently using the map, although
> that's not very important: it is not called often, mostly when creating
> an inferior, on some specific platforms.
> 
> Note: In hash_target_ptid, I am combining hash values from std::hash by
> summing them.  I don't think it's ideal, since std::hash is just the
> identity function for base types.  But I don't know what would be better
> to reduce the change of collisions.  If anybody has a better idea, I'd
> be interested.

I'd maybe look in some kernel sources, e.g., Linux or BSD, what they
use as hash function for pids.

> 
> This patch is a tiny bit from ROCm GDB [1] we would like to merge
> upstream.  Laurent Morichetti gave be these performance numbers:
> 
> The benchmark used is:
> 
>   time ./gdb --data-directory=data-directory /extra/lmoriche/hip/samples/0_Intro/bit_extract/bit_extract -ex "set pagination off" -ex "set breakpoint pending on" -ex "b bit_extract_kernel if \$_thread == 5" -ex run -ex c -batch
> 
> It measures the time it takes to continue from a conditional breakpoint with
> 2048 threads at that breakpoint, one of them reporting the breakpoint.
> 
> baseline:
> real    0m10.227s
> real    0m10.177s
> real    0m10.362s
> 
> with patch:
> real    0m8.356s
> real    0m8.424s
> real    0m8.494s
> 
> [1] https://github.com/ROCm-Developer-Tools/ROCgdb
> 
> gdb/ChangeLog:
> 
> 	* regcache.c (struct target_ptid): New struct.
> 	(hash_target_ptid): New struct.
> 	(target_ptid_regcache_map): New type.
> 	(regcaches): Change type to target_ptid_regcache_map.
> 	(get_thread_arch_aspace_regcache): Update to regcaches' new
> 	type.
> 	(regcache_thread_ptid_changed): Likewise.
> 	(registers_changed_ptid): Likewise.
> 	(regcaches_size): Likewise.
> 	(regcaches_test): Update.
> 	(regcache_thread_ptid_changed): Update.
> 	* gdbsupport/ptid.h (hash_ptid): New struct.
> 
> Change-Id: Iabb0a1111707936ca111ddb13f3b09efa83d3402
> ---
>  gdb/regcache.c    | 192 ++++++++++++++++++++++++++++++++--------------
>  gdbsupport/ptid.h |  16 ++++
>  2 files changed, 152 insertions(+), 56 deletions(-)
> 
> diff --git a/gdb/regcache.c b/gdb/regcache.c
> index fb20250d20f0..eed8a8bde6e5 100644
> --- a/gdb/regcache.c
> +++ b/gdb/regcache.c
> @@ -29,7 +29,7 @@
>  #include "reggroups.h"
>  #include "observable.h"
>  #include "regset.h"
> -#include <forward_list>
> +#include <unordered_map>
>  
>  /*
>   * DATA STRUCTURE
> @@ -313,32 +313,73 @@ reg_buffer::assert_regnum (int regnum) const
>      gdb_assert (regnum < gdbarch_num_regs (arch ()));
>  }
>  
> -/* Global structure containing the current regcache.  */
> +/* Key type for target_ptid_regcache_map.  */
> +
> +struct target_ptid
> +{
> +  target_ptid (process_stratum_target *target, ptid_t ptid)
> +    : target (target), ptid (ptid)
> +  {}
> +
> +  process_stratum_target *target;
> +  ptid_t ptid;
> +
> +  bool operator== (const target_ptid &other) const
> +  {
> +    return (this->target == other.target
> +	    && this->ptid == other.ptid);
> +  }
> +};
> +
> +/* Hash function for target_ptid.  */
> +
> +struct hash_target_ptid
> +{
> +  size_t operator() (const target_ptid &val) const
> +  {
> +    std::hash<long> h_long;
> +    hash_ptid h_ptid;
> +
> +    return h_long ((long) val.target) + h_ptid (val.ptid);

Note that on 64-bit Windows, long is 32-bit, so pointers don't
fit.  It just means you're only hashing half the bits.
I wonder whether using libiberty's htab_hash_pointer would be
a good idea here.

> +  }
> +};
> +
> +/* Type to map a (target, ptid) tuple to a regcache.  */
> +
> +using target_ptid_regcache_map
> +  = std::unordered_multimap<target_ptid, regcache *, hash_target_ptid>;
> +
> +/* Global structure containing the existing regcaches.  */
>  
>  /* NOTE: this is a write-through cache.  There is no "dirty" bit for
>     recording if the register values have been changed (eg. by the
>     user).  Therefore all registers must be written back to the
>     target when appropriate.  */
> -static std::forward_list<regcache *> regcaches;
> +static target_ptid_regcache_map regcaches;
>  
>  struct regcache *
>  get_thread_arch_aspace_regcache (process_stratum_target *target,
> -				 ptid_t ptid, struct gdbarch *gdbarch,
> +				 ptid_t ptid, gdbarch *arch,
>  				 struct address_space *aspace)
>  {
>    gdb_assert (target != nullptr);
>  
> -  for (const auto &regcache : regcaches)
> -    if (regcache->target () == target
> -	&& regcache->ptid () == ptid
> -	&& regcache->arch () == gdbarch)
> -      return regcache;
> -
> -  regcache *new_regcache = new regcache (target, gdbarch, aspace);
> +  /* Look up the regcache for this (target, ptid, arch).  */
> +  target_ptid key (target, ptid);
> +  auto range = regcaches.equal_range (key);
> +  for (auto it = range.first; it != range.second; it++)
> +    {
> +      if (it->second->arch () == arch)
> +	return it->second;
> +    }
>  
> -  regcaches.push_front (new_regcache);
> +  /* It does not exist, create it.  */
> +  regcache *new_regcache = new regcache (target, arch, aspace);
>    new_regcache->set_ptid (ptid);
>  
> +  /* Insert it in the map.  */
> +  regcaches.insert (std::make_pair (key, new_regcache));
> +
>    return new_regcache;
>  }
>  
> @@ -417,10 +458,25 @@ static void
>  regcache_thread_ptid_changed (process_stratum_target *target,
>  			      ptid_t old_ptid, ptid_t new_ptid)
>  {
> -  for (auto &regcache : regcaches)
> +  /* Find all the regcaches to updates.  */

typo: to update.

> +  std::vector<regcache *> regcaches_to_update;
> +  target_ptid old_key (target, old_ptid);
> +  auto range = regcaches.equal_range (old_key);
> +  for (auto it = range.first; it != range.second; it++)
> +    regcaches_to_update.push_back (it->second);
> +
> +  /* Remove all entries associated to OLD_KEY.  We could have erased the items
> +     in the previous `for`, but it is only safe to erase items while iterating
> +     starting with C++14.  */
> +  regcaches.erase (old_key);

I don't think that's really true in practice.  The idiom used pretty
much by everyone is:

  for (auto it = range.first; it != range.second;)
    {
      regcaches_to_update.push_back (it->second);
      it = regcaches.erase (it);
    }

Note that even the resolution of the C++ issue at:

 https://wg21.cmeerw.net/lwg/issue2356 

says:

 "not that any actual implementation does that, anyway"

There's no need to be super pedantic wrt to an issue
that no compiler is going to miscompile.

See here as well:
 https://stackoverflow.com/questions/25047241/c11-is-it-safe-to-remove-individual-elements-from-stdunordered-map-while-it

And then, given we know that there are no entries for
new_ptid in the map, I think we could instead do it all in
one iteration, like:

  target_ptid old_key (target, old_ptid);
  target_ptid new_key (target, new_ptid);

  auto range = regcaches.equal_range (old_key);
  for (auto it = range.first; it != range.second;)
    {
      regcache *rc = it->second;
      rc->set_ptid (new_ptid);

      /* Remove old before inserting new, to avoid rehashing, which
         would invalidate iterators.  */
      it = regcaches.erase (it);
      regcaches.insert (std::make_pair (new_key, rc));
    }

> +
> +  /* Update the regcaches' ptid, insert them back in the map with an updated
> +     key.  */
> +  target_ptid new_key (target, new_ptid);
> +  for (regcache *rc : regcaches_to_update)
>      {
> -      if (regcache->ptid () == old_ptid && regcache->target () == target)
> -	regcache->set_ptid (new_ptid);
> +      rc->set_ptid (new_ptid);
> +      regcaches.insert (std::make_pair (new_key, rc));
>      }
>  }
>  
> @@ -438,19 +494,53 @@ regcache_thread_ptid_changed (process_stratum_target *target,
>  void
>  registers_changed_ptid (process_stratum_target *target, ptid_t ptid)
>  {
> -  for (auto oit = regcaches.before_begin (), it = std::next (oit);
> -       it != regcaches.end (); )
> +  if (target == nullptr)
>      {
> -      struct regcache *regcache = *it;
> -      if ((target == nullptr || regcache->target () == target)
> -	  && regcache->ptid ().matches (ptid))
> -	{
> -	  delete regcache;
> -	  it = regcaches.erase_after (oit);
> -	}
> -      else
> -	oit = it++;
> +      /* Since there can be ptid clashes between targets, it's not valid to
> +         pass a ptid without saying to which target it belongs.  */
> +      gdb_assert (ptid == minus_one_ptid);
> +
> +      /* Delete all the regcaches.  */
> +      for (auto pair : regcaches)
> +	delete pair.second;

We could store a std::unique_ptr<regcache> in the map instead
to avoid all these manual deletes.

> +
> +      regcaches.clear ();
>      }
> +  else if (ptid != minus_one_ptid)
> +    {
> +      /* Non-NULL target and non-minus_one_ptid, delete all regcaches belonging
> +         to this (TARGET, PTID).  */
> +      target_ptid key (target, ptid);
> +      auto range = regcaches.equal_range (key);
> +      for (auto it = range.first; it != range.second; it++)
> +	delete it->second;

Like this one.

> +
> +      regcaches.erase (key);

Here as well could do:

      for (auto it = range.first; it != range.second;)
        {
	  delete it->second;
	  it = regcaches.erase (it);
        }

> +    }
> +  else
> +    {
> +       /* Non-NULL target and minus_one_ptid, delete all regcaches associated
> +          to this target.
> +
> +          We unfortunately don't have an efficient way to do this.  Fall back
> +          to iterating all items to find all those belonging to TARGET.
> +
> +          Note that in C++11, it's not safe to erase map entries while
> +          iterating, so we keep track of them and delete them at the end.  */

Here as well.

> +       std::vector<target_ptid> keys;
> +
> +       for (auto pair : regcaches)
> +	 {
> +	   if (pair.second->target () == target)
> +	     {
> +	       keys.push_back (pair.first);
> +	       delete pair.second;
> +	     }
> +	 }
> +
> +       for (auto key : keys)
> +	 regcaches.erase (key);
> +     }
>  
>    if ((target == nullptr || current_thread_target == target)
>        && current_thread_ptid.matches (ptid))
> @@ -1431,13 +1521,6 @@ register_dump::dump (ui_file *file)
>  
>  namespace selftests {
>  
> -static size_t
> -regcaches_size ()
> -{
> -  return std::distance (regcaches.begin (),
> -			  regcaches.end ());
> -}
> -
>  /* Wrapper around get_thread_arch_aspace_regcache that does some self checks.  */
>  
>  static void
> @@ -1457,7 +1540,7 @@ static void
>  regcaches_test (void)
>  {
>    /* It is empty at the start.  */
> -  SELF_CHECK (regcaches_size () == 0);
> +  SELF_CHECK (regcaches.size () == 0);
>  
>    ptid_t ptid1 (1), ptid2 (2), ptid3 (3);
>  
> @@ -1469,52 +1552,52 @@ regcaches_test (void)
>    test_get_thread_arch_aspace_regcache (&test_target1, ptid1,
>  					target_gdbarch (),
>  					NULL);
> -  SELF_CHECK (regcaches_size () == 1);
> +  SELF_CHECK (regcaches.size () == 1);
>  
>    /* Get regcache from (target1,ptid2), a new regcache is added to
>       REGCACHES.  */
>    test_get_thread_arch_aspace_regcache (&test_target1, ptid2,
>  					target_gdbarch (),
>  					NULL);
> -  SELF_CHECK (regcaches_size () == 2);
> +  SELF_CHECK (regcaches.size () == 2);
>  
>    /* Get regcache from (target1,ptid3), a new regcache is added to
>       REGCACHES.  */
>    test_get_thread_arch_aspace_regcache (&test_target1, ptid3,
>  					target_gdbarch (),
>  					NULL);
> -  SELF_CHECK (regcaches_size () == 3);
> +  SELF_CHECK (regcaches.size () == 3);
>  
>    /* Get regcache from (target1,ptid2) again, nothing is added to
>       REGCACHES.  */
>    test_get_thread_arch_aspace_regcache (&test_target1, ptid2,
>  					target_gdbarch (),
>  					NULL);
> -  SELF_CHECK (regcaches_size () == 3);
> +  SELF_CHECK (regcaches.size () == 3);
>  
>    /* Get regcache from (target2,ptid2), a new regcache is added to
>       REGCACHES, since this time we're using a different target.  */
>    test_get_thread_arch_aspace_regcache (&test_target2, ptid2,
>  					target_gdbarch (),
>  					NULL);
> -  SELF_CHECK (regcaches_size () == 4);
> +  SELF_CHECK (regcaches.size () == 4);
>  
>    /* Mark that (target1,ptid2) changed.  The regcache of (target1,
>       ptid2) should be removed from REGCACHES.  */
>    registers_changed_ptid (&test_target1, ptid2);
> -  SELF_CHECK (regcaches_size () == 3);
> +  SELF_CHECK (regcaches.size () == 3);
>  
>    /* Get the regcache from (target2,ptid2) again, confirming the
>       registers_changed_ptid call above did not delete it.  */
>    test_get_thread_arch_aspace_regcache (&test_target2, ptid2,
>  					target_gdbarch (),
>  					NULL);
> -  SELF_CHECK (regcaches_size () == 3);
> +  SELF_CHECK (regcaches.size () == 3);
>  
>    /* Confirm that marking all regcaches of all targets as changed
>       clears REGCACHES.  */
>    registers_changed_ptid (nullptr, minus_one_ptid);
> -  SELF_CHECK (regcaches_size () == 0);
> +  SELF_CHECK (regcaches.size () == 0);
>  }
>  
>  class target_ops_no_register : public test_target_ops
> @@ -1847,27 +1930,24 @@ regcache_thread_ptid_changed ()
>    get_thread_arch_aspace_regcache (&target1.mock_target, old_ptid, arch, NULL);
>    get_thread_arch_aspace_regcache (&target2.mock_target, old_ptid, arch, NULL);
>  
> -  /* Return whether a regcache for (TARGET, PTID) exists in REGCACHES.  */
> -  auto regcache_exists = [] (process_stratum_target *target, ptid_t ptid)
> +  /* Return the count of regcaches for (TARGET, PTID) in REGCACHES.  */
> +  auto regcache_count = [] (process_stratum_target *target, ptid_t ptid)
>      {
> -      for (regcache *rc : regcaches)
> -	{
> -	  if (rc->target () == target && rc->ptid () == ptid)
> -	    return true;
> -	}
> +      target_ptid key (target, ptid);
>  
> -      return false;
> +      auto range = regcaches.equal_range (key);
> +      return std::distance (range.first, range.second);
>      };
>  
> -  gdb_assert (regcaches_size () == 2);
> -  gdb_assert (regcache_exists (&target1.mock_target, old_ptid));
> -  gdb_assert (regcache_exists (&target2.mock_target, old_ptid));
> +  gdb_assert (regcaches.size () == 2);
> +  gdb_assert (regcache_count (&target1.mock_target, old_ptid) == 1);
> +  gdb_assert (regcache_count (&target2.mock_target, old_ptid) == 1);
>  
>    thread_change_ptid (&target1.mock_target, old_ptid, new_ptid);
>  
> -  gdb_assert (regcaches_size () == 2);
> -  gdb_assert (regcache_exists (&target1.mock_target, new_ptid));
> -  gdb_assert (regcache_exists (&target2.mock_target, old_ptid));
> +  gdb_assert (regcaches.size () == 2);
> +  gdb_assert (regcache_count (&target1.mock_target, new_ptid) == 1);
> +  gdb_assert (regcache_count (&target2.mock_target, old_ptid) == 1);
>  }
>  
>  } // namespace selftests
> diff --git a/gdbsupport/ptid.h b/gdbsupport/ptid.h
> index ef52d5551260..a528312bad5e 100644
> --- a/gdbsupport/ptid.h
> +++ b/gdbsupport/ptid.h
> @@ -32,6 +32,8 @@
>     thread_stratum target that might want to sit on top.
>  */
>  
> +#include <functional>
> +
>  class ptid_t
>  {
>  public:
> @@ -143,6 +145,20 @@ class ptid_t
>    long m_tid;
>  };
>  
> +/* Functor to hash a ptid.  */
> +
> +struct hash_ptid
> +{
> +  size_t operator() (const ptid_t &ptid) const
> +  {
> +    std::hash<long> long_hash;
> +
> +    return (long_hash (ptid.pid ())
> +	    + long_hash (ptid.lwp ())
> +	    + long_hash (ptid.tid ()));
> +  }
> +};
> +
>  /* The null or zero ptid, often used to indicate no process. */
>  
>  extern const ptid_t null_ptid;
> 

Thanks,
Pedro Alves