On 7/20/20 9:41 PM, Simon Marchi via Gdb-patches wrote: > One regcache object is created for each stopped thread and is stored in > the regcache::regcaches linked list. Looking up a regcache for a given > thread is therefore in O(number of threads). Stopping all threads then > becomes O((number of threads) ^ 2). It becomes noticeable when > debugging thousands of threads, which is typical with GPU targets. This > patch replaces the linked list with an std::unordered_multimap, indexed > by (target, ptid). > > I originally designed it using an std::unordered_map with (target, ptid, > arch) as the key, because that's how lookups are done (in > get_thread_arch_aspace_regcache). However, the registers_changed_ptid > function, also somewhat on the hot path (it is used when resuming > threads), needs to delete all regcaches associated to a given (target, > ptid) tuple. Using (target, ptid) as a key allows to do this more > efficiently (see exception below). If the key of the map was (target, > ptid, arch), we'd have to walk all items of the map. > > The lookup (in get_thread_arch_aspace_regcache), walks over all existing > regcaches belonging to this (target, ptid), looking to find the one with > the right arch. This is ok, as there will be very few regcaches for a > given key (typically one). Lookups become faster when the number of > threads grows, compared to the linked list. With a small number of > threads, it will probably be a bit slower to do a map lookup than to > walk a few linked list nodes, but I don't think it will be noticeable in > practice. > > The function registers_changed_ptid deletes all regcaches related to a > given (target, ptid). We must now handle the different cases > separately: > > - NULL target and minus_one_ptid: we delete all the entries > - NULL target and non-minus_one_ptid: invalid (checked by assert) > - non-NULL target and non-minus_one_ptid: we delete all the entries > associated to that tuple, this is done efficiently > - a non-NULL target and minus_one_ptid: we delete all the entries > associated to that target, whatever the ptid. This is the slightly > annoying case, as we can't easily look up all items having this target > in their key. I implemented it by walking the list, which is not > ideal. Given the last case, did you consider using a two-level map instead? using ptid_regcache_map = std::unordered_multimap; using target_ptid_regcache_map = std::unordered_map; static target_ptid_regcache_map regcaches; The result for registers_changed_ptid would be: - NULL target and minus_one_ptid: we delete all the REGCACHES entries, - NULL target and non-minus_one_ptid: invalid (checked by assert) - non-NULL target and non-minus_one_ptid: we lookup the target in REGCACHES, and then delete all the entries in that map associated to that ptid. - a non-NULL target and minus_one_ptid: we delete the map entry in REGCACHES associated to that target. Looking up a regcache does two map lookups instead of one, but that is 2x amortized O(1), so should be a constant hit. I gave this a try, to see how feasible it would be. See attached patches on top of yours (the first two address comments I'll make below). I've also put these in the users/palves/regcache-map branch on sourceware.org. I did not run performance tests, though. It may be that in registers_changed_ptid, it would be more efficient to not clear the first level map entry for a given target (i.e., destroy the second level map object completely), but instead clear the second level map entry. I.e., avoid destroying the unordered_multimap for a given target, only to recreate it soon enough. But I kept it simple in case that isn't noticeable. A similar two-level data structure would be to put an instance of: using ptid_regcache_map = std::unordered_multimap; in process_stratum_target directly. IOW, target-connection.c:process_targets has a list of open targets, which could be used to walk over all targets in the "NULL target and minus_one_ptid" scenario. > > The function regcache_thread_ptid_changed is called when a thread > changes ptid. It is implemented efficiently using the map, although > that's not very important: it is not called often, mostly when creating > an inferior, on some specific platforms. > > Note: In hash_target_ptid, I am combining hash values from std::hash by > summing them. I don't think it's ideal, since std::hash is just the > identity function for base types. But I don't know what would be better > to reduce the change of collisions. If anybody has a better idea, I'd > be interested. I'd maybe look in some kernel sources, e.g., Linux or BSD, what they use as hash function for pids. > > This patch is a tiny bit from ROCm GDB [1] we would like to merge > upstream. Laurent Morichetti gave be these performance numbers: > > The benchmark used is: > > time ./gdb --data-directory=data-directory /extra/lmoriche/hip/samples/0_Intro/bit_extract/bit_extract -ex "set pagination off" -ex "set breakpoint pending on" -ex "b bit_extract_kernel if \$_thread == 5" -ex run -ex c -batch > > It measures the time it takes to continue from a conditional breakpoint with > 2048 threads at that breakpoint, one of them reporting the breakpoint. > > baseline: > real 0m10.227s > real 0m10.177s > real 0m10.362s > > with patch: > real 0m8.356s > real 0m8.424s > real 0m8.494s > > [1] https://github.com/ROCm-Developer-Tools/ROCgdb > > gdb/ChangeLog: > > * regcache.c (struct target_ptid): New struct. > (hash_target_ptid): New struct. > (target_ptid_regcache_map): New type. > (regcaches): Change type to target_ptid_regcache_map. > (get_thread_arch_aspace_regcache): Update to regcaches' new > type. > (regcache_thread_ptid_changed): Likewise. > (registers_changed_ptid): Likewise. > (regcaches_size): Likewise. > (regcaches_test): Update. > (regcache_thread_ptid_changed): Update. > * gdbsupport/ptid.h (hash_ptid): New struct. > > Change-Id: Iabb0a1111707936ca111ddb13f3b09efa83d3402 > --- > gdb/regcache.c | 192 ++++++++++++++++++++++++++++++++-------------- > gdbsupport/ptid.h | 16 ++++ > 2 files changed, 152 insertions(+), 56 deletions(-) > > diff --git a/gdb/regcache.c b/gdb/regcache.c > index fb20250d20f0..eed8a8bde6e5 100644 > --- a/gdb/regcache.c > +++ b/gdb/regcache.c > @@ -29,7 +29,7 @@ > #include "reggroups.h" > #include "observable.h" > #include "regset.h" > -#include > +#include > > /* > * DATA STRUCTURE > @@ -313,32 +313,73 @@ reg_buffer::assert_regnum (int regnum) const > gdb_assert (regnum < gdbarch_num_regs (arch ())); > } > > -/* Global structure containing the current regcache. */ > +/* Key type for target_ptid_regcache_map. */ > + > +struct target_ptid > +{ > + target_ptid (process_stratum_target *target, ptid_t ptid) > + : target (target), ptid (ptid) > + {} > + > + process_stratum_target *target; > + ptid_t ptid; > + > + bool operator== (const target_ptid &other) const > + { > + return (this->target == other.target > + && this->ptid == other.ptid); > + } > +}; > + > +/* Hash function for target_ptid. */ > + > +struct hash_target_ptid > +{ > + size_t operator() (const target_ptid &val) const > + { > + std::hash h_long; > + hash_ptid h_ptid; > + > + return h_long ((long) val.target) + h_ptid (val.ptid); Note that on 64-bit Windows, long is 32-bit, so pointers don't fit. It just means you're only hashing half the bits. I wonder whether using libiberty's htab_hash_pointer would be a good idea here. > + } > +}; > + > +/* Type to map a (target, ptid) tuple to a regcache. */ > + > +using target_ptid_regcache_map > + = std::unordered_multimap; > + > +/* Global structure containing the existing regcaches. */ > > /* NOTE: this is a write-through cache. There is no "dirty" bit for > recording if the register values have been changed (eg. by the > user). Therefore all registers must be written back to the > target when appropriate. */ > -static std::forward_list regcaches; > +static target_ptid_regcache_map regcaches; > > struct regcache * > get_thread_arch_aspace_regcache (process_stratum_target *target, > - ptid_t ptid, struct gdbarch *gdbarch, > + ptid_t ptid, gdbarch *arch, > struct address_space *aspace) > { > gdb_assert (target != nullptr); > > - for (const auto ®cache : regcaches) > - if (regcache->target () == target > - && regcache->ptid () == ptid > - && regcache->arch () == gdbarch) > - return regcache; > - > - regcache *new_regcache = new regcache (target, gdbarch, aspace); > + /* Look up the regcache for this (target, ptid, arch). */ > + target_ptid key (target, ptid); > + auto range = regcaches.equal_range (key); > + for (auto it = range.first; it != range.second; it++) > + { > + if (it->second->arch () == arch) > + return it->second; > + } > > - regcaches.push_front (new_regcache); > + /* It does not exist, create it. */ > + regcache *new_regcache = new regcache (target, arch, aspace); > new_regcache->set_ptid (ptid); > > + /* Insert it in the map. */ > + regcaches.insert (std::make_pair (key, new_regcache)); > + > return new_regcache; > } > > @@ -417,10 +458,25 @@ static void > regcache_thread_ptid_changed (process_stratum_target *target, > ptid_t old_ptid, ptid_t new_ptid) > { > - for (auto ®cache : regcaches) > + /* Find all the regcaches to updates. */ typo: to update. > + std::vector regcaches_to_update; > + target_ptid old_key (target, old_ptid); > + auto range = regcaches.equal_range (old_key); > + for (auto it = range.first; it != range.second; it++) > + regcaches_to_update.push_back (it->second); > + > + /* Remove all entries associated to OLD_KEY. We could have erased the items > + in the previous `for`, but it is only safe to erase items while iterating > + starting with C++14. */ > + regcaches.erase (old_key); I don't think that's really true in practice. The idiom used pretty much by everyone is: for (auto it = range.first; it != range.second;) { regcaches_to_update.push_back (it->second); it = regcaches.erase (it); } Note that even the resolution of the C++ issue at: https://wg21.cmeerw.net/lwg/issue2356 says: "not that any actual implementation does that, anyway" There's no need to be super pedantic wrt to an issue that no compiler is going to miscompile. See here as well: https://stackoverflow.com/questions/25047241/c11-is-it-safe-to-remove-individual-elements-from-stdunordered-map-while-it And then, given we know that there are no entries for new_ptid in the map, I think we could instead do it all in one iteration, like: target_ptid old_key (target, old_ptid); target_ptid new_key (target, new_ptid); auto range = regcaches.equal_range (old_key); for (auto it = range.first; it != range.second;) { regcache *rc = it->second; rc->set_ptid (new_ptid); /* Remove old before inserting new, to avoid rehashing, which would invalidate iterators. */ it = regcaches.erase (it); regcaches.insert (std::make_pair (new_key, rc)); } > + > + /* Update the regcaches' ptid, insert them back in the map with an updated > + key. */ > + target_ptid new_key (target, new_ptid); > + for (regcache *rc : regcaches_to_update) > { > - if (regcache->ptid () == old_ptid && regcache->target () == target) > - regcache->set_ptid (new_ptid); > + rc->set_ptid (new_ptid); > + regcaches.insert (std::make_pair (new_key, rc)); > } > } > > @@ -438,19 +494,53 @@ regcache_thread_ptid_changed (process_stratum_target *target, > void > registers_changed_ptid (process_stratum_target *target, ptid_t ptid) > { > - for (auto oit = regcaches.before_begin (), it = std::next (oit); > - it != regcaches.end (); ) > + if (target == nullptr) > { > - struct regcache *regcache = *it; > - if ((target == nullptr || regcache->target () == target) > - && regcache->ptid ().matches (ptid)) > - { > - delete regcache; > - it = regcaches.erase_after (oit); > - } > - else > - oit = it++; > + /* Since there can be ptid clashes between targets, it's not valid to > + pass a ptid without saying to which target it belongs. */ > + gdb_assert (ptid == minus_one_ptid); > + > + /* Delete all the regcaches. */ > + for (auto pair : regcaches) > + delete pair.second; We could store a std::unique_ptr in the map instead to avoid all these manual deletes. > + > + regcaches.clear (); > } > + else if (ptid != minus_one_ptid) > + { > + /* Non-NULL target and non-minus_one_ptid, delete all regcaches belonging > + to this (TARGET, PTID). */ > + target_ptid key (target, ptid); > + auto range = regcaches.equal_range (key); > + for (auto it = range.first; it != range.second; it++) > + delete it->second; Like this one. > + > + regcaches.erase (key); Here as well could do: for (auto it = range.first; it != range.second;) { delete it->second; it = regcaches.erase (it); } > + } > + else > + { > + /* Non-NULL target and minus_one_ptid, delete all regcaches associated > + to this target. > + > + We unfortunately don't have an efficient way to do this. Fall back > + to iterating all items to find all those belonging to TARGET. > + > + Note that in C++11, it's not safe to erase map entries while > + iterating, so we keep track of them and delete them at the end. */ Here as well. > + std::vector keys; > + > + for (auto pair : regcaches) > + { > + if (pair.second->target () == target) > + { > + keys.push_back (pair.first); > + delete pair.second; > + } > + } > + > + for (auto key : keys) > + regcaches.erase (key); > + } > > if ((target == nullptr || current_thread_target == target) > && current_thread_ptid.matches (ptid)) > @@ -1431,13 +1521,6 @@ register_dump::dump (ui_file *file) > > namespace selftests { > > -static size_t > -regcaches_size () > -{ > - return std::distance (regcaches.begin (), > - regcaches.end ()); > -} > - > /* Wrapper around get_thread_arch_aspace_regcache that does some self checks. */ > > static void > @@ -1457,7 +1540,7 @@ static void > regcaches_test (void) > { > /* It is empty at the start. */ > - SELF_CHECK (regcaches_size () == 0); > + SELF_CHECK (regcaches.size () == 0); > > ptid_t ptid1 (1), ptid2 (2), ptid3 (3); > > @@ -1469,52 +1552,52 @@ regcaches_test (void) > test_get_thread_arch_aspace_regcache (&test_target1, ptid1, > target_gdbarch (), > NULL); > - SELF_CHECK (regcaches_size () == 1); > + SELF_CHECK (regcaches.size () == 1); > > /* Get regcache from (target1,ptid2), a new regcache is added to > REGCACHES. */ > test_get_thread_arch_aspace_regcache (&test_target1, ptid2, > target_gdbarch (), > NULL); > - SELF_CHECK (regcaches_size () == 2); > + SELF_CHECK (regcaches.size () == 2); > > /* Get regcache from (target1,ptid3), a new regcache is added to > REGCACHES. */ > test_get_thread_arch_aspace_regcache (&test_target1, ptid3, > target_gdbarch (), > NULL); > - SELF_CHECK (regcaches_size () == 3); > + SELF_CHECK (regcaches.size () == 3); > > /* Get regcache from (target1,ptid2) again, nothing is added to > REGCACHES. */ > test_get_thread_arch_aspace_regcache (&test_target1, ptid2, > target_gdbarch (), > NULL); > - SELF_CHECK (regcaches_size () == 3); > + SELF_CHECK (regcaches.size () == 3); > > /* Get regcache from (target2,ptid2), a new regcache is added to > REGCACHES, since this time we're using a different target. */ > test_get_thread_arch_aspace_regcache (&test_target2, ptid2, > target_gdbarch (), > NULL); > - SELF_CHECK (regcaches_size () == 4); > + SELF_CHECK (regcaches.size () == 4); > > /* Mark that (target1,ptid2) changed. The regcache of (target1, > ptid2) should be removed from REGCACHES. */ > registers_changed_ptid (&test_target1, ptid2); > - SELF_CHECK (regcaches_size () == 3); > + SELF_CHECK (regcaches.size () == 3); > > /* Get the regcache from (target2,ptid2) again, confirming the > registers_changed_ptid call above did not delete it. */ > test_get_thread_arch_aspace_regcache (&test_target2, ptid2, > target_gdbarch (), > NULL); > - SELF_CHECK (regcaches_size () == 3); > + SELF_CHECK (regcaches.size () == 3); > > /* Confirm that marking all regcaches of all targets as changed > clears REGCACHES. */ > registers_changed_ptid (nullptr, minus_one_ptid); > - SELF_CHECK (regcaches_size () == 0); > + SELF_CHECK (regcaches.size () == 0); > } > > class target_ops_no_register : public test_target_ops > @@ -1847,27 +1930,24 @@ regcache_thread_ptid_changed () > get_thread_arch_aspace_regcache (&target1.mock_target, old_ptid, arch, NULL); > get_thread_arch_aspace_regcache (&target2.mock_target, old_ptid, arch, NULL); > > - /* Return whether a regcache for (TARGET, PTID) exists in REGCACHES. */ > - auto regcache_exists = [] (process_stratum_target *target, ptid_t ptid) > + /* Return the count of regcaches for (TARGET, PTID) in REGCACHES. */ > + auto regcache_count = [] (process_stratum_target *target, ptid_t ptid) > { > - for (regcache *rc : regcaches) > - { > - if (rc->target () == target && rc->ptid () == ptid) > - return true; > - } > + target_ptid key (target, ptid); > > - return false; > + auto range = regcaches.equal_range (key); > + return std::distance (range.first, range.second); > }; > > - gdb_assert (regcaches_size () == 2); > - gdb_assert (regcache_exists (&target1.mock_target, old_ptid)); > - gdb_assert (regcache_exists (&target2.mock_target, old_ptid)); > + gdb_assert (regcaches.size () == 2); > + gdb_assert (regcache_count (&target1.mock_target, old_ptid) == 1); > + gdb_assert (regcache_count (&target2.mock_target, old_ptid) == 1); > > thread_change_ptid (&target1.mock_target, old_ptid, new_ptid); > > - gdb_assert (regcaches_size () == 2); > - gdb_assert (regcache_exists (&target1.mock_target, new_ptid)); > - gdb_assert (regcache_exists (&target2.mock_target, old_ptid)); > + gdb_assert (regcaches.size () == 2); > + gdb_assert (regcache_count (&target1.mock_target, new_ptid) == 1); > + gdb_assert (regcache_count (&target2.mock_target, old_ptid) == 1); > } > > } // namespace selftests > diff --git a/gdbsupport/ptid.h b/gdbsupport/ptid.h > index ef52d5551260..a528312bad5e 100644 > --- a/gdbsupport/ptid.h > +++ b/gdbsupport/ptid.h > @@ -32,6 +32,8 @@ > thread_stratum target that might want to sit on top. > */ > > +#include > + > class ptid_t > { > public: > @@ -143,6 +145,20 @@ class ptid_t > long m_tid; > }; > > +/* Functor to hash a ptid. */ > + > +struct hash_ptid > +{ > + size_t operator() (const ptid_t &ptid) const > + { > + std::hash long_hash; > + > + return (long_hash (ptid.pid ()) > + + long_hash (ptid.lwp ()) > + + long_hash (ptid.tid ())); > + } > +}; > + > /* The null or zero ptid, often used to indicate no process. */ > > extern const ptid_t null_ptid; > Thanks, Pedro Alves