From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <0f94cd2c-53e3-df07-ecaf-091266731639@redhat.com>
Date: Fri, 16 Sep 2022 16:49:17 +0200
Subject: Re: [PATCH v4] eliminate mutex in fast path of __register_frame
To: Thomas Neumann , gcc-patches@gcc.gnu.org
Cc: Florian Weimer , Jakub Jelinek , Jonathan Wakely , Thomas Rodgers
References: <2a4776b9-9271-bb3c-a626-d5ec22dae6f3@in.tum.de>
From: Jason Merrill
In-Reply-To: <2a4776b9-9271-bb3c-a626-d5ec22dae6f3@in.tum.de>

On 9/16/22 06:19, Thomas Neumann wrote:
> The __register_frame/__deregister_frame functions are used to register
> unwinding frames from JITed code in a sorted list. That list itself
> is protected by object_mutex, which leads to terrible performance
> in multi-threaded code and is somewhat expensive even if single-threaded.
> There was already a fast-path that avoided taking the mutex if no
> frame was registered at all.
>
> This commit eliminates both the mutex and the sorted list from
> the atomic fast path, and replaces it with a btree that uses
> optimistic lock coupling during lookup. This allows for fully parallel
> unwinding and is essential to scale exception handling to large
> core counts.
>
> Changes since v3:
> - Avoid code duplication by adding query mode to classify_object_over_fdes
> - Adjust all comments as requested
>
> libgcc/ChangeLog:
>
>         * unwind-dw2-fde.c (release_registered_frames): Cleanup at shutdown.
>         (__register_frame_info_table_bases): Use btree in atomic fast path.
>         (__deregister_frame_info_bases): Likewise.
>         (_Unwind_Find_FDE): Likewise.
>         (base_from_object): Make parameter const.
>         (classify_object_over_fdes): Add query-only mode.
>         (get_pc_range): Compute PC range for lookup.
>         * unwind-dw2-fde.h (last_fde): Make parameter const.
>         * unwind-dw2-btree.h: New file.
> ---
>  libgcc/unwind-dw2-btree.h | 953 ++++++++++++++++++++++++++++++++++++++
>  libgcc/unwind-dw2-fde.c   | 195 ++++++--
>  libgcc/unwind-dw2-fde.h   |   2 +-
>  3 files changed, 1098 insertions(+), 52 deletions(-)
>  create mode 100644 libgcc/unwind-dw2-btree.h
>
> diff --git a/libgcc/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h
> new file mode 100644
> index 00000000000..8853f0eab48
> --- /dev/null
> +++ b/libgcc/unwind-dw2-btree.h
> @@ -0,0 +1,953 @@
> +/* Lock-free btree for manually registered unwind frames.  */
> +/* Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Thomas Neumann
> +
> +This file is part of GCC.
> + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License > +for more details. > + > +Under Section 7 of GPL version 3, you are granted additional > +permissions described in the GCC Runtime Library Exception, version > +3.1, as published by the Free Software Foundation. > + > +You should have received a copy of the GNU General Public License and > +a copy of the GCC Runtime Library Exception along with this program; > +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see > +.  */ > + > +#ifndef GCC_UNWIND_DW2_BTREE_H > +#define GCC_UNWIND_DW2_BTREE_H > + > +#include > + > +// Common logic for version locks. > +struct version_lock > +{ > +  // The lock itself. The lowest bit indicates an exclusive lock, > +  // the second bit indicates waiting threads. All other bits are > +  // used as counter to recognize changes. > +  // Overflows are okay here, we must only prevent overflow to the > +  // same value within one lock_optimistic/validate > +  // range. Even on 32 bit platforms that would require 1 billion > +  // frame registrations within the time span of a few assembler > +  // instructions. > +  uintptr_t version_lock; > +}; > + > +#ifdef __GTHREAD_HAS_COND > +// We should never get contention within the tree as it rarely changes. > +// But if we ever do get contention we use these for waiting. > +static __gthread_mutex_t version_lock_mutex = __GTHREAD_MUTEX_INIT; > +static __gthread_cond_t version_lock_cond = __GTHREAD_COND_INIT; > +#endif > + > +// Initialize in locked state. > +static inline void > +version_lock_initialize_locked_exclusive (struct version_lock *vl) > +{ > +  vl->version_lock = 1; > +} > + > +// Try to lock the node exclusive. > +static inline bool > +version_lock_try_lock_exclusive (struct version_lock *vl) > +{ > +  uintptr_t state = __atomic_load_n (&(vl->version_lock), > __ATOMIC_SEQ_CST); > +  if (state & 1) > +    return false; > +  return __atomic_compare_exchange_n (&(vl->version_lock), &state, > state | 1, > +                      false, __ATOMIC_SEQ_CST, > +                      __ATOMIC_SEQ_CST); > +} > + > +// Lock the node exclusive, blocking as needed. > +static void > +version_lock_lock_exclusive (struct version_lock *vl) > +{ > +#ifndef __GTHREAD_HAS_COND > +restart: > +#endif > + > +  // We should virtually never get contention here, as frame > +  // changes are rare. > +  uintptr_t state = __atomic_load_n (&(vl->version_lock), > __ATOMIC_SEQ_CST); > +  if (!(state & 1)) > +    { > +      if (__atomic_compare_exchange_n (&(vl->version_lock), &state, > state | 1, > +                       false, __ATOMIC_SEQ_CST, > +                       __ATOMIC_SEQ_CST)) > +    return; > +    } > + > +    // We did get contention, wait properly. > +#ifdef __GTHREAD_HAS_COND > +  __gthread_mutex_lock (&version_lock_mutex); > +  state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST); > +  while (true) > +    { > +      // Check if the lock is still held. 
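
As an aside for readers following the locking scheme: the bit layout described
at the top of the file is easy to see in isolation. Below is a minimal sketch of
the same idea using C11 <stdatomic.h> instead of the __atomic builtins (my choice
for the sketch, not something the patch uses); it only shows try-lock and unlock,
and it leaves out the waiter-bit handling that the real unlock does with an
atomic exchange.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 0: exclusively locked.  Bit 1: a waiter registered.  The remaining
   bits count versions, so every unlock advances the word by 4.  */
typedef struct { atomic_uintptr_t word; } vlock;

static bool vlock_try_lock (vlock *l)
{
  uintptr_t v = atomic_load (&l->word);
  if (v & 1)
    return false;                 /* someone holds it exclusively */
  return atomic_compare_exchange_strong (&l->word, &v, v | 1);
}

static void vlock_unlock (vlock *l)
{
  /* Bump the version, clear the lock and waiter bits.  The patch uses an
     atomic exchange here so it can also see whether anybody waited.  */
  uintptr_t v = atomic_load (&l->word);
  atomic_store (&l->word, (v + 4) & ~(uintptr_t) 3);
}

int main (void)
{
  vlock l;
  atomic_init (&l.word, 0);
  vlock_try_lock (&l);
  vlock_unlock (&l);
  vlock_try_lock (&l);
  vlock_unlock (&l);
  /* Two lock/unlock cycles leave the version word at 8.  */
  printf ("%ju\n", (uintmax_t) atomic_load (&l.word));
  return 0;
}
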
> +      if (!(state & 1)) > +    { > +      if (__atomic_compare_exchange_n (&(vl->version_lock), &state, > +                       state | 1, false, __ATOMIC_SEQ_CST, > +                       __ATOMIC_SEQ_CST)) > +        { > +          __gthread_mutex_unlock (&version_lock_mutex); > +          return; > +        } > +      else > +        { > +          continue; > +        } > +    } > + > +      // Register waiting thread. > +      if (!(state & 2)) > +    { > +      if (!__atomic_compare_exchange_n (&(vl->version_lock), &state, > +                        state | 2, false, __ATOMIC_SEQ_CST, > +                        __ATOMIC_SEQ_CST)) > +        continue; > +    } > + > +      // And sleep. > +      __gthread_cond_wait (&version_lock_cond, &version_lock_mutex); > +      state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST); > +    } > +#else > +  // Spin if we do not have condition variables available. > +  // We expect no contention here, spinning should be okay. > +  goto restart; > +#endif > +} > + > +// Release a locked node and increase the version lock. > +static void > +version_lock_unlock_exclusive (struct version_lock *vl) > +{ > +  // increase version, reset exclusive lock bits > +  uintptr_t state = __atomic_load_n (&(vl->version_lock), > __ATOMIC_SEQ_CST); > +  uintptr_t ns = (state + 4) & (~((uintptr_t) 3)); > +  state = __atomic_exchange_n (&(vl->version_lock), ns, __ATOMIC_SEQ_CST); > + > +#ifdef __GTHREAD_HAS_COND > +  if (state & 2) > +    { > +      // Wake up waiting threads. This should be extremely rare. > +      __gthread_mutex_lock (&version_lock_mutex); > +      __gthread_cond_broadcast (&version_lock_cond); > +      __gthread_mutex_unlock (&version_lock_mutex); > +    } > +#endif > +} > + > +// Acquire an optimistic "lock". Note that this does not lock at all, it > +// only allows for validation later. > +static inline bool > +version_lock_lock_optimistic (const struct version_lock *vl, uintptr_t > *lock) > +{ > +  uintptr_t state = __atomic_load_n (&(vl->version_lock), > __ATOMIC_SEQ_CST); > +  *lock = state; > + > +  // Acquiring the lock fails when there is currently an exclusive lock. > +  return !(state & 1); > +} > + > +// Validate a previously acquired "lock". > +static inline bool > +version_lock_validate (const struct version_lock *vl, uintptr_t lock) > +{ > +  // Prevent the reordering of non-atomic loads behind the atomic load. > +  // Hans Boehm, Can Seqlocks Get Along with Programming Language Memory > +  // Models?, Section 4. > +  __atomic_thread_fence (__ATOMIC_ACQUIRE); > + > +  // Check that the node is still in the same state. > +  uintptr_t state = __atomic_load_n (&(vl->version_lock), > __ATOMIC_SEQ_CST); > +  return (state == lock); > +} > + > +// The largest possible separator value. > +static const uintptr_t max_separator = ~((uintptr_t) (0)); > + > +struct btree_node; > + > +// Inner entry. The child tree contains all entries <= separator. > +struct inner_entry > +{ > +  uintptr_t separator; > +  struct btree_node *child; > +}; > + > +// Leaf entry. Stores an object entry. > +struct leaf_entry > +{ > +  uintptr_t base, size; > +  struct object *ob; > +}; > + > +// Node types. > +enum node_type > +{ > +  btree_node_inner, > +  btree_node_leaf, > +  btree_node_free > +}; > + > +// Node sizes. Chosen such that the result size is roughly 256 bytes. > +#define max_fanout_inner 15 > +#define max_fanout_leaf 10 > + > +// A btree node. > +struct btree_node > +{ > +  // The version lock used for optimistic lock coupling. 
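
Since the validate function above is the heart of the scheme, here is the reader
side of that seqlock-style pattern reduced to a toy, again with C11 atomics and
made-up data rather than anything from the patch: snapshot the version, copy the
fields with relaxed loads, issue the acquire fence, and only trust the copy if
the version is unchanged.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct seq_protected
{
  atomic_uintptr_t version;  /* bit 0 set while a writer holds the lock */
  atomic_int payload[2];     /* read with relaxed loads, then validated */
};

static void read_optimistic (struct seq_protected *p, int out[2])
{
  for (;;)
    {
      uintptr_t v = atomic_load (&p->version);
      if (v & 1)
        continue;                                   /* writer active */
      out[0] = atomic_load_explicit (&p->payload[0], memory_order_relaxed);
      out[1] = atomic_load_explicit (&p->payload[1], memory_order_relaxed);
      atomic_thread_fence (memory_order_acquire);
      if (atomic_load (&p->version) == v)           /* validate */
        return;
    }
}

int main (void)
{
  struct seq_protected p;
  atomic_init (&p.version, 0);
  atomic_init (&p.payload[0], 7);
  atomic_init (&p.payload[1], 42);
  int out[2];
  read_optimistic (&p, out);
  printf ("%d %d\n", out[0], out[1]);
  return 0;
}
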
> +  struct version_lock version_lock; > +  // The number of entries. > +  unsigned entry_count; > +  // The type. > +  enum node_type type; > +  // The payload. > +  union > +  { > +    // The inner nodes have fence keys, i.e., the right-most entry > includes a > +    // separator. > +    struct inner_entry children[max_fanout_inner]; > +    struct leaf_entry entries[max_fanout_leaf]; > +  } content; > +}; > + > +// Is an inner node? > +static inline bool > +btree_node_is_inner (const struct btree_node *n) > +{ > +  return n->type == btree_node_inner; > +} > + > +// Is a leaf node? > +static inline bool > +btree_node_is_leaf (const struct btree_node *n) > +{ > +  return n->type == btree_node_leaf; > +} > + > +// Should the node be merged? > +static inline bool > +btree_node_needs_merge (const struct btree_node *n) > +{ > +  return n->entry_count < (btree_node_is_inner (n) ? (max_fanout_inner > / 2) > +                           : (max_fanout_leaf / 2)); > +} > + > +// Get the fence key for inner nodes. > +static inline uintptr_t > +btree_node_get_fence_key (const struct btree_node *n) > +{ > +  // For inner nodes we just return our right-most entry. > +  return n->content.children[n->entry_count - 1].separator; > +} > + > +// Find the position for a slot in an inner node. > +static unsigned > +btree_node_find_inner_slot (const struct btree_node *n, uintptr_t value) > +{ > +  for (unsigned index = 0, ec = n->entry_count; index != ec; ++index) > +    if (n->content.children[index].separator >= value) > +      return index; > +  return n->entry_count; > +} > + > +// Find the position for a slot in a leaf node. > +static unsigned > +btree_node_find_leaf_slot (const struct btree_node *n, uintptr_t value) > +{ > +  for (unsigned index = 0, ec = n->entry_count; index != ec; ++index) > +    if (n->content.entries[index].base + n->content.entries[index].size > > value) > +      return index; > +  return n->entry_count; > +} > + > +// Try to lock the node exclusive. > +static inline bool > +btree_node_try_lock_exclusive (struct btree_node *n) > +{ > +  return version_lock_try_lock_exclusive (&(n->version_lock)); > +} > + > +// Lock the node exclusive, blocking as needed. > +static inline void > +btree_node_lock_exclusive (struct btree_node *n) > +{ > +  version_lock_lock_exclusive (&(n->version_lock)); > +} > + > +// Release a locked node and increase the version lock. > +static inline void > +btree_node_unlock_exclusive (struct btree_node *n) > +{ > +  version_lock_unlock_exclusive (&(n->version_lock)); > +} > + > +// Acquire an optimistic "lock". Note that this does not lock at all, it > +// only allows for validation later. > +static inline bool > +btree_node_lock_optimistic (const struct btree_node *n, uintptr_t *lock) > +{ > +  return version_lock_lock_optimistic (&(n->version_lock), lock); > +} > + > +// Validate a previously acquire lock. > +static inline bool > +btree_node_validate (const struct btree_node *n, uintptr_t lock) > +{ > +  return version_lock_validate (&(n->version_lock), lock); > +} > + > +// Insert a new separator after splitting. 
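
Going back to the two slot-search helpers above: with a fanout of 10-15 a linear
scan is perfectly reasonable, and the two search predicates are worth seeing on
their own. A detached copy with made-up numbers, not taken from the patch:

#include <stdint.h>
#include <stdio.h>

struct entry { uintptr_t base, size; };

/* First slot whose separator is >= value (children hold keys <= separator).  */
static unsigned find_inner_slot (const uintptr_t *sep, unsigned n, uintptr_t value)
{
  for (unsigned i = 0; i != n; ++i)
    if (sep[i] >= value)
      return i;
  return n;
}

/* First slot whose [base, base + size) does not lie entirely below value.  */
static unsigned find_leaf_slot (const struct entry *e, unsigned n, uintptr_t value)
{
  for (unsigned i = 0; i != n; ++i)
    if (e[i].base + e[i].size > value)
      return i;
  return n;
}

int main (void)
{
  uintptr_t sep[] = { 100, 200, 300 };
  struct entry e[] = { { 10, 20 }, { 50, 25 }, { 90, 10 } };
  printf ("%u %u\n",
          find_inner_slot (sep, 3, 150),   /* 1 */
          find_leaf_slot (e, 3, 60));      /* 1: 50 + 25 > 60 */
  return 0;
}
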
> +static void > +btree_node_update_separator_after_split (struct btree_node *n, > +                     uintptr_t old_separator, > +                     uintptr_t new_separator, > +                     struct btree_node *new_right) > +{ > +  unsigned slot = btree_node_find_inner_slot (n, old_separator); > +  for (unsigned index = n->entry_count; index > slot; --index) > +    n->content.children[index] = n->content.children[index - 1]; > +  n->content.children[slot].separator = new_separator; > +  n->content.children[slot + 1].child = new_right; > +  n->entry_count++; > +} > + > +// A btree. Suitable for static initialization, all members are zero at > the > +// beginning. > +struct btree > +{ > +  // The root of the btree. > +  struct btree_node *root; > +  // The free list of released node. > +  struct btree_node *free_list; > +  // The version lock used to protect the root. > +  struct version_lock root_lock; > +}; > + > +// Initialize a btree. Not actually used, just for exposition. > +static inline void > +btree_init (struct btree *t) > +{ > +  t->root = NULL; > +  t->free_list = NULL; > +  t->root_lock.version_lock = 0; > +}; > + > +static void > +btree_release_tree_recursively (struct btree *t, struct btree_node *n); > + > +// Destroy a tree and release all nodes. > +static void > +btree_destroy (struct btree *t) > +{ > +  // Disable the mechanism before cleaning up. > +  struct btree_node *old_root > +    = __atomic_exchange_n (&(t->root), NULL, __ATOMIC_SEQ_CST); > +  if (old_root) > +    btree_release_tree_recursively (t, old_root); > + > +  // Release all free nodes. > +  while (t->free_list) > +    { > +      struct btree_node *next = t->free_list->content.children[0].child; > +      free (t->free_list); > +      t->free_list = next; > +    } > +} > + > +// Allocate a node. This node will be returned in locked exclusive state. > +static struct btree_node * > +btree_allocate_node (struct btree *t, bool inner) > +{ > +  while (true) > +    { > +      // Try the free list first. > +      struct btree_node *next_free > +    = __atomic_load_n (&(t->free_list), __ATOMIC_SEQ_CST); > +      if (next_free) > +    { > +      if (!btree_node_try_lock_exclusive (next_free)) > +        continue; > +      // The node might no longer be free, check that again after > acquiring > +      // the exclusive lock. > +      if (next_free->type == btree_node_free) > +        { > +          struct btree_node *ex = next_free; > +          if (__atomic_compare_exchange_n ( > +            &(t->free_list), &ex, next_free->content.children[0].child, > +            false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) > +        { > +          next_free->entry_count = 0; > +          next_free->type = inner ? btree_node_inner : btree_node_leaf; > +          return next_free; > +        } > +        } > +      btree_node_unlock_exclusive (next_free); > +      continue; > +    } > + > +      // No free node available, allocate a new one. > +      struct btree_node *new_node > +    = (struct btree_node *) (malloc (sizeof (struct btree_node))); > +      version_lock_initialize_locked_exclusive ( > +    &(new_node->version_lock)); // initialize the node in locked state. > +      new_node->entry_count = 0; > +      new_node->type = inner ? btree_node_inner : btree_node_leaf; > +      return new_node; > +    } > +} > + > +// Release a node. This node must be currently locked exclusively and will > +// be placed in the free list. 
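
On the free list used by btree_allocate_node above and by the release function
that follows: the push/pop is the usual Treiber-stack CAS loop, reusing the first
child pointer as the link. Here is a single-threaded sketch of just that loop
shape (the patch additionally takes the node's exclusive lock before popping,
which I leave out; the names are mine):

#include <stdatomic.h>
#include <stdio.h>

/* Each node reuses its first pointer as the free-list link, the same trick
   the patch plays with content.children[0].child.  */
struct node { struct node *next; int payload; };

static struct node *_Atomic free_list = NULL;

static void push_free (struct node *n)
{
  struct node *head = atomic_load (&free_list);
  do
    n->next = head;
  while (!atomic_compare_exchange_weak (&free_list, &head, n));
}

static struct node *pop_free (void)
{
  struct node *head = atomic_load (&free_list);
  while (head && !atomic_compare_exchange_weak (&free_list, &head, head->next))
    ;   /* a failed CAS refreshes head */
  return head;
}

int main (void)
{
  struct node a = { NULL, 1 }, b = { NULL, 2 };
  push_free (&a);
  push_free (&b);
  struct node *first = pop_free ();    /* LIFO: b comes back first */
  struct node *second = pop_free ();
  printf ("%d %d\n", first->payload, second->payload);   /* 2 1 */
  return 0;
}
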
> +static void > +btree_release_node (struct btree *t, struct btree_node *node) > +{ > +  // We cannot release the memory immediately because there might still be > +  // concurrent readers on that node. Put it in the free list instead. > +  node->type = btree_node_free; > +  struct btree_node *next_free > +    = __atomic_load_n (&(t->free_list), __ATOMIC_SEQ_CST); > +  do > +    { > +      node->content.children[0].child = next_free; > +  } while (!__atomic_compare_exchange_n (&(t->free_list), &next_free, > node, > +                     false, __ATOMIC_SEQ_CST, > +                     __ATOMIC_SEQ_CST)); > +  btree_node_unlock_exclusive (node); > +} > + > +// Recursively release a tree. The btree is by design very shallow, thus > +// we can risk recursion here. > +static void > +btree_release_tree_recursively (struct btree *t, struct btree_node *node) > +{ > +  btree_node_lock_exclusive (node); > +  if (btree_node_is_inner (node)) > +    { > +      for (unsigned index = 0; index < node->entry_count; ++index) > +    btree_release_tree_recursively (t, > node->content.children[index].child); > +    } > +  btree_release_node (t, node); > +} > + > +// Check if we are splitting the root. > +static void > +btree_handle_root_split (struct btree *t, struct btree_node **node, > +             struct btree_node **parent) > +{ > +  // We want to keep the root pointer stable to allow for contention > +  // free reads. Thus, we split the root by first moving the content > +  // of the root node to a new node, and then split that new node. > +  if (!*parent) > +    { > +      // Allocate a new node, this guarantees us that we will have a > parent > +      // afterwards. > +      struct btree_node *new_node > +    = btree_allocate_node (t, btree_node_is_inner (*node)); > +      struct btree_node *old_node = *node; > +      new_node->entry_count = old_node->entry_count; > +      new_node->content = old_node->content; > +      old_node->content.children[0].separator = max_separator; > +      old_node->content.children[0].child = new_node; > +      old_node->entry_count = 1; > +      old_node->type = btree_node_inner; > + > +      *parent = old_node; > +      *node = new_node; > +    } > +} > + > +// Split an inner node. > +static void > +btree_split_inner (struct btree *t, struct btree_node **inner, > +           struct btree_node **parent, uintptr_t target) > +{ > +  // Check for the root. > +  btree_handle_root_split (t, inner, parent); > + > +  // Create two inner node. > +  uintptr_t right_fence = btree_node_get_fence_key (*inner); > +  struct btree_node *left_inner = *inner; > +  struct btree_node *right_inner = btree_allocate_node (t, true); > +  unsigned split = left_inner->entry_count / 2; > +  right_inner->entry_count = left_inner->entry_count - split; > +  for (unsigned index = 0; index < right_inner->entry_count; ++index) > +    right_inner->content.children[index] > +      = left_inner->content.children[split + index]; > +  left_inner->entry_count = split; > +  uintptr_t left_fence = btree_node_get_fence_key (left_inner); > +  btree_node_update_separator_after_split (*parent, right_fence, > left_fence, > +                       right_inner); > +  if (target <= left_fence) > +    { > +      *inner = left_inner; > +      btree_node_unlock_exclusive (right_inner); > +    } > +  else > +    { > +      *inner = right_inner; > +      btree_node_unlock_exclusive (left_inner); > +    } > +} > + > +// Split a leaf node. 
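
Before the leaf-split code below, the split arithmetic itself may be easier to
follow in isolation. A toy with a fanout of 4 and made-up addresses, not code
from the patch; the separator handed to the parent is the last address covered
by the left node, i.e. the first right-hand base minus one:

#include <stdint.h>
#include <stdio.h>

#define FANOUT 4

struct demo_leaf { unsigned count; uintptr_t base[FANOUT]; };

/* Move the upper half of a full leaf into RIGHT and return the separator
   the parent should publish for the left half.  */
static uintptr_t split_leaf (struct demo_leaf *left, struct demo_leaf *right)
{
  unsigned split = left->count / 2;
  right->count = left->count - split;
  for (unsigned i = 0; i != right->count; ++i)
    right->base[i] = left->base[split + i];
  left->count = split;
  return right->base[0] - 1;
}

int main (void)
{
  struct demo_leaf left = { 4, { 0x1000, 0x2000, 0x3000, 0x4000 } }, right;
  uintptr_t separator = split_leaf (&left, &right);
  printf ("separator %#jx, left %u entries, right %u entries\n",
          (uintmax_t) separator, left.count, right.count);
  return 0;
}
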
> +static void > +btree_split_leaf (struct btree *t, struct btree_node **leaf, > +          struct btree_node **parent, uintptr_t fence, uintptr_t target) > +{ > +  // Check for the root. > +  btree_handle_root_split (t, leaf, parent); > + > +  // Create two leaf nodes. > +  uintptr_t right_fence = fence; > +  struct btree_node *left_leaf = *leaf; > +  struct btree_node *right_leaf = btree_allocate_node (t, false); > +  unsigned split = left_leaf->entry_count / 2; > +  right_leaf->entry_count = left_leaf->entry_count - split; > +  for (unsigned index = 0; index != right_leaf->entry_count; ++index) > +    right_leaf->content.entries[index] > +      = left_leaf->content.entries[split + index]; > +  left_leaf->entry_count = split; > +  uintptr_t left_fence = right_leaf->content.entries[0].base - 1; > +  btree_node_update_separator_after_split (*parent, right_fence, > left_fence, > +                       right_leaf); > +  if (target <= left_fence) > +    { > +      *leaf = left_leaf; > +      btree_node_unlock_exclusive (right_leaf); > +    } > +  else > +    { > +      *leaf = right_leaf; > +      btree_node_unlock_exclusive (left_leaf); > +    } > +} > + > +// Merge (or balance) child nodes. > +static struct btree_node * > +btree_merge_node (struct btree *t, unsigned child_slot, > +          struct btree_node *parent, uintptr_t target) > +{ > +  // Choose the emptiest neighbor and lock both. The target child is > already > +  // locked. > +  unsigned left_slot; > +  struct btree_node *left_node, *right_node; > +  if ((child_slot == 0) > +      || (((child_slot + 1) < parent->entry_count) > +      && (parent->content.children[child_slot + 1].child->entry_count > +          < parent->content.children[child_slot - 1].child->entry_count))) > +    { > +      left_slot = child_slot; > +      left_node = parent->content.children[left_slot].child; > +      right_node = parent->content.children[left_slot + 1].child; > +      btree_node_lock_exclusive (right_node); > +    } > +  else > +    { > +      left_slot = child_slot - 1; > +      left_node = parent->content.children[left_slot].child; > +      right_node = parent->content.children[left_slot + 1].child; > +      btree_node_lock_exclusive (left_node); > +    } > + > +  // Can we merge both nodes into one node? > +  unsigned total_count = left_node->entry_count + right_node->entry_count; > +  unsigned max_count > +    = btree_node_is_inner (left_node) ? max_fanout_inner : > max_fanout_leaf; > +  if (total_count <= max_count) > +    { > +      // Merge into the parent? > +      if (parent->entry_count == 2) > +    { > +      // Merge children into parent. This can only happen at the root. 
> +      if (btree_node_is_inner (left_node)) > +        { > +          for (unsigned index = 0; index != left_node->entry_count; > ++index) > +        parent->content.children[index] > +          = left_node->content.children[index]; > +          for (unsigned index = 0; index != right_node->entry_count; > +           ++index) > +        parent->content.children[index + left_node->entry_count] > +          = right_node->content.children[index]; > +        } > +      else > +        { > +          parent->type = btree_node_leaf; > +          for (unsigned index = 0; index != left_node->entry_count; > ++index) > +        parent->content.entries[index] > +          = left_node->content.entries[index]; > +          for (unsigned index = 0; index != right_node->entry_count; > +           ++index) > +        parent->content.entries[index + left_node->entry_count] > +          = right_node->content.entries[index]; > +        } > +      parent->entry_count = total_count; > +      btree_release_node (t, left_node); > +      btree_release_node (t, right_node); > +      return parent; > +    } > +      else > +    { > +      // Regular merge. > +      if (btree_node_is_inner (left_node)) > +        { > +          for (unsigned index = 0; index != right_node->entry_count; > +           ++index) > +        left_node->content.children[left_node->entry_count++] > +          = right_node->content.children[index]; > +        } > +      else > +        { > +          for (unsigned index = 0; index != right_node->entry_count; > +           ++index) > +        left_node->content.entries[left_node->entry_count++] > +          = right_node->content.entries[index]; > +        } > +      parent->content.children[left_slot].separator > +        = parent->content.children[left_slot + 1].separator; > +      for (unsigned index = left_slot + 1; index + 1 < > parent->entry_count; > +           ++index) > +        parent->content.children[index] > +          = parent->content.children[index + 1]; > +      parent->entry_count--; > +      btree_release_node (t, right_node); > +      btree_node_unlock_exclusive (parent); > +      return left_node; > +    } > +    } > + > +  // No merge possible, rebalance instead. > +  if (left_node->entry_count > right_node->entry_count) > +    { > +      // Shift from left to right. 
> +      unsigned to_shift > +    = (left_node->entry_count - right_node->entry_count) / 2; > +      if (btree_node_is_inner (left_node)) > +    { > +      for (unsigned index = 0; index != right_node->entry_count; ++index) > +        { > +          unsigned pos = right_node->entry_count - 1 - index; > +          right_node->content.children[pos + to_shift] > +        = right_node->content.children[pos]; > +        } > +      for (unsigned index = 0; index != to_shift; ++index) > +        right_node->content.children[index] > +          = left_node->content > +          .children[left_node->entry_count - to_shift + index]; > +    } > +      else > +    { > +      for (unsigned index = 0; index != right_node->entry_count; ++index) > +        { > +          unsigned pos = right_node->entry_count - 1 - index; > +          right_node->content.entries[pos + to_shift] > +        = right_node->content.entries[pos]; > +        } > +      for (unsigned index = 0; index != to_shift; ++index) > +        right_node->content.entries[index] > +          = left_node->content > +          .entries[left_node->entry_count - to_shift + index]; > +    } > +      left_node->entry_count -= to_shift; > +      right_node->entry_count += to_shift; > +    } > +  else > +    { > +      // Shift from right to left. > +      unsigned to_shift > +    = (right_node->entry_count - left_node->entry_count) / 2; > +      if (btree_node_is_inner (left_node)) > +    { > +      for (unsigned index = 0; index != to_shift; ++index) > +        left_node->content.children[left_node->entry_count + index] > +          = right_node->content.children[index]; > +      for (unsigned index = 0; index != right_node->entry_count - > to_shift; > +           ++index) > +        right_node->content.children[index] > +          = right_node->content.children[index + to_shift]; > +    } > +      else > +    { > +      for (unsigned index = 0; index != to_shift; ++index) > +        left_node->content.entries[left_node->entry_count + index] > +          = right_node->content.entries[index]; > +      for (unsigned index = 0; index != right_node->entry_count - > to_shift; > +           ++index) > +        right_node->content.entries[index] > +          = right_node->content.entries[index + to_shift]; > +    } > +      left_node->entry_count += to_shift; > +      right_node->entry_count -= to_shift; > +    } > +  uintptr_t left_fence; > +  if (btree_node_is_leaf (left_node)) > +    { > +      left_fence = right_node->content.entries[0].base - 1; > +    } > +  else > +    { > +      left_fence = btree_node_get_fence_key (left_node); > +    } > +  parent->content.children[left_slot].separator = left_fence; > +  btree_node_unlock_exclusive (parent); > +  if (target <= left_fence) > +    { > +      btree_node_unlock_exclusive (right_node); > +      return left_node; > +    } > +  else > +    { > +      btree_node_unlock_exclusive (left_node); > +      return right_node; > +    } > +} > + > +// Insert an entry. > +static bool > +btree_insert (struct btree *t, uintptr_t base, uintptr_t size, > +          struct object *ob) > +{ > +  // Sanity check. > +  if (!size) > +    return false; > + > +  // Access the root. 
> +  struct btree_node *iter, *parent = NULL; > +  { > +    version_lock_lock_exclusive (&(t->root_lock)); > +    iter = t->root; > +    if (iter) > +      { > +    btree_node_lock_exclusive (iter); > +      } > +    else > +      { > +    t->root = iter = btree_allocate_node (t, false); > +      } > +    version_lock_unlock_exclusive (&(t->root_lock)); > +  } > + > +  // Walk down the btree with classic lock coupling and eager splits. > +  // Strictly speaking this is not performance optimal, we could use > +  // optimistic lock coupling until we hit a node that has to be modified. > +  // But that is more difficult to implement and frame registration is > +  // rare anyway, we use simple locking for now. > + > +  uintptr_t fence = max_separator; > +  while (btree_node_is_inner (iter)) > +    { > +      // Use eager splits to avoid lock coupling up. > +      if (iter->entry_count == max_fanout_inner) > +    btree_split_inner (t, &iter, &parent, base); > + > +      unsigned slot = btree_node_find_inner_slot (iter, base); > +      if (parent) > +    btree_node_unlock_exclusive (parent); > +      parent = iter; > +      fence = iter->content.children[slot].separator; > +      iter = iter->content.children[slot].child; > +      btree_node_lock_exclusive (iter); > +    } > + > +  // Make sure we have space. > +  if (iter->entry_count == max_fanout_leaf) > +    btree_split_leaf (t, &iter, &parent, fence, base); > +  if (parent) > +    btree_node_unlock_exclusive (parent); > + > +  // Insert in node. > +  unsigned slot = btree_node_find_leaf_slot (iter, base); > +  if ((slot < iter->entry_count) && (iter->content.entries[slot].base > == base)) > +    { > +      // Duplicate entry, this should never happen. > +      btree_node_unlock_exclusive (iter); > +      return false; > +    } > +  for (unsigned index = iter->entry_count; index > slot; --index) > +    iter->content.entries[index] = iter->content.entries[index - 1]; > +  struct leaf_entry *e = &(iter->content.entries[slot]); > +  e->base = base; > +  e->size = size; > +  e->ob = ob; > +  iter->entry_count++; > +  btree_node_unlock_exclusive (iter); > +  return true; > +} > + > +// Remove an entry. > +static struct object * > +btree_remove (struct btree *t, uintptr_t base) > +{ > +  // Access the root. > +  version_lock_lock_exclusive (&(t->root_lock)); > +  struct btree_node *iter = t->root; > +  if (iter) > +    btree_node_lock_exclusive (iter); > +  version_lock_unlock_exclusive (&(t->root_lock)); > +  if (!iter) > +    return NULL; > + > +  // Same strategy as with insert, walk down with lock coupling and > +  // merge eagerly. > +  while (btree_node_is_inner (iter)) > +    { > +      unsigned slot = btree_node_find_inner_slot (iter, base); > +      struct btree_node *next = iter->content.children[slot].child; > +      btree_node_lock_exclusive (next); > +      if (btree_node_needs_merge (next)) > +    { > +      // Use eager merges to avoid lock coupling up. > +      iter = btree_merge_node (t, slot, iter, base); > +    } > +      else > +    { > +      btree_node_unlock_exclusive (iter); > +      iter = next; > +    } > +    } > + > +  // Remove existing entry. > +  unsigned slot = btree_node_find_leaf_slot (iter, base); > +  if ((slot >= iter->entry_count) || (iter->content.entries[slot].base > != base)) > +    { > +      // Not found, this should never happen. 
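
Both the insert walk and the remove walk above use classic lock coupling on the
write path. For readers who have not seen the pattern, here is the hand-over-hand
idea on a plain sorted list with pthread mutexes; it is illustrative only, the
names are mine, and it needs -pthread to build:

#include <pthread.h>
#include <stdio.h>

/* Hand-over-hand: keep the parent's lock until the child's lock is held,
   so no writer can slip in between the two nodes.  */
struct lnode { pthread_mutex_t lock; int key; struct lnode *next; };

static struct lnode *find_coupled (struct lnode *head, int key)
{
  pthread_mutex_lock (&head->lock);
  struct lnode *cur = head;
  while (cur->next && cur->key < key)
    {
      struct lnode *next = cur->next;
      pthread_mutex_lock (&next->lock);    /* lock the child first ...  */
      pthread_mutex_unlock (&cur->lock);   /* ... then release the parent */
      cur = next;
    }
  int found = (cur->key == key);
  pthread_mutex_unlock (&cur->lock);
  return found ? cur : NULL;
}

int main (void)
{
  static struct lnode c = { PTHREAD_MUTEX_INITIALIZER, 30, NULL };
  static struct lnode b = { PTHREAD_MUTEX_INITIALIZER, 20, &c };
  static struct lnode a = { PTHREAD_MUTEX_INITIALIZER, 10, &b };
  printf ("%s\n", find_coupled (&a, 20) ? "found" : "missing");
  return 0;
}
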
> +      btree_node_unlock_exclusive (iter); > +      return NULL; > +    } > +  struct object *ob = iter->content.entries[slot].ob; > +  for (unsigned index = slot; index + 1 < iter->entry_count; ++index) > +    iter->content.entries[index] = iter->content.entries[index + 1]; > +  iter->entry_count--; > +  btree_node_unlock_exclusive (iter); > +  return ob; > +} > + > +// Find the corresponding entry for the given address. > +static struct object * > +btree_lookup (const struct btree *t, uintptr_t target_addr) > +{ > +  // Within this function many loads are relaxed atomic loads. > +  // Use a macro to keep the code reasonable. > +#define RLOAD(x) __atomic_load_n (&(x), __ATOMIC_RELAXED) > + > +  // For targets where unwind info is usually not registered through these > +  // APIs anymore, avoid any sequential consistent atomics. > +  // Use relaxed MO here, it is up to the app to ensure that the library > +  // loading/initialization happens-before using that library in other > +  // threads (in particular unwinding with that library's functions > +  // appearing in the backtraces).  Calling that library's functions > +  // without waiting for the library to initialize would be racy. > +  if (__builtin_expect (!RLOAD (t->root), 1)) > +    return NULL; > + > +  // The unwinding tables are mostly static, they only change when > +  // frames are added or removed. This makes it extremely unlikely that > they > +  // change during a given unwinding sequence. Thus, we optimize for the > +  // contention free case and use optimistic lock coupling. This does not > +  // require any writes to shared state, instead we validate every > read. It is > +  // important that we do not trust any value that we have read until > we call > +  // validate again. Data can change at arbitrary points in time, thus > we always > +  // copy something into a local variable and validate again before > acting on > +  // the read. In the unlikely event that we encounter a concurrent > change we > +  // simply restart and try again. > + > +restart: > +  struct btree_node *iter; > +  uintptr_t lock; > +  { > +    // Accessing the root node requires defending against concurrent > pointer > +    // changes Thus we couple rootLock -> lock on root node -> validate > rootLock > +    if (!version_lock_lock_optimistic (&(t->root_lock), &lock)) > +      goto restart; > +    iter = RLOAD (t->root); > +    if (!version_lock_validate (&(t->root_lock), lock)) > +      goto restart; > +    if (!iter) > +      return NULL; > +    uintptr_t child_lock; > +    if ((!btree_node_lock_optimistic (iter, &child_lock)) > +    || (!version_lock_validate (&(t->root_lock), lock))) > +      goto restart; > +    lock = child_lock; > +  } > + > +  // Now we can walk down towards the right leaf node. > +  while (true) > +    { > +      enum node_type type = RLOAD (iter->type); > +      unsigned entry_count = RLOAD (iter->entry_count); > +      if (!btree_node_validate (iter, lock)) > +    goto restart; > +      if (!entry_count) > +    return NULL; > + > +      if (type == btree_node_inner) > +    { > +      // We cannot call find_inner_slot here because we need (relaxed) > +      // atomic reads here. 
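
The root access block above is the trickiest part of the lookup: a pointer read
from a node may only be trusted after that node validates, and the child's
version has to be captured before the parent is re-validated. Reduced to two
levels with C11 atomics (a sketch under those assumptions, not the patch's code):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct opt_node { atomic_uintptr_t version; atomic_int value; };
struct opt_root { atomic_uintptr_t version; struct opt_node *_Atomic child; };

static int lock_optimistic (atomic_uintptr_t *v, uintptr_t *seen)
{
  *seen = atomic_load (v);
  return !(*seen & 1);             /* fails while exclusively locked */
}

static int validate (atomic_uintptr_t *v, uintptr_t seen)
{
  atomic_thread_fence (memory_order_acquire);
  return atomic_load (v) == seen;
}

static int lookup (struct opt_root *root)
{
restart:;
  uintptr_t rv, cv;
  if (!lock_optimistic (&root->version, &rv))
    goto restart;
  struct opt_node *child = atomic_load (&root->child);
  if (!validate (&root->version, rv))
    goto restart;
  /* Capture the child's version, then re-check the parent so we know the
     pointer we followed was still current.  */
  if (!lock_optimistic (&child->version, &cv)
      || !validate (&root->version, rv))
    goto restart;
  int value = atomic_load_explicit (&child->value, memory_order_relaxed);
  if (!validate (&child->version, cv))
    goto restart;
  return value;
}

int main (void)
{
  struct opt_node leaf;
  struct opt_root root;
  atomic_init (&leaf.version, 0);
  atomic_init (&leaf.value, 123);
  atomic_init (&root.version, 0);
  atomic_init (&root.child, &leaf);
  printf ("%d\n", lookup (&root));
  return 0;
}
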
> +      unsigned slot = 0; > +      while ( > +        ((slot + 1) < entry_count) > +        && (RLOAD (iter->content.children[slot].separator) < target_addr)) > +        ++slot; > +      struct btree_node *child = RLOAD > (iter->content.children[slot].child); > +      if (!btree_node_validate (iter, lock)) > +        goto restart; > + > +      // The node content can change at any point in time, thus we must > +      // interleave parent and child checks. > +      uintptr_t child_lock; > +      if (!btree_node_lock_optimistic (child, &child_lock)) > +        goto restart; > +      if (!btree_node_validate (iter, lock)) > +        goto restart; // make sure we still point to the correct node > after > +              // acquiring the optimistic lock. > + > +      // Go down > +      iter = child; > +      lock = child_lock; > +    } > +      else > +    { > +      // We cannot call find_leaf_slot here because we need (relaxed) > +      // atomic reads here. > +      unsigned slot = 0; > +      while (((slot + 1) < entry_count) > +         && (RLOAD (iter->content.entries[slot].base) > +               + RLOAD (iter->content.entries[slot].size) > +             <= target_addr)) > +        ++slot; > +      struct leaf_entry entry; > +      entry.base = RLOAD (iter->content.entries[slot].base); > +      entry.size = RLOAD (iter->content.entries[slot].size); > +      entry.ob = RLOAD (iter->content.entries[slot].ob); > +      if (!btree_node_validate (iter, lock)) > +        goto restart; > + > +      // Check if we have a hit. > +      if ((entry.base <= target_addr) > +          && (target_addr < entry.base + entry.size)) > +        { > +          return entry.ob; > +        } > +      return NULL; > +    } > +    } > +#undef RLOAD > +} > + > +#endif /* unwind-dw2-btree.h */ > diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c > index 8ee55be5675..1165be0c6df 100644 > --- a/libgcc/unwind-dw2-fde.c > +++ b/libgcc/unwind-dw2-fde.c > @@ -42,15 +42,34 @@ see the files COPYING3 and COPYING.RUNTIME > respectively.  If not, see >  #endif >  #endif > > +#ifdef ATOMIC_FDE_FAST_PATH > +#include "unwind-dw2-btree.h" > + > +static struct btree registered_frames; > + > +static void > +release_registered_frames (void) __attribute__ ((destructor (110))); > +static void > +release_registered_frames (void) > +{ > +  /* Release the b-tree and all frames. Frame releases that happen > later are > +   * silently ignored */ > +  btree_destroy (®istered_frames); > +} > + > +static void > +get_pc_range (const struct object *ob, uintptr_t *range); > +static void > +init_object (struct object *ob); > + > +#else > + >  /* The unseen_objects list contains objects that have been registered >     but not yet categorized in any way.  The seen_objects list has had >     its pc_begin and count fields initialized at minimum, and is sorted >     by decreasing value of pc_begin.  */ >  static struct object *unseen_objects; >  static struct object *seen_objects; > -#ifdef ATOMIC_FDE_FAST_PATH > -static int any_objects_registered; > -#endif > >  #ifdef __GTHREAD_MUTEX_INIT >  static __gthread_mutex_t object_mutex = __GTHREAD_MUTEX_INIT; > @@ -78,6 +97,7 @@ init_object_mutex_once (void) >  static __gthread_mutex_t object_mutex; >  #endif >  #endif > +#endif > >  /* Called from crtbegin.o to register the unwind info for an object.  
*/ > > @@ -99,23 +119,23 @@ __register_frame_info_bases (const void *begin, > struct object *ob, >    ob->fde_end = NULL; >  #endif > > +#ifdef ATOMIC_FDE_FAST_PATH > +  // Initialize  eagerly to avoid locking later > +  init_object (ob); > + > +  // And register the frame > +  uintptr_t range[2]; > +  get_pc_range (ob, range); > +  btree_insert (®istered_frames, range[0], range[1] - range[0], ob); > +#else >    init_object_mutex_once (); >    __gthread_mutex_lock (&object_mutex); > >    ob->next = unseen_objects; >    unseen_objects = ob; > -#ifdef ATOMIC_FDE_FAST_PATH > -  /* Set flag that at least one library has registered FDEs. > -     Use relaxed MO here, it is up to the app to ensure that the library > -     loading/initialization happens-before using that library in other > -     threads (in particular unwinding with that library's functions > -     appearing in the backtraces).  Calling that library's functions > -     without waiting for the library to initialize would be racy.  */ > -  if (!any_objects_registered) > -    __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELAXED); > -#endif > >    __gthread_mutex_unlock (&object_mutex); > +#endif >  } > >  void > @@ -153,23 +173,23 @@ __register_frame_info_table_bases (void *begin, > struct object *ob, >    ob->s.b.from_array = 1; >    ob->s.b.encoding = DW_EH_PE_omit; > > +#ifdef ATOMIC_FDE_FAST_PATH > +  // Initialize  eagerly to avoid locking later > +  init_object (ob); > + > +  // And register the frame > +  uintptr_t range[2]; > +  get_pc_range (ob, range); > +  btree_insert (®istered_frames, range[0], range[1] - range[0], ob); > +#else >    init_object_mutex_once (); >    __gthread_mutex_lock (&object_mutex); > >    ob->next = unseen_objects; >    unseen_objects = ob; > -#ifdef ATOMIC_FDE_FAST_PATH > -  /* Set flag that at least one library has registered FDEs. > -     Use relaxed MO here, it is up to the app to ensure that the library > -     loading/initialization happens-before using that library in other > -     threads (in particular unwinding with that library's functions > -     appearing in the backtraces).  Calling that library's functions > -     without waiting for the library to initialize would be racy.  */ > -  if (!any_objects_registered) > -    __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELAXED); > -#endif > >    __gthread_mutex_unlock (&object_mutex); > +#endif >  } > >  void > @@ -200,16 +220,33 @@ __register_frame_table (void *begin) >  void * >  __deregister_frame_info_bases (const void *begin) >  { > -  struct object **p; >    struct object *ob = 0; > >    /* If .eh_frame is empty, we haven't registered.  
*/ >    if ((const uword *) begin == 0 || *(const uword *) begin == 0) >      return ob; > > +#ifdef ATOMIC_FDE_FAST_PATH > +  // Find the corresponding PC range > +  struct object lookupob; > +  lookupob.tbase = 0; > +  lookupob.dbase = 0; > +  lookupob.u.single = begin; > +  lookupob.s.i = 0; > +  lookupob.s.b.encoding = DW_EH_PE_omit; > +#ifdef DWARF2_OBJECT_END_PTR_EXTENSION > +  lookupob.fde_end = NULL; > +#endif > +  uintptr_t range[2]; > +  get_pc_range (&lookupob, range); > + > +  // And remove > +  ob = btree_remove (®istered_frames, range[0]); > +#else >    init_object_mutex_once (); >    __gthread_mutex_lock (&object_mutex); > > +  struct object **p; >    for (p = &unseen_objects; *p ; p = &(*p)->next) >      if ((*p)->u.single == begin) >        { > @@ -241,6 +278,8 @@ __deregister_frame_info_bases (const void *begin) > >   out: >    __gthread_mutex_unlock (&object_mutex); > +#endif > + >    gcc_assert (ob); >    return (void *) ob; >  } > @@ -264,7 +303,7 @@ __deregister_frame (void *begin) >     instead of an _Unwind_Context.  */ > >  static _Unwind_Ptr > -base_from_object (unsigned char encoding, struct object *ob) > +base_from_object (unsigned char encoding, const struct object *ob) >  { >    if (encoding == DW_EH_PE_omit) >      return 0; > @@ -628,13 +667,17 @@ end_fde_sort (struct object *ob, struct > fde_accumulator *accu, size_t count) >      } >  } > > - > -/* Update encoding, mixed_encoding, and pc_begin for OB for the > -   fde array beginning at THIS_FDE.  Return the number of fdes > -   encountered along the way.  */ > +/* Inspect the fde array beginning at this_fde. This > +   function can be used either in query mode (RANGE is > +   not null, OB is const), or in update mode (RANGE is > +   null, OB is modified). In query mode the function computes > +   the range of PC values and stores it in rANGE. In s/rANGE/RANGE/ > +   update mode it updates encoding, mixed_encoding, and pc_begin > +   for OB. Return the number of fdes encountered along the way. */ > >  static size_t > -classify_object_over_fdes (struct object *ob, const fde *this_fde) > +classify_object_over_fdes (struct object *ob, const fde *this_fde, > +               uintptr_t *range) >  { >    const struct dwarf_cie *last_cie = 0; >    size_t count = 0; > @@ -644,7 +687,7 @@ classify_object_over_fdes (struct object *ob, const > fde *this_fde) >    for (; ! last_fde (ob, this_fde); this_fde = next_fde (this_fde)) >      { >        const struct dwarf_cie *this_cie; > -      _Unwind_Ptr mask, pc_begin; > +      _Unwind_Ptr mask, pc_begin, pc_range; > >        /* Skip CIEs.  
*/ >        if (this_fde->CIE_delta == 0) > @@ -660,14 +703,19 @@ classify_object_over_fdes (struct object *ob, > const fde *this_fde) >        if (encoding == DW_EH_PE_omit) >          return -1; >        base = base_from_object (encoding, ob); > -      if (ob->s.b.encoding == DW_EH_PE_omit) > -        ob->s.b.encoding = encoding; > -      else if (ob->s.b.encoding != encoding) > -        ob->s.b.mixed_encoding = 1; > +      if (!range) > +        { > +          if (ob->s.b.encoding == DW_EH_PE_omit) > +        ob->s.b.encoding = encoding; > +          else if (ob->s.b.encoding != encoding) > +        ob->s.b.mixed_encoding = 1; > +        } >      } > > -      read_encoded_value_with_base (encoding, base, this_fde->pc_begin, > -                    &pc_begin); > +      const unsigned char *p; > +      p = read_encoded_value_with_base (encoding, base, > this_fde->pc_begin, > +                    &pc_begin); > +      read_encoded_value_with_base (encoding & 0x0F, 0, p, &pc_range); It looks like pc_range is only used in query mode, so how about moving this read and the declaration of pc_range... >        /* Take care to ignore link-once functions that were removed. >       In these cases, the function address will be NULL, but if > @@ -683,8 +731,27 @@ classify_object_over_fdes (struct object *ob, const > fde *this_fde) >      continue; > >        count += 1; > -      if ((void *) pc_begin < ob->pc_begin) > -    ob->pc_begin = (void *) pc_begin; > +      if (range) > +    { ...in here? OK either way. Thanks again, this is impressive work! > +      _Unwind_Ptr pc_end = pc_begin + pc_range; > +      if ((!range[0]) && (!range[1])) > +        { > +          range[0] = pc_begin; > +          range[1] = pc_end; > +        } > +      else > +        { > +          if (pc_begin < range[0]) > +        range[0] = pc_begin; > +          if (pc_end > range[1]) > +        range[1] = pc_end; > +        } > +    } > +      else > +    { > +      if ((void *) pc_begin < ob->pc_begin) > +        ob->pc_begin = (void *) pc_begin; > +    } >      } > >    return count; > @@ -769,7 +836,7 @@ init_object (struct object* ob) >        fde **p = ob->u.array; >        for (count = 0; *p; ++p) >          { > -          size_t cur_count = classify_object_over_fdes (ob, *p); > +          size_t cur_count = classify_object_over_fdes (ob, *p, NULL); >            if (cur_count == (size_t) -1) >          goto unhandled_fdes; >            count += cur_count; > @@ -777,7 +844,7 @@ init_object (struct object* ob) >      } >        else >      { > -      count = classify_object_over_fdes (ob, ob->u.single); > +      count = classify_object_over_fdes (ob, ob->u.single, NULL); >        if (count == (size_t) -1) >          { >            static const fde terminator; > @@ -821,6 +888,32 @@ init_object (struct object* ob) >    ob->s.b.sorted = 1; >  } > > +#ifdef ATOMIC_FDE_FAST_PATH > +/* Get the PC range for lookup */ > +static void > +get_pc_range (const struct object *ob, uintptr_t *range) > +{ > +  // It is safe to cast to non-const object* here as > +  // classify_object_over_fdes does not modify ob in query mode. 
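
The query-mode accumulation above boils down to a small union-of-ranges rule
where {0, 0} marks "nothing seen yet". Standalone, with made-up PC values:

#include <stdint.h>
#include <stdio.h>

/* Grow RANGE to the union of every [begin, end) seen so far.  */
static void accumulate_range (uintptr_t range[2], uintptr_t begin, uintptr_t end)
{
  if (!range[0] && !range[1])
    {
      range[0] = begin;
      range[1] = end;
    }
  else
    {
      if (begin < range[0])
        range[0] = begin;
      if (end > range[1])
        range[1] = end;
    }
}

int main (void)
{
  uintptr_t range[2] = { 0, 0 };
  accumulate_range (range, 0x4000, 0x4800);
  accumulate_range (range, 0x3000, 0x3200);
  printf ("[%#jx, %#jx)\n", (uintmax_t) range[0], (uintmax_t) range[1]);
  return 0;
}
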
> +  struct object *ncob = (struct object *) (uintptr_t) ob; > +  range[0] = range[1] = 0; > +  if (ob->s.b.sorted) > +    { > +      classify_object_over_fdes (ncob, ob->u.sort->orig_data, range); > +    } > +  else if (ob->s.b.from_array) > +    { > +      fde **p = ob->u.array; > +      for (; *p; ++p) > +    classify_object_over_fdes (ncob, *p, range); > +    } > +  else > +    { > +      classify_object_over_fdes (ncob, ob->u.single, range); > +    } > +} > +#endif > + >  /* A linear search through a set of FDEs for the given PC.  This is >     used when there was insufficient memory to allocate and sort an >     array.  */ > @@ -985,6 +1078,9 @@ binary_search_mixed_encoding_fdes (struct object > *ob, void *pc) >  static const fde * >  search_object (struct object* ob, void *pc) >  { > +  /* The fast path initializes objects eagerly to avoid locking. > +   * On the slow path we initialize them now */ > +#ifndef ATOMIC_FDE_FAST_PATH >    /* If the data hasn't been sorted, try to do this now.  We may have >       more memory available than last time we tried.  */ >    if (! ob->s.b.sorted) > @@ -997,6 +1093,7 @@ search_object (struct object* ob, void *pc) >        if (pc < ob->pc_begin) >      return NULL; >      } > +#endif > >    if (ob->s.b.sorted) >      { > @@ -1033,17 +1130,12 @@ _Unwind_Find_FDE (void *pc, struct > dwarf_eh_bases *bases) >    const fde *f = NULL; > >  #ifdef ATOMIC_FDE_FAST_PATH > -  /* For targets where unwind info is usually not registered through these > -     APIs anymore, avoid taking a global lock. > -     Use relaxed MO here, it is up to the app to ensure that the library > -     loading/initialization happens-before using that library in other > -     threads (in particular unwinding with that library's functions > -     appearing in the backtraces).  Calling that library's functions > -     without waiting for the library to initialize would be racy.  */ > -  if (__builtin_expect (!__atomic_load_n (&any_objects_registered, > -                      __ATOMIC_RELAXED), 1)) > +  ob = btree_lookup (®istered_frames, (uintptr_t) pc); > +  if (!ob) >      return NULL; > -#endif > + > +  f = search_object (ob, pc); > +#else > >    init_object_mutex_once (); >    __gthread_mutex_lock (&object_mutex); > @@ -1081,6 +1173,7 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases > *bases) > >   fini: >    __gthread_mutex_unlock (&object_mutex); > +#endif > >    if (f) >      { > diff --git a/libgcc/unwind-dw2-fde.h b/libgcc/unwind-dw2-fde.h > index 8a011c358b4..77c2caa4f5a 100644 > --- a/libgcc/unwind-dw2-fde.h > +++ b/libgcc/unwind-dw2-fde.h > @@ -166,7 +166,7 @@ next_fde (const fde *f) >  extern const fde * _Unwind_Find_FDE (void *, struct dwarf_eh_bases *); > >  static inline int > -last_fde (struct object *obj __attribute__ ((__unused__)), const fde *f) > +last_fde (const struct object *obj __attribute__ ((__unused__)), const > fde *f) >  { >  #ifdef DWARF2_OBJECT_END_PTR_EXTENSION >    return f == (const fde *) obj->fde_end || f->length == 0;
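
One last note for readers wondering how this path gets exercised: every PC looked
up while unwinding goes through _Unwind_Find_FDE, so even a plain backtrace hits
the lookup above (alongside the dl-based lookup, depending on the target). A
throwaway test I would expect to build as-is on a typical Linux toolchain:

#include <stdio.h>
#include <unwind.h>

/* Each frame visited here triggers an FDE lookup for its PC.  */
static _Unwind_Reason_Code
trace_cb (struct _Unwind_Context *ctx, void *arg)
{
  int *frame = arg;
  printf ("frame %d: pc=%p\n", (*frame)++, (void *) _Unwind_GetIP (ctx));
  return _URC_NO_REASON;
}

int main (void)
{
  int frame = 0;
  _Unwind_Backtrace (trace_cb, &frame);
  return 0;
}
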