public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Add splay-tree "view" for bitmap
@ 2018-10-18 13:58 Richard Biener
  2018-10-18 15:43 ` Richard Sandiford
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-18 13:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: stevenb.gcc

[-- Attachment #1: Type: text/plain, Size: 65408 bytes --]


PR63155 made me pick up this old work from Steven, it turns our
linked-list implementation to a two-mode one with one being a
splay tree featuring O(log N) complexity for find/remove.

Over Stevens original patch I added a bitmap_tree_to_vec helper
that I use from the debug/print methods to avoid changing view
there.  In theory the bitmap iterator could get a "stack"
as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.

This can be used to fix the two biggest bottlenecks in the PRs
testcase, namely SSA propagator worklist handling and out-of-SSA
coalesce list building.  perf shows the following data, first
unpatched, second patched - also watch the thrid coulumn (samples)
when comparing percentages.

-O0
-   18.19%    17.35%           407  cc1      cc1               [.] bitmap_set_bб▒
   - bitmap_set_bit                                                            б▒
      + 8.77% create_coalesce_list_for_region                                  б▒
      + 4.21% calculate_live_ranges                                            б▒
      + 2.02% build_ssa_conflict_graph                                         б▒
      + 1.66% insert_phi_nodes_for                                             б▒
      + 0.86% coalesce_ssa_name                      
patched:
-   12.39%    10.48%           129  cc1      cc1               [.] bitmap_set_bб▒
   - bitmap_set_bit                                                            б▒
      + 5.27% calculate_live_ranges                                            б▒
      + 2.76% insert_phi_nodes_for                                             б▒
      + 1.90% create_coalesce_list_for_region                                  б▒
      + 1.63% build_ssa_conflict_graph                                         б▒
      + 0.35% coalesce_ssa_name                           

-O1
-   17.53%    17.53%           842  cc1      cc1               [.] bitmap_set_bб▒
   - bitmap_set_bit                                                            б▒
      + 12.39% add_ssa_edge                                                    б▒
      + 1.48% create_coalesce_list_for_region                                  б▒
      + 0.82% solve_constraints                                                б▒
      + 0.71% calculate_live_ranges                                            б▒
      + 0.64% add_implicit_graph_edge                                          б▒
      + 0.41% insert_phi_nodes_for                                             б▒
      + 0.34% build_ssa_conflict_graph                      
patched:
-    5.79%     5.00%           167  cc1      cc1               [.] bitmap_set_bб▒
   - bitmap_set_bit                                                            б▒
      + 1.41% add_ssa_edge                                                     б▒
      + 0.88% calculate_live_ranges                                            б▒
      + 0.75% add_implicit_graph_edge                                          б▒
      + 0.68% solve_constraints                                                б▒
      + 0.48% insert_phi_nodes_for                                             б▒
      + 0.45% build_ssa_conflict_graph                   

-O3
-   12.37%    12.34%          1145  cc1      cc1               [.] bitmap_set_bб▒
   - bitmap_set_bit                                                            б▒
      + 9.14% add_ssa_edge                                                     б▒
      + 0.80% create_coalesce_list_for_region                                  б▒
      + 0.69% add_implicit_graph_edge                                          б▒
      + 0.54% solve_constraints                                                б▒
      + 0.34% calculate_live_ranges                                            б▒
      + 0.27% insert_phi_nodes_for                                             б▒
      + 0.21% build_ssa_conflict_graph                 
-    4.36%     3.86%           227  cc1      cc1               [.] bitmap_set_bб▒
   - bitmap_set_bit                                                            б▒
      + 0.98% add_ssa_edge                                                     б▒
      + 0.86% add_implicit_graph_edge                                          б▒
      + 0.64% solve_constraints                                                б▒
      + 0.57% calculate_live_ranges                                            б▒
      + 0.32% build_ssa_conflict_graph                                         б▒
      + 0.29% mark_all_vars_used_1                                             б▒
      + 0.20% insert_phi_nodes_for                                             б▒
      + 0.16% create_coalesce_list_for_region             


Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Any objections?

Thanks,
Richard.

2018-10-18  Steven Bosscher <steven@gcc.gnu.org>
	Richard Biener  <rguenther@suse.de>

	* bitmap.h: Update data structure documentation, including a
	description of bitmap views as either linked-lists or splay trees.
	(struct bitmap_element_def): Update comments for splay tree bitmaps.
	(struct bitmap_head_def): Likewise.
	(bitmap_list_view, bitmap_tree_view): New prototypes.
	(debug_bitmap, debug_bitmap_file, bitmap_print): Update prototypes.
	(dump_bitmap): Update to take non-const bitmap.
	(bitmap_initialize_stat): Initialize a bitmap_head's indx and
	tree_form fields.
	(bmp_iter_set_init): Assert the iterated bitmaps are in list form.
	(bmp_iter_and_init, bmp_iter_and_compl_init): Likewise.
	* bitmap.c (next_bitmap_desc_id): Make unsigned.
	(get_bitmap_descriptor): Make sure there are no more than 2^31
	bitmap descriptors.
	(bitmap_elem_to_freelist): Unregister overhead of a released bitmap
	element here.
	(bitmap_element_free): Remove.
	(bitmap_elt_clear_from): Work on splay tree bitmaps.
	(bitmap_list_link_element): Renamed from bitmap_element_link.  Move
	this function similar ones such that linked-list bitmap implementation
	functions are grouped.
	(bitmap_list_unlink_element): Renamed from bitmap_element_unlink,
	and moved for grouping.
	(bitmap_list_insert_element_after): Renamed from
	bitmap_elt_insert_after, and moved for grouping.
	(bitmap_list_find_element): New function spliced from bitmap_find_bit.
	(bitmap_tree_link_left, bitmap_tree_link_right,
	bitmap_tree_rotate_left, bitmap_tree_rotate_right, bitmap_tree_splay,
	bitmap_tree_link_element, bitmap_tree_unlink_element,
	bitmap_tree_find_element): New functions for splay-tree bitmap
	implementation.
	(bitmap_element_link, bitmap_element_unlink, bitmap_elt_insert_after):
	Renamed and moved, see above entries.
	(bitmap_tree_listify_from): New function to convert part of a splay
	tree bitmap to a linked-list bitmap.
	(bitmap_list_view): Convert a splay tree bitmap to linked-list form.
	(bitmap_tree_view): Convert a linked-list bitmap to splay tree form.
	(bitmap_find_bit, bitmap_clear, bitmap_clear_bit, bitmap_set_bit,
	bitmap_single_bit_set_p, bitmap_first_set_bit, bitmap_last_set_bit):
	Handle splay tree bitmaps.
	(bitmap_copy, bitmap_count_bits, bitmap_and, bitmap_and_into,
	bitmap_elt_copy, bitmap_and_compl, bitmap_and_compl_into,
	bitmap_compl_and_into, bitmap_elt_ior, bitmap_ior, bitmap_ior_into,
	bitmap_xor, bitmap_xor_into, bitmap_equal_p, bitmap_intersect_p,
	bitmap_intersect_compl_p, bitmap_ior_and_compl,
	bitmap_ior_and_compl_into, bitmap_set_range, bitmap_clear_range,
	bitmap_hash): Reject trying to act on splay tree bitmaps.  Make
	corresponding changes to use linked-list specific bitmap_element
	manipulation functions as applicable for efficiency.
	(debug_bitmap_file): Handle splay tree bitmaps by converting the
	bitmap to linked-list form and back.
	(bitmap_print): Likewise.
	(debug_bitmap): Take a non-const bitmap.

	PR tree-optimization/63155
	* tree-ssa-propagate.c (ssa_prop_init): Use tree-view for the
	SSA edge worklists.
	* tree-ssa-coalesce.c (coalesce_ssa_name): Populate used_in_copies
	in tree-view.

Index: gcc/bitmap.c
===================================================================
--- gcc/bitmap.c	(revision 265259)
+++ gcc/bitmap.c	(working copy)
@@ -26,6 +26,8 @@ along with GCC; see the file COPYING3.
 /* Memory allocation statistics purpose instance.  */
 mem_alloc_description<bitmap_usage> bitmap_mem_desc;
 
+static bitmap_element *bitmap_tree_listify_from (bitmap, bitmap_element *);
+
 /* Register new bitmap.  */
 void
 bitmap_register (bitmap b MEM_STAT_DECL)
@@ -49,22 +51,18 @@ static int bitmap_default_obstack_depth;
 static GTY((deletable)) bitmap_element *bitmap_ggc_free; /* Freelist of
 							    GC'd elements.  */
 
-static void bitmap_elem_to_freelist (bitmap, bitmap_element *);
-static void bitmap_element_free (bitmap, bitmap_element *);
-static bitmap_element *bitmap_element_allocate (bitmap);
-static int bitmap_element_zerop (const bitmap_element *);
-static void bitmap_element_link (bitmap, bitmap_element *);
-static bitmap_element *bitmap_elt_insert_after (bitmap, bitmap_element *, unsigned int);
-static void bitmap_elt_clear_from (bitmap, bitmap_element *);
-static bitmap_element *bitmap_find_bit (bitmap, unsigned int);
 \f
+/* Bitmap memory management.  */
 
-/* Add ELEM to the appropriate freelist.  */
+/* Add ELT to the appropriate freelist.  */
 static inline void
 bitmap_elem_to_freelist (bitmap head, bitmap_element *elt)
 {
   bitmap_obstack *bit_obstack = head->obstack;
 
+  if (GATHER_STATISTICS)
+    register_overhead (head, -((int)sizeof (bitmap_element)));
+
   elt->next = NULL;
   elt->indx = -1;
   if (bit_obstack)
@@ -79,41 +77,6 @@ bitmap_elem_to_freelist (bitmap head, bi
     }
 }
 
-/* Free a bitmap element.  Since these are allocated off the
-   bitmap_obstack, "free" actually means "put onto the freelist".  */
-
-static inline void
-bitmap_element_free (bitmap head, bitmap_element *elt)
-{
-  bitmap_element *next = elt->next;
-  bitmap_element *prev = elt->prev;
-
-  if (prev)
-    prev->next = next;
-
-  if (next)
-    next->prev = prev;
-
-  if (head->first == elt)
-    head->first = next;
-
-  /* Since the first thing we try is to insert before current,
-     make current the next entry in preference to the previous.  */
-  if (head->current == elt)
-    {
-      head->current = next != 0 ? next : prev;
-      if (head->current)
-	head->indx = head->current->indx;
-      else
-	head->indx = 0;
-    }
-
-  if (GATHER_STATISTICS)
-    register_overhead (head, -((int)sizeof (bitmap_element)));
-
-  bitmap_elem_to_freelist (head, elt);
-}
-\f
 /* Allocate a bitmap element.  The bits are cleared, but nothing else is.  */
 
 static inline bitmap_element *
@@ -158,69 +121,545 @@ bitmap_element_allocate (bitmap head)
 	element = ggc_alloc<bitmap_element> ();
     }
 
-  if (GATHER_STATISTICS)
-    register_overhead (head, sizeof (bitmap_element));
+  if (GATHER_STATISTICS)
+    register_overhead (head, sizeof (bitmap_element));
+
+  memset (element->bits, 0, sizeof (element->bits));
+
+  return element;
+}
+
+/* Remove ELT and all following elements from bitmap HEAD.
+   Put the released elements in the freelist for HEAD.  */
+
+void
+bitmap_elt_clear_from (bitmap head, bitmap_element *elt)
+{
+  bitmap_element *prev;
+  bitmap_obstack *bit_obstack = head->obstack;
+
+  if (!elt)
+    return;
+
+  if (head->tree_form)
+    elt = bitmap_tree_listify_from (head, elt);
+
+  if (GATHER_STATISTICS)
+    {
+      int n = 0;
+      for (prev = elt; prev; prev = prev->next)
+	n++;
+      register_overhead (head, -sizeof (bitmap_element) * n);
+    }
+
+  prev = elt->prev;
+  if (prev)
+    {
+      prev->next = NULL;
+      if (head->current->indx > prev->indx)
+	{
+	  head->current = prev;
+	  head->indx = prev->indx;
+	}
+    }
+  else
+    {
+      head->first = NULL;
+      head->current = NULL;
+      head->indx = 0;
+    }
+
+  /* Put the entire list onto the freelist in one operation. */
+  if (bit_obstack)
+    {
+      elt->prev = bit_obstack->elements;
+      bit_obstack->elements = elt;
+    }
+  else
+    {
+      elt->prev = bitmap_ggc_free;
+      bitmap_ggc_free = elt;
+    }
+}
+\f
+/* Linked-list view of bitmaps.
+
+   In this representation, the bitmap elements form a double-linked list
+   with elements sorted by increasing index.  */
+
+/* Link the bitmap element into the current bitmap linked list.  */
+
+static inline void
+bitmap_list_link_element (bitmap head, bitmap_element *element)
+{
+  unsigned int indx = element->indx;
+  bitmap_element *ptr;
+
+  gcc_assert (!head->tree_form);
+
+  /* If this is the first and only element, set it in.  */
+  if (head->first == 0)
+    {
+      element->next = element->prev = 0;
+      head->first = element;
+    }
+
+  /* If this index is less than that of the current element, it goes someplace
+     before the current element.  */
+  else if (indx < head->indx)
+    {
+      for (ptr = head->current;
+	   ptr->prev != 0 && ptr->prev->indx > indx;
+	   ptr = ptr->prev)
+	;
+
+      if (ptr->prev)
+	ptr->prev->next = element;
+      else
+	head->first = element;
+
+      element->prev = ptr->prev;
+      element->next = ptr;
+      ptr->prev = element;
+    }
+
+  /* Otherwise, it must go someplace after the current element.  */
+  else
+    {
+      for (ptr = head->current;
+	   ptr->next != 0 && ptr->next->indx < indx;
+	   ptr = ptr->next)
+	;
+
+      if (ptr->next)
+	ptr->next->prev = element;
+
+      element->next = ptr->next;
+      element->prev = ptr;
+      ptr->next = element;
+    }
+
+  /* Set up so this is the first element searched.  */
+  head->current = element;
+  head->indx = indx;
+}
+
+/* Unlink the bitmap element from the current bitmap linked list,
+   and return it to the freelist.  */
+
+static inline void
+bitmap_list_unlink_element (bitmap head, bitmap_element *element)
+{
+  bitmap_element *next = element->next;
+  bitmap_element *prev = element->prev;
+
+  gcc_assert (!head->tree_form);
+
+  if (prev)
+    prev->next = next;
+
+  if (next)
+    next->prev = prev;
+
+  if (head->first == element)
+    head->first = next;
+
+  /* Since the first thing we try is to insert before current,
+     make current the next entry in preference to the previous.  */
+  if (head->current == element)
+    {
+      head->current = next != 0 ? next : prev;
+      if (head->current)
+	head->indx = head->current->indx;
+      else
+	head->indx = 0;
+    }
+
+  bitmap_elem_to_freelist (head, element);
+}
+
+/* Insert a new uninitialized element into bitmap HEAD after element
+   ELT.  If ELT is NULL, insert the element at the start.  Return the
+   new element.  */
+
+static bitmap_element *
+bitmap_list_insert_element_after (bitmap head,
+				  bitmap_element *elt, unsigned int indx)
+{
+  bitmap_element *node = bitmap_element_allocate (head);
+  node->indx = indx;
+
+  gcc_assert (!head->tree_form);
+
+  if (!elt)
+    {
+      if (!head->current)
+	{
+	  head->current = node;
+	  head->indx = indx;
+	}
+      node->next = head->first;
+      if (node->next)
+	node->next->prev = node;
+      head->first = node;
+      node->prev = NULL;
+    }
+  else
+    {
+      gcc_checking_assert (head->current);
+      node->next = elt->next;
+      if (node->next)
+	node->next->prev = node;
+      elt->next = node;
+      node->prev = elt;
+    }
+  return node;
+}
+
+/* Return the element for INDX, or NULL if the element doesn't exist.  */
+
+static inline bitmap_element *
+bitmap_list_find_element (bitmap head, unsigned int indx, bitmap_usage *usage)
+{
+  bitmap_element *element;
+  if (head->indx < indx)
+    /* INDX is beyond head->indx.  Search from head->current
+       forward.  */
+    for (element = head->current;
+	 element->next != 0 && element->indx < indx;
+	 element = element->next)
+      {
+	if (GATHER_STATISTICS && usage)
+	  usage->m_search_iter++;
+      }
+
+  else if (head->indx / 2 < indx)
+    /* INDX is less than head->indx and closer to head->indx than to
+       0.  Search from head->current backward.  */
+    for (element = head->current;
+	 element->prev != 0 && element->indx > indx;
+	 element = element->prev)
+      {
+	if (GATHER_STATISTICS && usage)
+	  usage->m_search_iter++;
+      }
+
+  else
+    /* INDX is less than head->indx and closer to 0 than to
+       head->indx.  Search from head->first forward.  */
+    for (element = head->first;
+	 element->next != 0 && element->indx < indx;
+	 element = element->next)
+      {
+	if (GATHER_STATISTICS && usage)
+	  usage->m_search_iter++;
+      }
+
+  /* `element' is the nearest to the one we want.  If it's not the one we
+     want, the one we want doesn't exist.  */
+  gcc_checking_assert (element != NULL);
+  head->current = element;
+  head->indx = element->indx;
+  if (element->indx != indx)
+    element = 0;
+  return element;
+}
+
+\f
+/* Splay-tree view of bitmaps.
+
+   This is an almost one-to-one the implementatin of the simple top-down
+   splay tree in Sleator and Tarjan's "Self-adjusting Binary Search Trees".
+   It is probably not the most efficient form of splay trees, but it should
+   be good enough to experiment with this idea of bitmaps-as-trees.
+   
+   For all functions below, the variable or function argument "t" is a node
+   in the tree, and "e" is a temporary or new node in the tree.  The rest
+   is sufficiently straigh-forward (and very well explained in the paper)
+   that comment would only clutter things.  */
+
+static inline void
+bitmap_tree_link_left (bitmap_element * &t, bitmap_element * &l)
+{
+  l->next = t;
+  l = t;
+  t = t->next;
+}
+
+static inline void
+bitmap_tree_link_right (bitmap_element * &t, bitmap_element * &r)
+{
+  r->prev = t;
+  r = t;
+  t = t->prev;
+}
+
+static inline void
+bitmap_tree_rotate_left (bitmap_element * &t)
+{
+  bitmap_element *e = t->next;
+  t->next = t->next->prev;
+  e->prev = t;
+  t = e;
+}
+
+static inline void
+bitmap_tree_rotate_right (bitmap_element * &t)
+{
+  bitmap_element *e = t->prev;
+  t->prev = t->prev->next;
+  e->next = t;
+  t = e;
+}
+
+static bitmap_element *
+bitmap_tree_splay (bitmap head, bitmap_element *t, unsigned int indx)
+{
+  bitmap_element N, *l, *r;
+
+  if (t == NULL)
+    return NULL;
+
+  bitmap_usage *usage = NULL;
+  if (GATHER_STATISTICS)
+    usage = bitmap_mem_desc.get_descriptor_for_instance (head);
+
+  N.prev = N.next = NULL;
+  l = r = &N;
+
+  while (indx != t->indx)
+    {
+      if (GATHER_STATISTICS && usage)
+	usage->m_search_iter++;
+
+      if (indx < t->indx)
+	{
+	  if (t->prev != NULL && indx < t->prev->indx)
+	    bitmap_tree_rotate_right (t);
+	  if (t->prev == NULL)
+	    break;
+	  bitmap_tree_link_right (t, r);
+	}
+      else if (indx > t->indx)
+	{
+	  if (t->next != NULL && indx > t->next->indx)
+	    bitmap_tree_rotate_left (t);
+	  if (t->next == NULL)
+	    break;
+	  bitmap_tree_link_left (t, l);
+	}
+    }
+
+  l->next = t->prev;
+  r->prev = t->next;
+  t->prev = N.next;
+  t->next = N.prev;
+  return t;
+}
+
+/* Link bitmap element E into the current bitmap splay tree.  */
+
+static inline void
+bitmap_tree_link_element (bitmap head, bitmap_element *e)
+{
+  if (head->first == NULL)
+    e->prev = e->next = NULL;
+  else
+    {
+      bitmap_element *t = bitmap_tree_splay (head, head->first, e->indx);
+      if (e->indx < t->indx)
+	{
+	  e->prev = t->prev;
+	  e->next = t;
+	  t->prev = NULL;
+	}
+      else if (e->indx > t->indx)
+	{
+	  e->next = t->next;
+	  e->prev = t;
+	  t->next = NULL;
+	}
+      else
+	gcc_unreachable ();
+    }
+  head->first = e;
+  head->current = e;
+  head->indx = e->indx;
+}
+
+/* Unlink bitmap element E from the current bitmap splay tree,
+   and return it to the freelist.  */
+
+static void
+bitmap_tree_unlink_element (bitmap head, bitmap_element *e)
+{
+  bitmap_element *t = bitmap_tree_splay (head, head->first, e->indx);
+
+  gcc_checking_assert (t == e);
+
+  if (e->prev == NULL)
+    t = e->next;
+  else
+    {
+      t = bitmap_tree_splay (head, e->prev, e->indx);
+      t->next = e->next;
+    }
+  head->first = t;
+  head->current = t;
+  head->indx = (t != NULL) ? t->indx : 0;
+
+  bitmap_elem_to_freelist (head, e);
+}
+
+/* Return the element for INDX, or NULL if the element doesn't exist.  */
+
+static inline bitmap_element *
+bitmap_tree_find_element (bitmap head, unsigned int indx)
+{
+  bitmap_element *element = bitmap_tree_splay (head, head->first, indx);
+  gcc_checking_assert (element != NULL);
+  head->first = element;
+  head->current = element;
+  head->indx = element->indx;
+  if (element->indx != indx)
+    element = 0;
+  return element;
+}
+\f
+/* Converting bitmap views from linked-list to tree and vice versa.  */
+
+/* Splice element E and all elements with a larger index from
+   bitmap HEAD, convert the spliced elements to the linked-list
+   view, and return the head of the list (which should be E again),  */
+
+static bitmap_element *
+bitmap_tree_listify_from (bitmap head, bitmap_element *e)
+{
+  bitmap_element *t, *erb;
+
+  /* Detach the right branch from E (all elements with indx > E->indx),
+     and splay E to the root.  */
+  erb = e->next;
+  e->next = NULL;
+  t = bitmap_tree_splay (head, head->first, e->indx);
+  gcc_checking_assert (t == e);
+
+  /* Because E has no right branch, and we rotated it to the root,
+     the left branch is the new root.  */
+  t = e->prev;
+  head->first = t;
+  head->current = t;
+  head->indx = (t != NULL) ? t->indx : 0;
+
+  /* Detach the tree from E, and re-attach the right branch of E.  */
+  e->prev = NULL;
+  e->next = erb;
+
+  /* The tree is now valid again.  Now we need to "un-tree" E.
+     It is imperative that a non-recursive implementation is used
+     for this, because splay trees have a worst case depth of O(N)
+     for a tree with N nodes.  A recursive implementation could
+     result in a stack overflow for a sufficiently large, unbalanced
+     bitmap tree.  */
+
+  vec<bitmap_element *> stack = vNULL;
+  vec<bitmap_element *> sorted_elements = vNULL;
+  bitmap_element *n = e;
+
+  while (true)
+    {
+      while (n != NULL)
+	{
+	  stack.safe_push (n);
+	  n = n->prev;
+	}
+
+      if (stack.is_empty ())
+	break;
+
+      n = stack.pop ();
+      sorted_elements.safe_push (n);
+      n = n->next;
+    }
+  stack.release ();
+
+  gcc_assert (sorted_elements[0] == e);
 
-  memset (element->bits, 0, sizeof (element->bits));
+  bitmap_element *prev = NULL;
+  unsigned ix;
+  FOR_EACH_VEC_ELT (sorted_elements, ix, n)
+    {
+      if (prev != NULL)
+        prev->next = n;
+      n->prev = prev;
+      n->next = NULL;
+      prev = n;
+    }
+  sorted_elements.release ();
 
-  return element;
+  return e;
 }
 
-/* Remove ELT and all following elements from bitmap HEAD.  */
+/* Convert bitmap HEAD from splay-tree view to linked-list view.  */
 
 void
-bitmap_elt_clear_from (bitmap head, bitmap_element *elt)
+bitmap_list_view (bitmap head)
 {
-  bitmap_element *prev;
-  bitmap_obstack *bit_obstack = head->obstack;
+  bitmap_element *ptr;
 
-  if (!elt) return;
+  gcc_assert (head->tree_form);
 
-  if (GATHER_STATISTICS)
+  ptr = head->first;
+  if (ptr)
     {
-      int n = 0;
-      for (prev = elt; prev; prev = prev->next)
-	n++;
-      register_overhead (head, -sizeof (bitmap_element) * n);
+      while (ptr->prev)
+	bitmap_tree_rotate_right (ptr);
+      head->first = ptr;
+      head->first = bitmap_tree_listify_from (head, ptr);
     }
 
-  prev = elt->prev;
-  if (prev)
-    {
-      prev->next = NULL;
-      if (head->current->indx > prev->indx)
-	{
-	  head->current = prev;
-	  head->indx = prev->indx;
-	}
-    }
-  else
-    {
-      head->first = NULL;
-      head->current = NULL;
-      head->indx = 0;
-    }
+  head->tree_form = false;
+}
 
-  /* Put the entire list onto the free list in one operation. */
-  if (bit_obstack)
-    {
-      elt->prev = bit_obstack->elements;
-      bit_obstack->elements = elt;
-    }
-  else
+/* Convert bitmap HEAD from linked-list view to splay-tree view.
+   This is simply a matter of dropping the prev or next pointers
+   and setting the tree_form flag.  The tree will balance itself
+   if and when it is used.  */
+
+void
+bitmap_tree_view (bitmap head)
+{
+  bitmap_element *ptr;
+
+  gcc_assert (! head->tree_form);
+
+  ptr = head->first;
+  while (ptr)
     {
-      elt->prev = bitmap_ggc_free;
-      bitmap_ggc_free = elt;
+      ptr->prev = NULL;
+      ptr = ptr->next;
     }
-}
 
-/* Clear a bitmap by freeing the linked list.  */
+  head->tree_form = true;
+}
+\f
+/* Clear a bitmap by freeing all its elements.  */
 
 void
 bitmap_clear (bitmap head)
 {
-  if (head->first)
-    bitmap_elt_clear_from (head, head->first);
+  if (head->first == NULL)
+    return;
+  if (head->tree_form)
+    {
+      bitmap_element *e, *t;
+      for (e = head->first; e->prev; e = e->prev)
+	/* Loop to find the element with the smallest index.  */ ;
+      t = bitmap_tree_splay (head, head->first, e->indx);
+      gcc_checking_assert (t == e);
+      head->first = t;
+    }
+  bitmap_elt_clear_from (head, head->first);
 }
 \f
 /* Initialize a bitmap obstack.  If BIT_OBSTACK is NULL, initialize
@@ -344,96 +783,6 @@ bitmap_element_zerop (const bitmap_eleme
 #endif
 }
 \f
-/* Link the bitmap element into the current bitmap linked list.  */
-
-static inline void
-bitmap_element_link (bitmap head, bitmap_element *element)
-{
-  unsigned int indx = element->indx;
-  bitmap_element *ptr;
-
-  /* If this is the first and only element, set it in.  */
-  if (head->first == 0)
-    {
-      element->next = element->prev = 0;
-      head->first = element;
-    }
-
-  /* If this index is less than that of the current element, it goes someplace
-     before the current element.  */
-  else if (indx < head->indx)
-    {
-      for (ptr = head->current;
-	   ptr->prev != 0 && ptr->prev->indx > indx;
-	   ptr = ptr->prev)
-	;
-
-      if (ptr->prev)
-	ptr->prev->next = element;
-      else
-	head->first = element;
-
-      element->prev = ptr->prev;
-      element->next = ptr;
-      ptr->prev = element;
-    }
-
-  /* Otherwise, it must go someplace after the current element.  */
-  else
-    {
-      for (ptr = head->current;
-	   ptr->next != 0 && ptr->next->indx < indx;
-	   ptr = ptr->next)
-	;
-
-      if (ptr->next)
-	ptr->next->prev = element;
-
-      element->next = ptr->next;
-      element->prev = ptr;
-      ptr->next = element;
-    }
-
-  /* Set up so this is the first element searched.  */
-  head->current = element;
-  head->indx = indx;
-}
-
-/* Insert a new uninitialized element into bitmap HEAD after element
-   ELT.  If ELT is NULL, insert the element at the start.  Return the
-   new element.  */
-
-static bitmap_element *
-bitmap_elt_insert_after (bitmap head, bitmap_element *elt, unsigned int indx)
-{
-  bitmap_element *node = bitmap_element_allocate (head);
-  node->indx = indx;
-
-  if (!elt)
-    {
-      if (!head->current)
-	{
-	  head->current = node;
-	  head->indx = indx;
-	}
-      node->next = head->first;
-      if (node->next)
-	node->next->prev = node;
-      head->first = node;
-      node->prev = NULL;
-    }
-  else
-    {
-      gcc_checking_assert (head->current);
-      node->next = elt->next;
-      if (node->next)
-	node->next->prev = node;
-      elt->next = node;
-      node->prev = elt;
-    }
-  return node;
-}
-\f
 /* Copy a bitmap to another bitmap.  */
 
 void
@@ -442,6 +791,8 @@ bitmap_copy (bitmap to, const_bitmap fro
   const bitmap_element *from_ptr;
   bitmap_element *to_ptr = 0;
 
+  gcc_assert (!to->tree_form && !from->tree_form);
+
   bitmap_clear (to);
 
   /* Copy elements in forward direction one at a time.  */
@@ -452,8 +803,9 @@ bitmap_copy (bitmap to, const_bitmap fro
       to_elt->indx = from_ptr->indx;
       memcpy (to_elt->bits, from_ptr->bits, sizeof (to_elt->bits));
 
-      /* Here we have a special case of bitmap_element_link, for the case
-	 where we know the links are being entered in sequence.  */
+      /* Here we have a special case of bitmap_list_link_element,
+         for the case where we know the links are being entered
+	 in sequence.  */
       if (to_ptr == 0)
 	{
 	  to->first = to->current = to_elt;
@@ -506,7 +858,9 @@ bitmap_find_bit (bitmap head, unsigned i
   if (head->current == NULL
       || head->indx == indx)
     return head->current;
-  if (head->current == head->first
+  /* ???  Premature optimization?  */
+  if (!head->tree_form
+      && head->current == head->first
       && head->first->next == NULL)
     return NULL;
 
@@ -521,45 +875,10 @@ bitmap_find_bit (bitmap head, unsigned i
   if (GATHER_STATISTICS && usage)
     usage->m_nsearches++;
 
-  if (head->indx < indx)
-    /* INDX is beyond head->indx.  Search from head->current
-       forward.  */
-    for (element = head->current;
-	 element->next != 0 && element->indx < indx;
-	 element = element->next)
-      {
-	if (GATHER_STATISTICS && usage)
-	  usage->m_search_iter++;
-      }
-
-  else if (head->indx / 2 < indx)
-    /* INDX is less than head->indx and closer to head->indx than to
-       0.  Search from head->current backward.  */
-    for (element = head->current;
-	 element->prev != 0 && element->indx > indx;
-	 element = element->prev)
-      {
-	if (GATHER_STATISTICS && usage)
-	  usage->m_search_iter++;
-      }
-
+  if (!head->tree_form)
+    element = bitmap_list_find_element (head, indx, usage);
   else
-    /* INDX is less than head->indx and closer to 0 than to
-       head->indx.  Search from head->first forward.  */
-    for (element = head->first;
-	 element->next != 0 && element->indx < indx;
-	 element = element->next)
-      if (GATHER_STATISTICS && usage)
-	{
-	  usage->m_search_iter++;
-	}
-
-  /* `element' is the nearest to the one we want.  If it's not the one we
-     want, the one we want doesn't exist.  */
-  head->current = element;
-  head->indx = element->indx;
-  if (element->indx != indx)
-    element = 0;
+    element = bitmap_tree_find_element (head, indx);
 
   return element;
 }
@@ -583,7 +902,12 @@ bitmap_clear_bit (bitmap head, int bit)
 	  /* If we cleared the entire word, free up the element.  */
 	  if (!ptr->bits[word_num]
 	      && bitmap_element_zerop (ptr))
-	    bitmap_element_free (head, ptr);
+	    {
+	      if (!head->tree_form)
+		bitmap_list_unlink_element (head, ptr);
+	      else
+		bitmap_tree_unlink_element (head, ptr);
+	    }
 	}
 
       return res;
@@ -602,21 +926,22 @@ bitmap_set_bit (bitmap head, int bit)
   unsigned bit_num  = bit % BITMAP_WORD_BITS;
   BITMAP_WORD bit_val = ((BITMAP_WORD) 1) << bit_num;
 
-  if (ptr == 0)
-    {
-      ptr = bitmap_element_allocate (head);
-      ptr->indx = bit / BITMAP_ELEMENT_ALL_BITS;
-      ptr->bits[word_num] = bit_val;
-      bitmap_element_link (head, ptr);
-      return true;
-    }
-  else
+  if (ptr != 0)
     {
       bool res = (ptr->bits[word_num] & bit_val) == 0;
       if (res)
 	ptr->bits[word_num] |= bit_val;
       return res;
     }
+
+  ptr = bitmap_element_allocate (head);
+  ptr->indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  ptr->bits[word_num] = bit_val;
+  if (!head->tree_form)
+    bitmap_list_link_element (head, ptr);
+  else
+    bitmap_tree_link_element (head, ptr);
+  return true;
 }
 
 /* Return whether a bit is set within a bitmap.  */
@@ -692,6 +1017,7 @@ bitmap_count_bits (const_bitmap a)
   unsigned long count = 0;
   const bitmap_element *elt;
 
+  gcc_assert (!a->tree_form);
   for (elt = a->first; elt; elt = elt->next)
     count += bitmap_count_bits_in_word (elt->bits);
 
@@ -748,9 +1074,11 @@ bitmap_single_bit_set_p (const_bitmap a)
     return false;
 
   elt = a->first;
+
   /* As there are no completely empty bitmap elements, a second one
      means we have more than one bit set.  */
-  if (elt->next != NULL)
+  if (elt->next != NULL
+      && (!a->tree_form || elt->prev != NULL))
     return false;
 
   for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
@@ -782,6 +1110,11 @@ bitmap_first_set_bit (const_bitmap a)
   unsigned ix;
 
   gcc_checking_assert (elt);
+
+  if (a->tree_form)
+    while (elt->prev)
+      elt = elt->prev;
+
   bit_no = elt->indx * BITMAP_ELEMENT_ALL_BITS;
   for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
     {
@@ -827,14 +1160,20 @@ bitmap_first_set_bit (const_bitmap a)
 unsigned
 bitmap_last_set_bit (const_bitmap a)
 {
-  const bitmap_element *elt = a->current ? a->current : a->first;
+  const bitmap_element *elt;
   unsigned bit_no;
   BITMAP_WORD word;
   int ix;
 
+  if (a->tree_form)
+    elt = a->first;
+  else
+    elt = a->current ? a->current : a->first;
   gcc_checking_assert (elt);
+
   while (elt->next)
     elt = elt->next;
+
   bit_no = elt->indx * BITMAP_ELEMENT_ALL_BITS;
   for (ix = BITMAP_ELEMENT_WORDS - 1; ix >= 0; ix--)
     {
@@ -876,6 +1215,7 @@ bitmap_and (bitmap dst, const_bitmap a,
   const bitmap_element *b_elt = b->first;
   bitmap_element *dst_prev = NULL;
 
+  gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
 
   if (a == b)
@@ -897,7 +1237,8 @@ bitmap_and (bitmap dst, const_bitmap a,
 	  BITMAP_WORD ior = 0;
 
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							a_elt->indx);
 	  else
 	    dst_elt->indx = a_elt->indx;
 	  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
@@ -934,6 +1275,8 @@ bitmap_and_into (bitmap a, const_bitmap
   bitmap_element *next;
   bool changed = false;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
+
   if (a == b)
     return false;
 
@@ -942,7 +1285,7 @@ bitmap_and_into (bitmap a, const_bitmap
       if (a_elt->indx < b_elt->indx)
 	{
 	  next = a_elt->next;
-	  bitmap_element_free (a, a_elt);
+	  bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	  changed = true;
 	}
@@ -964,7 +1307,7 @@ bitmap_and_into (bitmap a, const_bitmap
 	    }
 	  next = a_elt->next;
 	  if (!ior)
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	  b_elt = b_elt->next;
 	}
@@ -1006,7 +1349,8 @@ bitmap_elt_copy (bitmap dst, bitmap_elem
     {
       changed = true;
       if (!dst_elt)
-	dst_elt = bitmap_elt_insert_after (dst, dst_prev, src_elt->indx);
+	dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+						    src_elt->indx);
       else
 	dst_elt->indx = src_elt->indx;
       memcpy (dst_elt->bits, src_elt->bits, sizeof (dst_elt->bits));
@@ -1028,6 +1372,7 @@ bitmap_and_compl (bitmap dst, const_bitm
   bitmap_element **dst_prev_pnext = &dst->first;
   bool changed = false;
 
+  gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
 
   if (a == b)
@@ -1076,7 +1421,8 @@ bitmap_and_compl (bitmap dst, const_bitm
 	      bool new_element;
 	      if (!dst_elt || dst_elt->indx > a_elt->indx)
 		{
-		  dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+		  dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							      a_elt->indx);
 		  new_element = true;
 		}
 	      else
@@ -1098,7 +1444,7 @@ bitmap_and_compl (bitmap dst, const_bitm
 	      else
 	        {
 	          changed |= !new_element;
-		  bitmap_element_free (dst, dst_elt);
+		  bitmap_list_unlink_element (dst, dst_elt);
 		  dst_elt = *dst_prev_pnext;
 		}
 	    }
@@ -1139,6 +1485,8 @@ bitmap_and_compl_into (bitmap a, const_b
   bitmap_element *next;
   BITMAP_WORD changed = 0;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
+
   if (a == b)
     {
       if (bitmap_empty_p (a))
@@ -1173,7 +1521,7 @@ bitmap_and_compl_into (bitmap a, const_b
 	    }
 	  next = a_elt->next;
 	  if (!ior)
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	  b_elt = b_elt->next;
 	}
@@ -1191,6 +1539,8 @@ bitmap_set_range (bitmap head, unsigned
   bitmap_element *elt, *elt_prev;
   unsigned int i;
 
+  gcc_assert (!head->tree_form);
+
   if (!count)
     return;
 
@@ -1213,7 +1563,7 @@ bitmap_set_range (bitmap head, unsigned
     {
       elt = bitmap_element_allocate (head);
       elt->indx = first_index;
-      bitmap_element_link (head, elt);
+      bitmap_list_link_element (head, elt);
     }
 
   gcc_checking_assert (elt->indx == first_index);
@@ -1230,7 +1580,7 @@ bitmap_set_range (bitmap head, unsigned
       unsigned int ix;
 
       if (!elt || elt->indx != i)
-	elt = bitmap_elt_insert_after (head, elt_prev, i);
+	elt = bitmap_list_insert_element_after (head, elt_prev, i);
 
       if (elt_start_bit <= start)
 	{
@@ -1296,6 +1646,8 @@ bitmap_clear_range (bitmap head, unsigne
   unsigned int first_index, end_bit_plus1, last_index;
   bitmap_element *elt;
 
+  gcc_assert (!head->tree_form);
+
   if (!count)
     return;
 
@@ -1339,7 +1691,7 @@ bitmap_clear_range (bitmap head, unsigne
 
       if (elt_start_bit >= start && elt_end_bit_plus1 <= end_bit_plus1)
 	/* Get rid of the entire elt and go to the next one.  */
-	bitmap_element_free (head, elt);
+	bitmap_list_unlink_element (head, elt);
       else
 	{
 	  /* Going to have to knock out some bits in this elt.  */
@@ -1409,7 +1761,7 @@ bitmap_clear_range (bitmap head, unsigne
 	      }
 	  /* Check to see if there are any bits left.  */
 	  if (clear)
-	    bitmap_element_free (head, elt);
+	    bitmap_list_unlink_element (head, elt);
 	}
       elt = next_elt;
     }
@@ -1431,6 +1783,7 @@ bitmap_compl_and_into (bitmap a, const_b
   bitmap_element *a_prev = NULL;
   bitmap_element *next;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
   gcc_assert (a != b);
 
   if (bitmap_empty_p (a))
@@ -1451,13 +1804,13 @@ bitmap_compl_and_into (bitmap a, const_b
 	  /* A is before B.  Remove A */
 	  next = a_elt->next;
 	  a_prev = a_elt->prev;
-	  bitmap_element_free (a, a_elt);
+	  bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	}
       else if (!a_elt || b_elt->indx < a_elt->indx)
 	{
 	  /* B is before A.  Copy B. */
-	  next = bitmap_elt_insert_after (a, a_prev, b_elt->indx);
+	  next = bitmap_list_insert_element_after (a, a_prev, b_elt->indx);
 	  memcpy (next->bits, b_elt->bits, sizeof (next->bits));
 	  a_prev = next;
 	  b_elt = b_elt->next;
@@ -1478,7 +1831,7 @@ bitmap_compl_and_into (bitmap a, const_b
 	    }
 	  next = a_elt->next;
 	  if (!ior)
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  else
 	    a_prev = a_elt;
 	  a_elt = next;
@@ -1523,7 +1876,8 @@ bitmap_elt_ior (bitmap dst, bitmap_eleme
 	{
 	  changed = true;
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							a_elt->indx);
 	  else
 	    dst_elt->indx = a_elt->indx;
 	  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
@@ -1562,6 +1916,7 @@ bitmap_ior (bitmap dst, const_bitmap a,
   bitmap_element **dst_prev_pnext = &dst->first;
   bool changed = false;
 
+  gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
 
   while (a_elt || b_elt)
@@ -1610,6 +1965,7 @@ bitmap_ior_into (bitmap a, const_bitmap
   bitmap_element **a_prev_pnext = &a->first;
   bool changed = false;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
   if (a == b)
     return false;
 
@@ -1648,7 +2004,9 @@ bitmap_xor (bitmap dst, const_bitmap a,
   const bitmap_element *b_elt = b->first;
   bitmap_element *dst_prev = NULL;
 
+  gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
+
   if (a == b)
     {
       bitmap_clear (dst);
@@ -1664,7 +2022,8 @@ bitmap_xor (bitmap dst, const_bitmap a,
 	  BITMAP_WORD ior = 0;
 
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							a_elt->indx);
 	  else
 	    dst_elt->indx = a_elt->indx;
 	  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
@@ -1699,7 +2058,8 @@ bitmap_xor (bitmap dst, const_bitmap a,
 	    }
 
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, src->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							src->indx);
 	  else
 	    dst_elt->indx = src->indx;
 	  memcpy (dst_elt->bits, src->bits, sizeof (dst_elt->bits));
@@ -1724,6 +2084,8 @@ bitmap_xor_into (bitmap a, const_bitmap
   const bitmap_element *b_elt = b->first;
   bitmap_element *a_prev = NULL;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
+
   if (a == b)
     {
       bitmap_clear (a);
@@ -1735,7 +2097,8 @@ bitmap_xor_into (bitmap a, const_bitmap
       if (!a_elt || b_elt->indx < a_elt->indx)
 	{
 	  /* Copy b_elt.  */
-	  bitmap_element *dst = bitmap_elt_insert_after (a, a_prev, b_elt->indx);
+	  bitmap_element *dst = bitmap_list_insert_element_after (a, a_prev,
+								  b_elt->indx);
 	  memcpy (dst->bits, b_elt->bits, sizeof (dst->bits));
 	  a_prev = dst;
 	  b_elt = b_elt->next;
@@ -1763,7 +2126,7 @@ bitmap_xor_into (bitmap a, const_bitmap
 	  if (ior)
 	    a_prev = a_elt;
 	  else
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	}
     }
@@ -1783,6 +2146,8 @@ bitmap_equal_p (const_bitmap a, const_bi
   const bitmap_element *b_elt;
   unsigned ix;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
+
   for (a_elt = a->first, b_elt = b->first;
        a_elt && b_elt;
        a_elt = a_elt->next, b_elt = b_elt->next)
@@ -1805,6 +2170,8 @@ bitmap_intersect_p (const_bitmap a, cons
   const bitmap_element *b_elt;
   unsigned ix;
 
+  gcc_assert (!a->tree_form && !b->tree_form);
+
   for (a_elt = a->first, b_elt = b->first;
        a_elt && b_elt;)
     {
@@ -1832,6 +2199,9 @@ bitmap_intersect_compl_p (const_bitmap a
   const bitmap_element *a_elt;
   const bitmap_element *b_elt;
   unsigned ix;
+
+  gcc_assert (!a->tree_form && !b->tree_form);
+
   for (a_elt = a->first, b_elt = b->first;
        a_elt && b_elt;)
     {
@@ -1866,6 +2236,8 @@ bitmap_ior_and_compl (bitmap dst, const_
   bitmap_element *dst_prev = NULL;
   bitmap_element **dst_prev_pnext = &dst->first;
 
+  gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form
+	      && !kill->tree_form);
   gcc_assert (dst != a && dst != b && dst != kill);
 
   /* Special cases.  We don't bother checking for bitmap_equal_p (b, kill).  */
@@ -1958,16 +2330,18 @@ bitmap_ior_and_compl (bitmap dst, const_
   return changed;
 }
 
-/* A |= (FROM1 & ~FROM2).  Return true if A changes.  */
+/* A |= (B & ~C).  Return true if A changes.  */
 
 bool
-bitmap_ior_and_compl_into (bitmap a, const_bitmap from1, const_bitmap from2)
+bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
 {
   bitmap_head tmp;
   bool changed;
 
+  gcc_assert (!a->tree_form && !b->tree_form && !c->tree_form);
+
   bitmap_initialize (&tmp, &bitmap_default_obstack);
-  bitmap_and_compl (&tmp, from1, from2);
+  bitmap_and_compl (&tmp, b, c);
   changed = bitmap_ior_into (a, &tmp);
   bitmap_clear (&tmp);
 
@@ -1988,6 +2362,8 @@ bitmap_ior_and_into (bitmap a, const_bit
   bool changed = false;
   unsigned ix;
 
+  gcc_assert (!a->tree_form && !b->tree_form && !c->tree_form);
+
   if (b == c)
     return bitmap_ior_into (a, b);
   if (bitmap_empty_p (b) || bitmap_empty_p (c))
@@ -2054,6 +2430,7 @@ bitmap_ior_and_into (bitmap a, const_bit
 }
 
 /* Compute hash of bitmap (for purposes of hashing).  */
+
 hashval_t
 bitmap_hash (const_bitmap head)
 {
@@ -2061,6 +2438,8 @@ bitmap_hash (const_bitmap head)
   BITMAP_WORD hash = 0;
   int ix;
 
+  gcc_assert (!head->tree_form);
+
   for (ptr = head->first; ptr; ptr = ptr->next)
     {
       hash ^= ptr->indx;
@@ -2071,10 +2450,67 @@ bitmap_hash (const_bitmap head)
 }
 
 \f
+/* Function to obtain a vector of bitmap elements in bit order from
+   HEAD in tree view.  */
+
+static void
+bitmap_tree_to_vec (vec<bitmap_element *> &elts, const_bitmap head)
+{
+  gcc_assert (head->tree_form);
+  auto_vec<bitmap_element *, 32> stack;
+  bitmap_element *e = head->first;
+  while (e)
+    {
+      stack.safe_push (e);
+      e = e->prev;
+    }
+  while (!stack.is_empty ())
+    {
+      bitmap_element *e = stack.pop ();
+      elts.safe_push (e);
+      e = e->next;
+      while (e)
+	{
+	  stack.safe_push (e);
+	  e = e->prev;
+	}
+    }
+}
+
+/* Debugging function to print out the contents of a bitmap element.  */
+
+DEBUG_FUNCTION void
+debug_bitmap_elt_file (FILE *file, const bitmap_element *ptr)
+{
+  unsigned int i, j, col = 26;
+
+  fprintf (file, "\t" HOST_PTR_PRINTF " next = " HOST_PTR_PRINTF
+	   " prev = " HOST_PTR_PRINTF " indx = %u\n\t\tbits = {",
+	   (const void*) ptr, (const void*) ptr->next,
+	   (const void*) ptr->prev, ptr->indx);
+
+  for (i = 0; i < BITMAP_ELEMENT_WORDS; i++)
+    for (j = 0; j < BITMAP_WORD_BITS; j++)
+      if ((ptr->bits[i] >> j) & 1)
+	{
+	  if (col > 70)
+	    {
+	      fprintf (file, "\n\t\t\t");
+	      col = 24;
+	    }
+
+	  fprintf (file, " %u", (ptr->indx * BITMAP_ELEMENT_ALL_BITS
+				 + i * BITMAP_WORD_BITS + j));
+	  col += 4;
+	}
+
+  fprintf (file, " }\n");
+}
+
 /* Debugging function to print out the contents of a bitmap.  */
 
 DEBUG_FUNCTION void
-debug_bitmap_file (FILE *file, const_bitmap head)
+debug_bitmap_file (FILE *file, bitmap head)
 {
   const bitmap_element *ptr;
 
@@ -2082,39 +2518,23 @@ debug_bitmap_file (FILE *file, const_bit
 	   " current = " HOST_PTR_PRINTF " indx = %u\n",
 	   (void *) head->first, (void *) head->current, head->indx);
 
-  for (ptr = head->first; ptr; ptr = ptr->next)
+  if (head->tree_form)
     {
-      unsigned int i, j, col = 26;
-
-      fprintf (file, "\t" HOST_PTR_PRINTF " next = " HOST_PTR_PRINTF
-	       " prev = " HOST_PTR_PRINTF " indx = %u\n\t\tbits = {",
-	       (const void*) ptr, (const void*) ptr->next,
-	       (const void*) ptr->prev, ptr->indx);
-
-      for (i = 0; i < BITMAP_ELEMENT_WORDS; i++)
-	for (j = 0; j < BITMAP_WORD_BITS; j++)
-	  if ((ptr->bits[i] >> j) & 1)
-	    {
-	      if (col > 70)
-		{
-		  fprintf (file, "\n\t\t\t");
-		  col = 24;
-		}
-
-	      fprintf (file, " %u", (ptr->indx * BITMAP_ELEMENT_ALL_BITS
-				     + i * BITMAP_WORD_BITS + j));
-	      col += 4;
-	    }
-
-      fprintf (file, " }\n");
+      auto_vec<bitmap_element *, 32> elts;
+      bitmap_tree_to_vec (elts, head);
+      for (unsigned i = 0; i < elts.length (); ++i)
+	debug_bitmap_elt_file (file, elts[i]);
     }
+  else
+    for (ptr = head->first; ptr; ptr = ptr->next)
+      debug_bitmap_elt_file (file, ptr);
 }
 
 /* Function to be called from the debugger to print the contents
    of a bitmap.  */
 
 DEBUG_FUNCTION void
-debug_bitmap (const_bitmap head)
+debug_bitmap (bitmap head)
 {
   debug_bitmap_file (stderr, head);
 }
@@ -2123,18 +2543,39 @@ debug_bitmap (const_bitmap head)
    it does not print anything but the bits.  */
 
 DEBUG_FUNCTION void
-bitmap_print (FILE *file, const_bitmap head, const char *prefix,
+bitmap_print (FILE *file, bitmap head, const char *prefix,
 	      const char *suffix)
 {
   const char *comma = "";
   unsigned i;
-  bitmap_iterator bi;
 
   fputs (prefix, file);
-  EXECUTE_IF_SET_IN_BITMAP (head, 0, i, bi)
+  if (head->tree_form)
+    {
+      auto_vec<bitmap_element *, 32> elts;
+      bitmap_tree_to_vec (elts, head);
+      for (i = 0; i < elts.length (); ++i)
+	for (unsigned ix = 0; ix != BITMAP_ELEMENT_WORDS; ++ix)
+	  {
+	    BITMAP_WORD word = elts[i]->bits[ix];
+	    for (unsigned bit = 0; bit != BITMAP_WORD_BITS; ++bit)
+	      if (word & ((BITMAP_WORD)1 << bit))
+		{
+		  fprintf (file, "%s%d", comma,
+			   (bit + BITMAP_WORD_BITS * ix
+			    + elts[i]->indx * BITMAP_ELEMENT_ALL_BITS));
+		  comma = ", ";
+		}
+	  }
+    }
+  else
     {
-      fprintf (file, "%s%d", comma, i);
-      comma = ", ";
+      bitmap_iterator bi;
+      EXECUTE_IF_SET_IN_BITMAP (head, 0, i, bi)
+	{
+	  fprintf (file, "%s%d", comma, i);
+	  comma = ", ";
+	}
     }
   fputs (suffix, file);
 }
@@ -2150,13 +2591,13 @@ dump_bitmap_statistics (void)
 }
 
 DEBUG_FUNCTION void
-debug (const bitmap_head &ref)
+debug (bitmap_head &ref)
 {
   dump_bitmap (stderr, &ref);
 }
 
 DEBUG_FUNCTION void
-debug (const bitmap_head *ptr)
+debug (bitmap_head *ptr)
 {
   if (ptr)
     debug (*ptr);
Index: gcc/bitmap.h
===================================================================
--- gcc/bitmap.h	(revision 265259)
+++ gcc/bitmap.h	(working copy)
@@ -20,16 +20,21 @@ along with GCC; see the file COPYING3.
 #ifndef GCC_BITMAP_H
 #define GCC_BITMAP_H
 
-/* Implementation of sparse integer sets as a linked list.
+/* Implementation of sparse integer sets as a linked list or tree.
 
    This sparse set representation is suitable for sparse sets with an
-   unknown (a priori) universe.  The set is represented as a double-linked
-   list of container nodes (struct bitmap_element).  Each node consists
-   of an index for the first member that could be held in the container,
-   a small array of integers that represent the members in the container,
-   and pointers to the next and previous element in the linked list.  The
-   elements in the list are sorted in ascending order, i.e. the head of
+   unknown (a priori) universe.
+
+   Sets are represented as double-linked lists of container nodes of
+   type "struct bitmap_element" or as a binary trees of the same
+   container nodes.  Each container node consists of an index for the
+   first member that could be held in the container, a small array of
+   integers that represent the members in the container, and pointers
+   to the next and previous element in the linked list, or left and
+   right children in the tree.  In linked-list form, the container
+   nodes in the list are sorted in ascending order, i.e. the head of
    the list holds the element with the smallest member of the set.
+   In tree form, nodes to the left have a smaller container index.
 
    For a given member I in the set:
      - the element for I will have index is I / (bits per element)
@@ -42,34 +47,97 @@ along with GCC; see the file COPYING3.
    high storage overhead *per element*, but a small overall overhead if
    the set is very sparse.
 
-   The downside is that many operations are relatively slow because the
-   linked list has to be traversed to test membership (i.e. member_p/
-   add_member/remove_member).  To improve the performance of this set
-   representation, the last accessed element and its index are cached.
-   For membership tests on members close to recently accessed members,
-   the cached last element improves membership test to a constant-time
-   operation.
+   The storage requirements for linked-list sparse sets are O(E), with E->N
+   in the worst case (a sparse set with large distances between the values
+   of the set members).
 
-   The following operations can always be performed in O(1) time:
+   This representation also works well for data flow problems where the size
+   of the set may grow dynamically, but care must be taken that the member_p,
+   add_member, and remove_member operations occur with a suitable access
+   pattern.
+
+   The linked-list set representation works well for problems involving very
+   sparse sets.  The canonical example in GCC is, of course, the "set of
+   sets" for some CFG-based data flow problems (liveness analysis, dominance
+   frontiers, etc.).
+   
+   For random-access sparse sets of unknown universe, the binary tree
+   representation is likely to be a more suitable choice.  Theoretical
+   access times for the binary tree representation are better than those
+   for the linked-list, but in practice this is only true for truely
+   random access.
+
+   Often the most suitable representation during construction of the set
+   is not the best choice for the usage of the set.  For such cases, the
+   "view" of the set can be changed from one representation to the other.
+   This is an O(E) operation:
+
+     * from list to tree view	: bitmap_tree_view
+     * from tree to list view	: bitmap_list_view
+
+   Traversing linked lists or trees can be cache-unfriendly.  Performance
+   can be improved by keeping container nodes in the set grouped together
+   in  memory, using a dedicated obstack for a set (or group of related
+   sets).  Elements allocated on obstacks are released to a free-list and
+   taken off the free list.  If multiple sets are allocated on the same
+   obstack, elements freed from one set may be re-used for one of the other
+   sets.  This usually helps avoid cache misses.
+
+   A single free-list is used for all sets allocated in GGC space.  This is
+   bad for persistent sets, so persistent sets should be allocated on an
+   obstack whenever possible.
+
+   For random-access sets with a known, relatively small universe size, the
+   SparseSet or simple bitmap representations may be more efficient than a
+   linked-list set.
+
+
+   LINKED LIST FORM
+   ================
+
+   In linked-list form, in-order iterations of the set can be executed
+   efficiently.  The downside is that many random-access operations are
+   relatively slow, because the linked list has to be traversed to test
+   membership (i.e. member_p/ add_member/remove_member).
+   
+   To improve the performance of this set representation, the last
+   accessed element and its index are cached.  For membership tests on
+   members close to recently accessed members, the cached last element
+   improves membership test to a constant-time operation.
+
+   The following operations can always be performed in O(1) time in
+   list view:
 
      * clear			: bitmap_clear
+     * smallest_member		: bitmap_first_set_bit
      * choose_one		: (not implemented, but could be
-				   implemented in constant time)
+				   in constant time)
 
-   The following operations can be performed in O(E) time worst-case (with
-   E the number of elements in the linked list), but in O(1) time with a
-   suitable access patterns:
+   The following operations can be performed in O(E) time worst-case in
+   list view (with E the number of elements in the linked list), but in
+   O(1) time with a suitable access patterns:
 
      * member_p			: bitmap_bit_p
-     * add_member		: bitmap_set_bit
-     * remove_member		: bitmap_clear_bit
+     * add_member		: bitmap_set_bit / bitmap_set_range
+     * remove_member		: bitmap_clear_bit / bitmap_clear_range
 
-   The following operations can be performed in O(E) time:
+   The following operations can be performed in O(E) time in list view:
 
      * cardinality		: bitmap_count_bits
-     * set_size			: bitmap_last_set_bit (but this could
+     * largest_member		: bitmap_last_set_bit (but this could
 				  in constant time with a pointer to
 				  the last element in the chain)
+     * set_size			: bitmap_last_set_bit
+
+   In tree view the following operations can all be performed in O(log E)
+   amortized time with O(E) worst-case behavior.
+
+     * smallest_member
+     * largest_member
+     * set_size
+     * member_p
+     * add_member
+     * remove_member
 
    Additionally, the linked-list sparse set representation supports
    enumeration of the members in O(E) time:
@@ -93,39 +161,53 @@ along with GCC; see the file COPYING3.
      * A | (B & ~C)		: bitmap_ior_and_compl /
 				  bitmap_ior_and_compl_into
 
-   The storage requirements for linked-list sparse sets are O(E), with E->N
-   in the worst case (a sparse set with large distances between the values
-   of the set members).
 
-   The linked-list set representation works well for problems involving very
-   sparse sets.  The canonical example in GCC is, of course, the "set of
-   sets" for some CFG-based data flow problems (liveness analysis, dominance
-   frontiers, etc.).
+   BINARY TREE FORM
+   ================
+   An alternate "view" of a bitmap is its binary tree representation.
+   For this representation, splay trees are used because they can be
+   implemented using the same data structures as the linked list, with
+   no overhead for meta-data (like color, or rank) on the tree nodes.
+
+   In binary tree form, random-access to the set is much more efficient
+   than for the linked-list representation.  Downsides are the high cost
+   of clearing the set, and the relatively large number of operations
+   necessary to balance the tree.  Also, iterating the set members is
+   not supported.
    
-   This representation also works well for data flow problems where the size
-   of the set may grow dynamically, but care must be taken that the member_p,
-   add_member, and remove_member operations occur with a suitable access
-   pattern.
-   
-   For random-access sets with a known, relatively small universe size, the
-   SparseSet or simple bitmap representations may be more efficient than a
-   linked-list set.  For random-access sets of unknown universe, a hash table
-   or a balanced binary tree representation is likely to be a more suitable
-   choice.
+   As for the linked-list representation, the last accessed element and
+   its index are cached, so that membership tests on the latest accessed
+   members is a constant-time operation.  Other lookups take O(logE)
+   time amortized (but O(E) time worst-case).
 
-   Traversing linked lists is usually cache-unfriendly, even with the last
-   accessed element cached.
-   
-   Cache performance can be improved by keeping the elements in the set
-   grouped together in memory, using a dedicated obstack for a set (or group
-   of related sets).  Elements allocated on obstacks are released to a
-   free-list and taken off the free list.  If multiple sets are allocated on
-   the same obstack, elements freed from one set may be re-used for one of
-   the other sets.  This usually helps avoid cache misses.
+   The following operations can always be performed in O(1) time:
 
-   A single free-list is used for all sets allocated in GGC space.  This is
-   bad for persistent sets, so persistent sets should be allocated on an
-   obstack whenever possible.  */
+     * choose_one		: (not implemented, but could be
+				   implemented in constant time)
+
+   The following operations can be performed in O(logE) time amortized
+   but O(E) time worst-case, but in O(1) time if the same element is
+   accessed.
+
+     * member_p			: bitmap_bit_p
+     * add_member		: bitmap_set_bit
+     * remove_member		: bitmap_clear_bit
+
+   The following operations can be performed in O(logE) time amortized
+   but O(E) time worst-case:
+
+     * smallest_member		: bitmap_first_set_bit
+     * largest_member		: bitmap_last_set_bit
+     * set_size			: bitmap_last_set_bit
+
+   The following operations can be performed in O(E) time:
+
+     * clear			: bitmap_clear
+
+   The binary tree sparse set representation does *not* support any form
+   of enumeration, and does also *not* support logical operations on sets.
+   The binary tree representation is only supposed to be used for sets
+   on which many random-access membership tests will happen.  */
 
 #include "obstack.h"
 
@@ -225,24 +307,34 @@ struct GTY (()) bitmap_obstack {
    linear in the number of elements to be freed.  */
 
 struct GTY((chain_next ("%h.next"), chain_prev ("%h.prev"))) bitmap_element {
-  struct bitmap_element *next;	/* Next element.  */
-  struct bitmap_element *prev;	/* Previous element.  */
-  unsigned int indx;			/* regno/BITMAP_ELEMENT_ALL_BITS.  */
-  BITMAP_WORD bits[BITMAP_ELEMENT_WORDS]; /* Bits that are set.  */
+  /* In list form, the next element in the linked list;
+     in tree form, the left child node in the tree.  */
+  struct bitmap_element *next;
+  /* In list form, the previous element in the linked list;
+     in tree form, the right child node in the tree.  */
+  struct bitmap_element *prev;
+  /* regno/BITMAP_ELEMENT_ALL_BITS.  */
+  unsigned int indx;
+  /* Bits that are set, counting from INDX, inclusive  */
+  BITMAP_WORD bits[BITMAP_ELEMENT_WORDS];
 };
 
 /* Head of bitmap linked list.  The 'current' member points to something
    already pointed to by the chain started by first, so GTY((skip)) it.  */
 
 struct GTY(()) bitmap_head {
-  unsigned int indx;			/* Index of last element looked at.  */
-  unsigned int descriptor_id;		/* Unique identifier for the allocation
-					   site of this bitmap, for detailed
-					   statistics gathering.  */
-  bitmap_element *first;		/* First element in linked list.  */
-  bitmap_element * GTY((skip(""))) current; /* Last element looked at.  */
-  bitmap_obstack *obstack;		/* Obstack to allocate elements from.
-					   If NULL, then use GGC allocation.  */
+  /* Index of last element looked at.  */
+  unsigned int indx;
+  /* False if the bitmap is in list form; true if the bitmap is in tree form.
+     Bitmap iterators only work on bitmaps in list form.  */
+  bool tree_form;
+  /* In list form, the first element in the linked list;
+     in tree form, the root of the tree.   */
+  bitmap_element *first;
+  /* Last element looked at.  */
+  bitmap_element * GTY((skip(""))) current;
+  /* Obstack to allocate elements from.  If NULL, then use GGC allocation.  */
+  bitmap_obstack *obstack;
   void dump ();
 };
 
@@ -250,6 +342,10 @@ struct GTY(()) bitmap_head {
 extern bitmap_element bitmap_zero_bits;	/* Zero bitmap element */
 extern bitmap_obstack bitmap_default_obstack;   /* Default bitmap obstack */
 
+/* Change the view of the bitmap to list, or tree.  */
+void bitmap_list_view (bitmap);
+void bitmap_tree_view (bitmap);
+
 /* Clear a bitmap by freeing up the linked list.  */
 extern void bitmap_clear (bitmap);
 
@@ -316,15 +412,15 @@ extern bool bitmap_clear_bit (bitmap, in
 /* Set a single bit in a bitmap.  Return true if the bit changed.  */
 extern bool bitmap_set_bit (bitmap, int);
 
-/* Return true if a register is set in a register set.  */
+/* Return true if a bit is set in a bitmap.  */
 extern int bitmap_bit_p (bitmap, int);
 
-/* Debug functions to print a bitmap linked list.  */
-extern void debug_bitmap (const_bitmap);
-extern void debug_bitmap_file (FILE *, const_bitmap);
+/* Debug functions to print a bitmap.  */
+extern void debug_bitmap (bitmap);
+extern void debug_bitmap_file (FILE *, bitmap);
 
 /* Print a bitmap.  */
-extern void bitmap_print (FILE *, const_bitmap, const char *, const char *);
+extern void bitmap_print (FILE *, bitmap, const char *, const char *);
 
 /* Initialize and release a bitmap obstack.  */
 extern void bitmap_obstack_initialize (bitmap_obstack *);
@@ -339,6 +435,7 @@ static inline void
 bitmap_initialize (bitmap head, bitmap_obstack *obstack CXX_MEM_STAT_INFO)
 {
   head->first = head->current = NULL;
+  head->indx = head->tree_form = 0;
   head->obstack = obstack;
   if (GATHER_STATISTICS)
     bitmap_register (head PASS_MEM_STAT);
@@ -352,12 +449,12 @@ extern bitmap bitmap_gc_alloc (ALONE_CXX
 extern void bitmap_obstack_free (bitmap);
 
 /* A few compatibility/functions macros for compatibility with sbitmaps */
-inline void dump_bitmap (FILE *file, const_bitmap map)
+inline void dump_bitmap (FILE *file, bitmap map)
 {
   bitmap_print (file, map, "", "\n");
 }
-extern void debug (const bitmap_head &ref);
-extern void debug (const bitmap_head *ptr);
+extern void debug (bitmap_head &ref);
+extern void debug (bitmap_head *ptr);
 
 extern unsigned bitmap_first_set_bit (const_bitmap);
 extern unsigned bitmap_last_set_bit (const_bitmap);
@@ -398,6 +495,8 @@ bmp_iter_set_init (bitmap_iterator *bi,
   bi->elt1 = map->first;
   bi->elt2 = NULL;
 
+  gcc_assert (!map->tree_form);
+
   /* Advance elt1 until it is not before the block containing start_bit.  */
   while (1)
     {
@@ -440,6 +539,8 @@ bmp_iter_and_init (bitmap_iterator *bi,
   bi->elt1 = map1->first;
   bi->elt2 = map2->first;
 
+  gcc_assert (!map1->tree_form && !map2->tree_form);
+
   /* Advance elt1 until it is not before the block containing
      start_bit.  */
   while (1)
@@ -498,8 +599,7 @@ bmp_iter_and_init (bitmap_iterator *bi,
   *bit_no = start_bit;
 }
 
-/* Initialize an iterator to iterate over the bits in MAP1 & ~MAP2.
-   */
+/* Initialize an iterator to iterate over the bits in MAP1 & ~MAP2.  */
 
 static inline void
 bmp_iter_and_compl_init (bitmap_iterator *bi,
@@ -509,6 +609,8 @@ bmp_iter_and_compl_init (bitmap_iterator
   bi->elt1 = map1->first;
   bi->elt2 = map2->first;
 
+  gcc_assert (!map1->tree_form && !map2->tree_form);
+
   /* Advance elt1 until it is not before the block containing start_bit.  */
   while (1)
     {
Index: gcc/tree-ssa-propagate.c
===================================================================
--- gcc/tree-ssa-propagate.c	(revision 265259)
+++ gcc/tree-ssa-propagate.c	(working copy)
@@ -381,6 +381,8 @@ ssa_prop_init (void)
   /* Worklists of SSA edges.  */
   ssa_edge_worklist = BITMAP_ALLOC (NULL);
   ssa_edge_worklist_back = BITMAP_ALLOC (NULL);
+  bitmap_tree_view (ssa_edge_worklist);
+  bitmap_tree_view (ssa_edge_worklist_back);
 
   /* Worklist of basic-blocks.  */
   bb_to_cfg_order = XNEWVEC (int, last_basic_block_for_fn (cfun) + 1);
Index: gcc/tree-ssa-coalesce.c
===================================================================
--- gcc/tree-ssa-coalesce.c	(revision 265259)
+++ gcc/tree-ssa-coalesce.c	(working copy)
@@ -1703,9 +1709,11 @@ coalesce_ssa_name (var_map map)
   coalesce_list *cl;
   auto_bitmap used_in_copies;
 
+  bitmap_tree_view (used_in_copies);
   cl = create_coalesce_list_for_region (map, used_in_copies);
   if (map->outofssa_p)
     populate_coalesce_list_for_outofssa (cl, used_in_copies);
+  bitmap_list_view (used_in_copies);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_var_map (dump_file, map);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 13:58 [PATCH] Add splay-tree "view" for bitmap Richard Biener
@ 2018-10-18 15:43 ` Richard Sandiford
  2018-10-18 15:46   ` Richard Biener
  2018-10-18 15:53 ` David Malcolm
  2018-10-18 18:01 ` Jeff Law
  2 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-10-18 15:43 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc

Richard Biener <rguenther@suse.de> writes:
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list implementation to a two-mode one with one being a
> splay tree featuring O(log N) complexity for find/remove.
>
> Over Stevens original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>
> This can be used to fix the two biggest bottlenecks in the PRs
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the thrid coulumn (samples)
> when comparing percentages.
>
> -O0
> -   18.19%    17.35%           407  cc1      cc1               [.] bitmap_set_b▒
>    - bitmap_set_bit                                                            ▒
>       + 8.77% create_coalesce_list_for_region                                  ▒
>       + 4.21% calculate_live_ranges                                            ▒
>       + 2.02% build_ssa_conflict_graph                                         ▒
>       + 1.66% insert_phi_nodes_for                                             ▒
>       + 0.86% coalesce_ssa_name                      
> patched:
> -   12.39%    10.48%           129  cc1      cc1               [.] bitmap_set_b▒
>    - bitmap_set_bit                                                            ▒
>       + 5.27% calculate_live_ranges                                            ▒
>       + 2.76% insert_phi_nodes_for                                             ▒
>       + 1.90% create_coalesce_list_for_region                                  ▒
>       + 1.63% build_ssa_conflict_graph                                         ▒
>       + 0.35% coalesce_ssa_name                           
>
> -O1
> -   17.53%    17.53%           842  cc1      cc1               [.] bitmap_set_b▒
>    - bitmap_set_bit                                                            ▒
>       + 12.39% add_ssa_edge                                                    ▒
>       + 1.48% create_coalesce_list_for_region                                  ▒
>       + 0.82% solve_constraints                                                ▒
>       + 0.71% calculate_live_ranges                                            ▒
>       + 0.64% add_implicit_graph_edge                                          ▒
>       + 0.41% insert_phi_nodes_for                                             ▒
>       + 0.34% build_ssa_conflict_graph                      
> patched:
> -    5.79%     5.00%           167  cc1      cc1               [.] bitmap_set_b▒
>    - bitmap_set_bit                                                            ▒
>       + 1.41% add_ssa_edge                                                     ▒
>       + 0.88% calculate_live_ranges                                            ▒
>       + 0.75% add_implicit_graph_edge                                          ▒
>       + 0.68% solve_constraints                                                ▒
>       + 0.48% insert_phi_nodes_for                                             ▒
>       + 0.45% build_ssa_conflict_graph                   
>
> -O3
> -   12.37%    12.34%          1145  cc1      cc1               [.] bitmap_set_b▒
>    - bitmap_set_bit                                                            ▒
>       + 9.14% add_ssa_edge                                                     ▒
>       + 0.80% create_coalesce_list_for_region                                  ▒
>       + 0.69% add_implicit_graph_edge                                          ▒
>       + 0.54% solve_constraints                                                ▒
>       + 0.34% calculate_live_ranges                                            ▒
>       + 0.27% insert_phi_nodes_for                                             ▒
>       + 0.21% build_ssa_conflict_graph                 
> -    4.36%     3.86%           227  cc1      cc1               [.] bitmap_set_b▒
>    - bitmap_set_bit                                                            ▒
>       + 0.98% add_ssa_edge                                                     ▒
>       + 0.86% add_implicit_graph_edge                                          ▒
>       + 0.64% solve_constraints                                                ▒
>       + 0.57% calculate_live_ranges                                            ▒
>       + 0.32% build_ssa_conflict_graph                                         ▒
>       + 0.29% mark_all_vars_used_1                                             ▒
>       + 0.20% insert_phi_nodes_for                                             ▒
>       + 0.16% create_coalesce_list_for_region             
>
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

What's the performance like for more reasonable testcases?  E.g. is there
a measurable change either way in --enable-checking=release for some gcc
.iis at -g or -O2 -g?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 15:43 ` Richard Sandiford
@ 2018-10-18 15:46   ` Richard Biener
  2018-10-18 22:27     ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2018-10-18 15:46 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, stevenb.gcc

[-- Attachment #1: Type: text/plain, Size: 7456 bytes --]

On Thu, 18 Oct 2018, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > PR63155 made me pick up this old work from Steven, it turns our
> > linked-list implementation to a two-mode one with one being a
> > splay tree featuring O(log N) complexity for find/remove.
> >
> > Over Stevens original patch I added a bitmap_tree_to_vec helper
> > that I use from the debug/print methods to avoid changing view
> > there.  In theory the bitmap iterator could get a "stack"
> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> >
> > This can be used to fix the two biggest bottlenecks in the PRs
> > testcase, namely SSA propagator worklist handling and out-of-SSA
> > coalesce list building.  perf shows the following data, first
> > unpatched, second patched - also watch the thrid coulumn (samples)
> > when comparing percentages.
> >
> > -O0
> > -   18.19%    17.35%           407  cc1      cc1               [.] bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            â–’
> >       + 8.77% create_coalesce_list_for_region                                  â–’
> >       + 4.21% calculate_live_ranges                                            â–’
> >       + 2.02% build_ssa_conflict_graph                                         â–’
> >       + 1.66% insert_phi_nodes_for                                             â–’
> >       + 0.86% coalesce_ssa_name                      
> > patched:
> > -   12.39%    10.48%           129  cc1      cc1               [.] bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            â–’
> >       + 5.27% calculate_live_ranges                                            â–’
> >       + 2.76% insert_phi_nodes_for                                             â–’
> >       + 1.90% create_coalesce_list_for_region                                  â–’
> >       + 1.63% build_ssa_conflict_graph                                         â–’
> >       + 0.35% coalesce_ssa_name                           
> >
> > -O1
> > -   17.53%    17.53%           842  cc1      cc1               [.] bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            â–’
> >       + 12.39% add_ssa_edge                                                    â–’
> >       + 1.48% create_coalesce_list_for_region                                  â–’
> >       + 0.82% solve_constraints                                                â–’
> >       + 0.71% calculate_live_ranges                                            â–’
> >       + 0.64% add_implicit_graph_edge                                          â–’
> >       + 0.41% insert_phi_nodes_for                                             â–’
> >       + 0.34% build_ssa_conflict_graph                      
> > patched:
> > -    5.79%     5.00%           167  cc1      cc1               [.] bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            â–’
> >       + 1.41% add_ssa_edge                                                     â–’
> >       + 0.88% calculate_live_ranges                                            â–’
> >       + 0.75% add_implicit_graph_edge                                          â–’
> >       + 0.68% solve_constraints                                                â–’
> >       + 0.48% insert_phi_nodes_for                                             â–’
> >       + 0.45% build_ssa_conflict_graph                   
> >
> > -O3
> > -   12.37%    12.34%          1145  cc1      cc1               [.] bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            â–’
> >       + 9.14% add_ssa_edge                                                     â–’
> >       + 0.80% create_coalesce_list_for_region                                  â–’
> >       + 0.69% add_implicit_graph_edge                                          â–’
> >       + 0.54% solve_constraints                                                â–’
> >       + 0.34% calculate_live_ranges                                            â–’
> >       + 0.27% insert_phi_nodes_for                                             â–’
> >       + 0.21% build_ssa_conflict_graph                 
> > -    4.36%     3.86%           227  cc1      cc1               [.] bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            â–’
> >       + 0.98% add_ssa_edge                                                     â–’
> >       + 0.86% add_implicit_graph_edge                                          â–’
> >       + 0.64% solve_constraints                                                â–’
> >       + 0.57% calculate_live_ranges                                            â–’
> >       + 0.32% build_ssa_conflict_graph                                         â–’
> >       + 0.29% mark_all_vars_used_1                                             â–’
> >       + 0.20% insert_phi_nodes_for                                             â–’
> >       + 0.16% create_coalesce_list_for_region             
> >
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> What's the performance like for more reasonable testcases?  E.g. is there
> a measurable change either way in --enable-checking=release for some gcc
> .iis at -g or -O2 -g?

I did a quick check on my set of cc1files (still .i, .ii ones tend to
be unusable quite quickly...).  Most of them compile too quickly
to make any difference appear other than noise.  Multi-second differences
like for PR63155 should be the exception but our O(n) bitmap
implementation really makes some parts of GCC quadratic where it
doesn't appear so.

Is there a reason you expect it to be ever slower?

Differences unpatched vs. patched for -O0 -g (first file then time):

 /suse/rguenther/src/cc1files/alias.i
-0.36user
+0.34user
 /suse/rguenther/src/cc1files/alloc-pool.i
 0.04user
 /suse/rguenther/src/cc1files/attribs.i
-0.16user
+0.08user
 /suse/rguenther/src/cc1files/auto-inc-dec.i
-0.10user
+0.09user
 /suse/rguenther/src/cc1files/bb-reorder.i
-0.18user
+0.14user
 /suse/rguenther/src/cc1files/bitmap.i
-0.13user
+0.09user
 /suse/rguenther/src/cc1files/bt-load.i
-0.26user
+0.24user
 /suse/rguenther/src/cc1files/builtins.i
-0.82user
+0.69user
... skipping similar changes ...
 /suse/rguenther/src/cc1files/cfg.i
-0.19user
+0.21user
(so there are cases of larger time)
 /suse/rguenther/src/cc1files/insn-attrtab.i
-3.61user
+3.48user
 /suse/rguenther/src/cc1files/insn-automata.i
-0.51user
+0.58user
 /suse/rguenther/src/cc1files/insn-emit.i
 2.46user
 /suse/rguenther/src/cc1files/insn-opinit.i
-0.19user
+0.30user
 /suse/rguenther/src/cc1files/insn-output.i
-0.56user
+0.69user

and at -O2 -g

 /suse/rguenther/src/cc1files/alias.i
-0.87user
+0.81user
 /suse/rguenther/src/cc1files/alloc-pool.i
-0.08user
+0.09user
 /suse/rguenther/src/cc1files/attribs.i
-0.32user
+0.33user
 /suse/rguenther/src/cc1files/auto-inc-dec.i
-0.17user
+0.24user
 /suse/rguenther/src/cc1files/bb-reorder.i
-0.77user
+0.61user
 /suse/rguenther/src/cc1files/bitmap.i
-0.47user
+0.53user
 /suse/rguenther/src/cc1files/bt-load.i
 0.55user
...
 /suse/rguenther/src/cc1files/dwarf2out.i
-5.31user
+5.30user
...

I'd say any difference is fake news.

Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 13:58 [PATCH] Add splay-tree "view" for bitmap Richard Biener
  2018-10-18 15:43 ` Richard Sandiford
@ 2018-10-18 15:53 ` David Malcolm
  2018-10-19 13:37   ` Richard Biener
  2018-10-18 18:01 ` Jeff Law
  2 siblings, 1 reply; 15+ messages in thread
From: David Malcolm @ 2018-10-18 15:53 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: stevenb.gcc

On Thu, 2018-10-18 at 15:09 +0200, Richard Biener wrote:
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list implementation to a two-mode one with one being a
> splay tree featuring O(log N) complexity for find/remove.
> 
> Over Stevens original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> 
> This can be used to fix the two biggest bottlenecks in the PRs
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the thrid coulumn (samples)
> when comparing percentages.
> 
[...snip...]

> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Any objections?
> 
> Thanks,
> Richard.
> 
> 2018-10-18  Steven Bosscher <steven@gcc.gnu.org>
> 	Richard Biener  <rguenther@suse.de>
> 
> 	* bitmap.h: Update data structure documentation, including a
> 	description of bitmap views as either linked-lists or splay
> trees.

[...snip...]

From a "correctness" perspective, we have some existing unit-test
coverage for bitmap via selftests in bitmap.c.  Perhaps those tests
could be generalized to verify that the two different implementations
work, and that the conversions work correctly?

e.g. currently we have:

static void
test_clear_bit_in_middle ()
{
  bitmap b = bitmap_gc_alloc ();

  /* Set b to [100..200].  */
  bitmap_set_range (b, 100, 100);
  ASSERT_EQ (100, bitmap_count_bits (b));

  /* Clear a bit in the middle.  */
  bool changed = bitmap_clear_bit (b, 150);
  ASSERT_TRUE (changed);
  ASSERT_EQ (99, bitmap_count_bits (b));
  ASSERT_TRUE (bitmap_bit_p (b, 149));
  ASSERT_FALSE (bitmap_bit_p (b, 150));
  ASSERT_TRUE (bitmap_bit_p (b, 151));
}

Maybe this could change to:

static void
test_clear_bit_in_middle ()
{
  bitmap b = bitmap_gc_alloc ();

  FOR_EACH_BITMAP_IMPL (b)
    {
      /* Set b to [100..200].  */
      bitmap_set_range (b, 100, 100);
      ASSERT_EQ (100, bitmap_count_bits (b));
    }

  bool first_time = true;
  /* Clear a bit in the middle.  */
  FOR_EACH_BITMAP_IMPL (b)
    {
      if (first_time)
        {
          bool changed = bitmap_clear_bit (b, 150);
          ASSERT_TRUE (changed);
          first_time = false;
        }
      ASSERT_EQ (99, bitmap_count_bits (b));
      ASSERT_TRUE (bitmap_bit_p (b, 149));
      ASSERT_FALSE (bitmap_bit_p (b, 150));
      ASSERT_TRUE (bitmap_bit_p (b, 151));
    }
}

...or somesuch, where maybe FOR_EACH_BITMAP_IMPL (b) could try linked-
list, then splay tree, then linked-list, converting "b" as it goes.  
This would hopefully give us a lot of test coverage for the various
operations in both modes, and for the conversion routines (in both
directions, assuming that both directions are supported).

Hope this is constructive
Dave

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 13:58 [PATCH] Add splay-tree "view" for bitmap Richard Biener
  2018-10-18 15:43 ` Richard Sandiford
  2018-10-18 15:53 ` David Malcolm
@ 2018-10-18 18:01 ` Jeff Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Law @ 2018-10-18 18:01 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: stevenb.gcc

On 10/18/18 7:09 AM, Richard Biener wrote:
> 
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list implementation to a two-mode one with one being a
> splay tree featuring O(log N) complexity for find/remove.
> 
> Over Stevens original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> 
> This can be used to fix the two biggest bottlenecks in the PRs
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the thrid coulumn (samples)
> when comparing percentages.
> 
> -O0
> -   18.19%    17.35%           407  cc1      cc1               [.] bitmap_set_bб▒
>    - bitmap_set_bit                                                            б▒
>       + 8.77% create_coalesce_list_for_region                                  б▒
>       + 4.21% calculate_live_ranges                                            б▒
>       + 2.02% build_ssa_conflict_graph                                         б▒
>       + 1.66% insert_phi_nodes_for                                             б▒
>       + 0.86% coalesce_ssa_name                      
> patched:
> -   12.39%    10.48%           129  cc1      cc1               [.] bitmap_set_bб▒
>    - bitmap_set_bit                                                            б▒
>       + 5.27% calculate_live_ranges                                            б▒
>       + 2.76% insert_phi_nodes_for                                             б▒
>       + 1.90% create_coalesce_list_for_region                                  б▒
>       + 1.63% build_ssa_conflict_graph                                         б▒
>       + 0.35% coalesce_ssa_name                           
> 
> -O1
> -   17.53%    17.53%           842  cc1      cc1               [.] bitmap_set_bб▒
>    - bitmap_set_bit                                                            б▒
>       + 12.39% add_ssa_edge                                                    б▒
>       + 1.48% create_coalesce_list_for_region                                  б▒
>       + 0.82% solve_constraints                                                б▒
>       + 0.71% calculate_live_ranges                                            б▒
>       + 0.64% add_implicit_graph_edge                                          б▒
>       + 0.41% insert_phi_nodes_for                                             б▒
>       + 0.34% build_ssa_conflict_graph                      
> patched:
> -    5.79%     5.00%           167  cc1      cc1               [.] bitmap_set_bб▒
>    - bitmap_set_bit                                                            б▒
>       + 1.41% add_ssa_edge                                                     б▒
>       + 0.88% calculate_live_ranges                                            б▒
>       + 0.75% add_implicit_graph_edge                                          б▒
>       + 0.68% solve_constraints                                                б▒
>       + 0.48% insert_phi_nodes_for                                             б▒
>       + 0.45% build_ssa_conflict_graph                   
> 
> -O3
> -   12.37%    12.34%          1145  cc1      cc1               [.] bitmap_set_bб▒
>    - bitmap_set_bit                                                            б▒
>       + 9.14% add_ssa_edge                                                     б▒
>       + 0.80% create_coalesce_list_for_region                                  б▒
>       + 0.69% add_implicit_graph_edge                                          б▒
>       + 0.54% solve_constraints                                                б▒
>       + 0.34% calculate_live_ranges                                            б▒
>       + 0.27% insert_phi_nodes_for                                             б▒
>       + 0.21% build_ssa_conflict_graph                 
> -    4.36%     3.86%           227  cc1      cc1               [.] bitmap_set_bб▒
>    - bitmap_set_bit                                                            б▒
>       + 0.98% add_ssa_edge                                                     б▒
>       + 0.86% add_implicit_graph_edge                                          б▒
>       + 0.64% solve_constraints                                                б▒
>       + 0.57% calculate_live_ranges                                            б▒
>       + 0.32% build_ssa_conflict_graph                                         б▒
>       + 0.29% mark_all_vars_used_1                                             б▒
>       + 0.20% insert_phi_nodes_for                                             б▒
>       + 0.16% create_coalesce_list_for_region             
> 
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Any objections?
> 
> Thanks,
> Richard.
> 
> 2018-10-18  Steven Bosscher <steven@gcc.gnu.org>
> 	Richard Biener  <rguenther@suse.de>
> 
> 	* bitmap.h: Update data structure documentation, including a
> 	description of bitmap views as either linked-lists or splay trees.
> 	(struct bitmap_element_def): Update comments for splay tree bitmaps.
> 	(struct bitmap_head_def): Likewise.
> 	(bitmap_list_view, bitmap_tree_view): New prototypes.
> 	(debug_bitmap, debug_bitmap_file, bitmap_print): Update prototypes.
> 	(dump_bitmap): Update to take non-const bitmap.
> 	(bitmap_initialize_stat): Initialize a bitmap_head's indx and
> 	tree_form fields.
> 	(bmp_iter_set_init): Assert the iterated bitmaps are in list form.
> 	(bmp_iter_and_init, bmp_iter_and_compl_init): Likewise.
> 	* bitmap.c (next_bitmap_desc_id): Make unsigned.
> 	(get_bitmap_descriptor): Make sure there are no more than 2^31
> 	bitmap descriptors.
> 	(bitmap_elem_to_freelist): Unregister overhead of a released bitmap
> 	element here.
> 	(bitmap_element_free): Remove.
> 	(bitmap_elt_clear_from): Work on splay tree bitmaps.
> 	(bitmap_list_link_element): Renamed from bitmap_element_link.  Move
> 	this function similar ones such that linked-list bitmap implementation
> 	functions are grouped.
> 	(bitmap_list_unlink_element): Renamed from bitmap_element_unlink,
> 	and moved for grouping.
> 	(bitmap_list_insert_element_after): Renamed from
> 	bitmap_elt_insert_after, and moved for grouping.
> 	(bitmap_list_find_element): New function spliced from bitmap_find_bit.
> 	(bitmap_tree_link_left, bitmap_tree_link_right,
> 	bitmap_tree_rotate_left, bitmap_tree_rotate_right, bitmap_tree_splay,
> 	bitmap_tree_link_element, bitmap_tree_unlink_element,
> 	bitmap_tree_find_element): New functions for splay-tree bitmap
> 	implementation.
> 	(bitmap_element_link, bitmap_element_unlink, bitmap_elt_insert_after):
> 	Renamed and moved, see above entries.
> 	(bitmap_tree_listify_from): New function to convert part of a splay
> 	tree bitmap to a linked-list bitmap.
> 	(bitmap_list_view): Convert a splay tree bitmap to linked-list form.
> 	(bitmap_tree_view): Convert a linked-list bitmap to splay tree form.
> 	(bitmap_find_bit, bitmap_clear, bitmap_clear_bit, bitmap_set_bit,
> 	bitmap_single_bit_set_p, bitmap_first_set_bit, bitmap_last_set_bit):
> 	Handle splay tree bitmaps.
> 	(bitmap_copy, bitmap_count_bits, bitmap_and, bitmap_and_into,
> 	bitmap_elt_copy, bitmap_and_compl, bitmap_and_compl_into,
> 	bitmap_compl_and_into, bitmap_elt_ior, bitmap_ior, bitmap_ior_into,
> 	bitmap_xor, bitmap_xor_into, bitmap_equal_p, bitmap_intersect_p,
> 	bitmap_intersect_compl_p, bitmap_ior_and_compl,
> 	bitmap_ior_and_compl_into, bitmap_set_range, bitmap_clear_range,
> 	bitmap_hash): Reject trying to act on splay tree bitmaps.  Make
> 	corresponding changes to use linked-list specific bitmap_element
> 	manipulation functions as applicable for efficiency.
> 	(debug_bitmap_file): Handle splay tree bitmaps by converting the
> 	bitmap to linked-list form and back.
> 	(bitmap_print): Likewise.
> 	(debug_bitmap): Take a non-const bitmap.
> 
> 	PR tree-optimization/63155
> 	* tree-ssa-propagate.c (ssa_prop_init): Use tree-view for the
> 	SSA edge worklists.
> 	* tree-ssa-coalesce.c (coalesce_ssa_name): Populate used_in_copies
> 	in tree-view.
I'm a fan.  No objections from me.

It likely invalidates the analysis I did last year where jump threading
from VRP was important to optimize the bitmap stuff well.  But that
analysis probably needs to redone anyway and I'm unlikely to be able to
push on removal of VRP threading this cycle anyway.

Jeff


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 15:46   ` Richard Biener
@ 2018-10-18 22:27     ` Richard Sandiford
  2018-10-19  7:20       ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-10-18 22:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc

Richard Biener <rguenther@suse.de> writes:
> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > PR63155 made me pick up this old work from Steven, it turns our
>> > linked-list implementation to a two-mode one with one being a
>> > splay tree featuring O(log N) complexity for find/remove.
>> >
>> > Over Stevens original patch I added a bitmap_tree_to_vec helper
>> > that I use from the debug/print methods to avoid changing view
>> > there.  In theory the bitmap iterator could get a "stack"
>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>> >
>> > This can be used to fix the two biggest bottlenecks in the PRs
>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>> > coalesce list building.  perf shows the following data, first
>> > unpatched, second patched - also watch the thrid coulumn (samples)
>> > when comparing percentages.
>> >
>> > -O0
>> > -   18.19%    17.35%           407  cc1      cc1               [.] bitmap_set_b▒
>> >    - bitmap_set_bit                                                            ▒
>> >       + 8.77% create_coalesce_list_for_region                                  ▒
>> >       + 4.21% calculate_live_ranges                                            ▒
>> >       + 2.02% build_ssa_conflict_graph                                         ▒
>> >       + 1.66% insert_phi_nodes_for                                             ▒
>> >       + 0.86% coalesce_ssa_name                      
>> > patched:
>> > -   12.39%    10.48%           129  cc1      cc1               [.] bitmap_set_b▒
>> >    - bitmap_set_bit                                                            ▒
>> >       + 5.27% calculate_live_ranges                                            ▒
>> >       + 2.76% insert_phi_nodes_for                                             ▒
>> >       + 1.90% create_coalesce_list_for_region                                  ▒
>> >       + 1.63% build_ssa_conflict_graph                                         ▒
>> >       + 0.35% coalesce_ssa_name                           
>> >
>> > -O1
>> > -   17.53%    17.53%           842  cc1      cc1               [.] bitmap_set_b▒
>> >    - bitmap_set_bit                                                            ▒
>> >       + 12.39% add_ssa_edge                                                    ▒
>> >       + 1.48% create_coalesce_list_for_region                                  ▒
>> >       + 0.82% solve_constraints                                                ▒
>> >       + 0.71% calculate_live_ranges                                            ▒
>> >       + 0.64% add_implicit_graph_edge                                          ▒
>> >       + 0.41% insert_phi_nodes_for                                             ▒
>> >       + 0.34% build_ssa_conflict_graph                      
>> > patched:
>> > -    5.79%     5.00%           167  cc1      cc1               [.] bitmap_set_b▒
>> >    - bitmap_set_bit                                                            ▒
>> >       + 1.41% add_ssa_edge                                                     ▒
>> >       + 0.88% calculate_live_ranges                                            ▒
>> >       + 0.75% add_implicit_graph_edge                                          ▒
>> >       + 0.68% solve_constraints                                                ▒
>> >       + 0.48% insert_phi_nodes_for                                             ▒
>> >       + 0.45% build_ssa_conflict_graph                   
>> >
>> > -O3
>> > -   12.37%    12.34%          1145  cc1      cc1               [.] bitmap_set_b▒
>> >    - bitmap_set_bit                                                            ▒
>> >       + 9.14% add_ssa_edge                                                     ▒
>> >       + 0.80% create_coalesce_list_for_region                                  ▒
>> >       + 0.69% add_implicit_graph_edge                                          ▒
>> >       + 0.54% solve_constraints                                                ▒
>> >       + 0.34% calculate_live_ranges                                            ▒
>> >       + 0.27% insert_phi_nodes_for                                             ▒
>> >       + 0.21% build_ssa_conflict_graph                 
>> > -    4.36%     3.86%           227  cc1      cc1               [.] bitmap_set_b▒
>> >    - bitmap_set_bit                                                            ▒
>> >       + 0.98% add_ssa_edge                                                     ▒
>> >       + 0.86% add_implicit_graph_edge                                          ▒
>> >       + 0.64% solve_constraints                                                ▒
>> >       + 0.57% calculate_live_ranges                                            ▒
>> >       + 0.32% build_ssa_conflict_graph                                         ▒
>> >       + 0.29% mark_all_vars_used_1                                             ▒
>> >       + 0.20% insert_phi_nodes_for                                             ▒
>> >       + 0.16% create_coalesce_list_for_region             
>> >
>> >
>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> 
>> What's the performance like for more reasonable testcases?  E.g. is there
>> a measurable change either way in --enable-checking=release for some gcc
>> .iis at -g or -O2 -g?
>
> I did a quick check on my set of cc1files (still .i, .ii ones tend to
> be unusable quite quickly...).  Most of them compile too quickly
> to make any difference appear other than noise.  Multi-second differences
> like for PR63155 should be the exception but our O(n) bitmap
> implementation really makes some parts of GCC quadratic where it
> doesn't appear so.
>
> Is there a reason you expect it to be ever slower?

During recent stage3s I've tried to look at profiles of cc1plus
to see whether there was something easy we could do to improve
compile times.  And bitmap operations always showed up near the
top of the profile.  There were always some pathological queries
in which the linear search really hurt, but whenever I tried "simple"
ways to avoid the obvious cases, they made those queries quicker
but slowed things down overall.  It seemed that adding any extra logic
to the queries hurted because even a small slowdown in common lookups
overwhelmed a big saving in less common lookups.

But there again I was looking to speed up more "normal" cases, not ones
like the PR.

FWIW I've tried it on a local x86_64 box and it slows down current
optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
for the other files I tried.  But that's hardly a huge amount, and
probably a price worth paying for the speed-up in the PR.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 22:27     ` Richard Sandiford
@ 2018-10-19  7:20       ` Richard Biener
  2018-10-19  8:44         ` Richard Sandiford
  2018-10-19  9:10         ` Steven Bosscher
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-19  7:20 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, stevenb.gcc

On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>Richard Biener <rguenther@suse.de> writes:
>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>>
>>> Richard Biener <rguenther@suse.de> writes:
>>> > PR63155 made me pick up this old work from Steven, it turns our
>>> > linked-list implementation to a two-mode one with one being a
>>> > splay tree featuring O(log N) complexity for find/remove.
>>> >
>>> > Over Stevens original patch I added a bitmap_tree_to_vec helper
>>> > that I use from the debug/print methods to avoid changing view
>>> > there.  In theory the bitmap iterator could get a "stack"
>>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>>> >
>>> > This can be used to fix the two biggest bottlenecks in the PRs
>>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>>> > coalesce list building.  perf shows the following data, first
>>> > unpatched, second patched - also watch the thrid coulumn (samples)
>>> > when comparing percentages.
>>> >
>>> > -O0
>>> > -   18.19%    17.35%           407  cc1      cc1               [.]
>bitmap_set_b▒
>>> >    - bitmap_set_bit                                               
>            ▒
>>> >       + 8.77% create_coalesce_list_for_region                     
>            ▒
>>> >       + 4.21% calculate_live_ranges                               
>            ▒
>>> >       + 2.02% build_ssa_conflict_graph                            
>            ▒
>>> >       + 1.66% insert_phi_nodes_for                                
>            ▒
>>> >       + 0.86% coalesce_ssa_name                      
>>> > patched:
>>> > -   12.39%    10.48%           129  cc1      cc1               [.]
>bitmap_set_b▒
>>> >    - bitmap_set_bit                                               
>            ▒
>>> >       + 5.27% calculate_live_ranges                               
>            ▒
>>> >       + 2.76% insert_phi_nodes_for                                
>            ▒
>>> >       + 1.90% create_coalesce_list_for_region                     
>            ▒
>>> >       + 1.63% build_ssa_conflict_graph                            
>            ▒
>>> >       + 0.35% coalesce_ssa_name                           
>>> >
>>> > -O1
>>> > -   17.53%    17.53%           842  cc1      cc1               [.]
>bitmap_set_b▒
>>> >    - bitmap_set_bit                                               
>            ▒
>>> >       + 12.39% add_ssa_edge                                       
>            ▒
>>> >       + 1.48% create_coalesce_list_for_region                     
>            ▒
>>> >       + 0.82% solve_constraints                                   
>            ▒
>>> >       + 0.71% calculate_live_ranges                               
>            ▒
>>> >       + 0.64% add_implicit_graph_edge                             
>            ▒
>>> >       + 0.41% insert_phi_nodes_for                                
>            ▒
>>> >       + 0.34% build_ssa_conflict_graph                      
>>> > patched:
>>> > -    5.79%     5.00%           167  cc1      cc1               [.]
>bitmap_set_b▒
>>> >    - bitmap_set_bit                                               
>            ▒
>>> >       + 1.41% add_ssa_edge                                        
>            ▒
>>> >       + 0.88% calculate_live_ranges                               
>            ▒
>>> >       + 0.75% add_implicit_graph_edge                             
>            ▒
>>> >       + 0.68% solve_constraints                                   
>            ▒
>>> >       + 0.48% insert_phi_nodes_for                                
>            ▒
>>> >       + 0.45% build_ssa_conflict_graph                   
>>> >
>>> > -O3
>>> > -   12.37%    12.34%          1145  cc1      cc1               [.]
>bitmap_set_b▒
>>> >    - bitmap_set_bit                                               
>            ▒
>>> >       + 9.14% add_ssa_edge                                        
>            ▒
>>> >       + 0.80% create_coalesce_list_for_region                     
>            ▒
>>> >       + 0.69% add_implicit_graph_edge                             
>            ▒
>>> >       + 0.54% solve_constraints                                   
>            ▒
>>> >       + 0.34% calculate_live_ranges                               
>            ▒
>>> >       + 0.27% insert_phi_nodes_for                                
>            ▒
>>> >       + 0.21% build_ssa_conflict_graph                 
>>> > -    4.36%     3.86%           227  cc1      cc1               [.]
>bitmap_set_b▒
>>> >    - bitmap_set_bit                                               
>            ▒
>>> >       + 0.98% add_ssa_edge                                        
>            ▒
>>> >       + 0.86% add_implicit_graph_edge                             
>            ▒
>>> >       + 0.64% solve_constraints                                   
>            ▒
>>> >       + 0.57% calculate_live_ranges                               
>            ▒
>>> >       + 0.32% build_ssa_conflict_graph                            
>            ▒
>>> >       + 0.29% mark_all_vars_used_1                                
>            ▒
>>> >       + 0.20% insert_phi_nodes_for                                
>            ▒
>>> >       + 0.16% create_coalesce_list_for_region             
>>> >
>>> >
>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>> 
>>> What's the performance like for more reasonable testcases?  E.g. is
>there
>>> a measurable change either way in --enable-checking=release for some
>gcc
>>> .iis at -g or -O2 -g?
>>
>> I did a quick check on my set of cc1files (still .i, .ii ones tend to
>> be unusable quite quickly...).  Most of them compile too quickly
>> to make any difference appear other than noise.  Multi-second
>differences
>> like for PR63155 should be the exception but our O(n) bitmap
>> implementation really makes some parts of GCC quadratic where it
>> doesn't appear so.
>>
>> Is there a reason you expect it to be ever slower?
>
>During recent stage3s I've tried to look at profiles of cc1plus
>to see whether there was something easy we could do to improve
>compile times.  And bitmap operations always showed up near the
>top of the profile.  There were always some pathological queries
>in which the linear search really hurt, but whenever I tried "simple"
>ways to avoid the obvious cases, they made those queries quicker
>but slowed things down overall.  It seemed that adding any extra logic
>to the queries hurted because even a small slowdown in common lookups
>overwhelmed a big saving in less common lookups.

Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
I guess in the end well predicted branches in the out of line code are important... 

>
>But there again I was looking to speed up more "normal" cases, not ones
>like the PR.
>
>FWIW I've tried it on a local x86_64 box and it slows down current
>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
>for the other files I tried.  But that's hardly a huge amount, and
>probably a price worth paying for the speed-up in the PR.

Just to make sure what to reproduce - this is with checking disabled? And with or without the hunks actually making use of the splay tree path? 

Richard. 

>Thanks,
>Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-19  7:20       ` Richard Biener
@ 2018-10-19  8:44         ` Richard Sandiford
  2018-10-19 13:33           ` Richard Biener
  2018-10-19  9:10         ` Steven Bosscher
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-10-19  8:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc

Richard Biener <rguenther@suse.de> writes:
> On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>Richard Biener <rguenther@suse.de> writes:
>>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>>>
>>>> Richard Biener <rguenther@suse.de> writes:
>>>> > PR63155 made me pick up this old work from Steven, it turns our
>>>> > linked-list implementation to a two-mode one with one being a
>>>> > splay tree featuring O(log N) complexity for find/remove.
>>>> >
>>>> > Over Stevens original patch I added a bitmap_tree_to_vec helper
>>>> > that I use from the debug/print methods to avoid changing view
>>>> > there.  In theory the bitmap iterator could get a "stack"
>>>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>>>> >
>>>> > This can be used to fix the two biggest bottlenecks in the PRs
>>>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>>>> > coalesce list building.  perf shows the following data, first
>>>> > unpatched, second patched - also watch the thrid coulumn (samples)
>>>> > when comparing percentages.
>>>> >
>>>> > -O0
>>>> > -   18.19%    17.35%           407  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>            ▒
>>>> >       + 8.77% create_coalesce_list_for_region                     
>>            ▒
>>>> >       + 4.21% calculate_live_ranges                               
>>            ▒
>>>> >       + 2.02% build_ssa_conflict_graph                            
>>            ▒
>>>> >       + 1.66% insert_phi_nodes_for                                
>>            ▒
>>>> >       + 0.86% coalesce_ssa_name                      
>>>> > patched:
>>>> > -   12.39%    10.48%           129  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>            ▒
>>>> >       + 5.27% calculate_live_ranges                               
>>            ▒
>>>> >       + 2.76% insert_phi_nodes_for                                
>>            ▒
>>>> >       + 1.90% create_coalesce_list_for_region                     
>>            ▒
>>>> >       + 1.63% build_ssa_conflict_graph                            
>>            ▒
>>>> >       + 0.35% coalesce_ssa_name                           
>>>> >
>>>> > -O1
>>>> > -   17.53%    17.53%           842  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>            ▒
>>>> >       + 12.39% add_ssa_edge                                       
>>            ▒
>>>> >       + 1.48% create_coalesce_list_for_region                     
>>            ▒
>>>> >       + 0.82% solve_constraints                                   
>>            ▒
>>>> >       + 0.71% calculate_live_ranges                               
>>            ▒
>>>> >       + 0.64% add_implicit_graph_edge                             
>>            ▒
>>>> >       + 0.41% insert_phi_nodes_for                                
>>            ▒
>>>> >       + 0.34% build_ssa_conflict_graph                      
>>>> > patched:
>>>> > -    5.79%     5.00%           167  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>            ▒
>>>> >       + 1.41% add_ssa_edge                                        
>>            ▒
>>>> >       + 0.88% calculate_live_ranges                               
>>            ▒
>>>> >       + 0.75% add_implicit_graph_edge                             
>>            ▒
>>>> >       + 0.68% solve_constraints                                   
>>            ▒
>>>> >       + 0.48% insert_phi_nodes_for                                
>>            ▒
>>>> >       + 0.45% build_ssa_conflict_graph                   
>>>> >
>>>> > -O3
>>>> > -   12.37%    12.34%          1145  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>            ▒
>>>> >       + 9.14% add_ssa_edge                                        
>>            ▒
>>>> >       + 0.80% create_coalesce_list_for_region                     
>>            ▒
>>>> >       + 0.69% add_implicit_graph_edge                             
>>            ▒
>>>> >       + 0.54% solve_constraints                                   
>>            ▒
>>>> >       + 0.34% calculate_live_ranges                               
>>            ▒
>>>> >       + 0.27% insert_phi_nodes_for                                
>>            ▒
>>>> >       + 0.21% build_ssa_conflict_graph                 
>>>> > -    4.36%     3.86%           227  cc1      cc1               [.]
>>bitmap_set_b▒
>>>> >    - bitmap_set_bit                                               
>>            ▒
>>>> >       + 0.98% add_ssa_edge                                        
>>            ▒
>>>> >       + 0.86% add_implicit_graph_edge                             
>>            ▒
>>>> >       + 0.64% solve_constraints                                   
>>            ▒
>>>> >       + 0.57% calculate_live_ranges                               
>>            ▒
>>>> >       + 0.32% build_ssa_conflict_graph                            
>>            ▒
>>>> >       + 0.29% mark_all_vars_used_1                                
>>            ▒
>>>> >       + 0.20% insert_phi_nodes_for                                
>>            ▒
>>>> >       + 0.16% create_coalesce_list_for_region             
>>>> >
>>>> >
>>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>>> 
>>>> What's the performance like for more reasonable testcases?  E.g. is
>>there
>>>> a measurable change either way in --enable-checking=release for some
>>gcc
>>>> .iis at -g or -O2 -g?
>>>
>>> I did a quick check on my set of cc1files (still .i, .ii ones tend to
>>> be unusable quite quickly...).  Most of them compile too quickly
>>> to make any difference appear other than noise.  Multi-second
>>differences
>>> like for PR63155 should be the exception but our O(n) bitmap
>>> implementation really makes some parts of GCC quadratic where it
>>> doesn't appear so.
>>>
>>> Is there a reason you expect it to be ever slower?
>>
>>During recent stage3s I've tried to look at profiles of cc1plus
>>to see whether there was something easy we could do to improve
>>compile times.  And bitmap operations always showed up near the
>>top of the profile.  There were always some pathological queries
>>in which the linear search really hurt, but whenever I tried "simple"
>>ways to avoid the obvious cases, they made those queries quicker
>>but slowed things down overall.  It seemed that adding any extra logic
>>to the queries hurted because even a small slowdown in common lookups
>>overwhelmed a big saving in less common lookups.
>
> Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> I guess in the end well predicted branches in the out of line code are
> important...
>
>>
>>But there again I was looking to speed up more "normal" cases, not ones
>>like the PR.
>>
>>FWIW I've tried it on a local x86_64 box and it slows down current
>>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
>>for the other files I tried.  But that's hardly a huge amount, and
>>probably a price worth paying for the speed-up in the PR.
>
> Just to make sure what to reproduce - this is with checking disabled?
> And with or without the hunks actually making use of the splay tree
> path?

Yeah, with an --enable-checking=release stage3:

   ./cc1plus optabs.ii -o /dev/null -O2 -g

using the optabs.ii from the unpatched --enable-checking=release build.

It was the whole patch vs. without the patch.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-19  7:20       ` Richard Biener
  2018-10-19  8:44         ` Richard Sandiford
@ 2018-10-19  9:10         ` Steven Bosscher
  2018-10-19 13:14           ` Richard Biener
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Bosscher @ 2018-10-19  9:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: richard.sandiford, GCC Patches

On Fri, Oct 19, 2018 at 8:46 AM Richard Biener <> wrote:
> Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> I guess in the end well predicted branches in the out of line code are important...

What also would help is to put bitmaps on their own obstack to improve
cache locality.

As for the patch, I never hacked it with "production code" in mind, it
was just a proof of concept. Not all of it is optimal or even safe
as-is. For example you probably should add
"gcc_checking_assert(!(BITMAP)->tree-form)" tests in the
bmp_iter_*_init functions. And perhaps semi-splaying trees work better
for the use cases of GCC (x.f. "Rehabilitation of an unloved child:
semi-splaying"). I implemented classic splay trees because I could not
find a semi-splay tree implementation in any of the usual text books
while classic splay tree implementations were given in all of those
books ;-)

Ciao!
Steven

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-19  9:10         ` Steven Bosscher
@ 2018-10-19 13:14           ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-19 13:14 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: richard.sandiford, GCC Patches

On Fri, 19 Oct 2018, Steven Bosscher wrote:

> On Fri, Oct 19, 2018 at 8:46 AM Richard Biener <> wrote:
> > Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> > I guess in the end well predicted branches in the out of line code are important...

I specifically meant the fact that we happily update ->current to
->first and we do not check ->first->index == indx.  We also
decide when we want to walk backwards from current via
head->indx / 2 < indx rather than (head->indx - head->first->indx) / 2
+ head->first->indx < indx - but that's a heuristic assuming evenly
distributed set bits anyway.

> What also would help is to put bitmaps on their own obstack to improve
> cache locality.

That's true.  I guess increasing bitmap_element size to cover a
whole cache-line would be excessive ;)  On lp64 targets it's size
is currently 40 bytes which makes multiple ones not a perfect fit
for a usual cacheline (128 bytes).

> 
> As for the patch, I never hacked it with "production code" in mind, it
> was just a proof of concept. Not all of it is optimal or even safe
> as-is. For example you probably should add
> "gcc_checking_assert(!(BITMAP)->tree-form)" tests in the
> bmp_iter_*_init functions.

Those are already there.

> And perhaps semi-splaying trees work better
> for the use cases of GCC (x.f. "Rehabilitation of an unloved child:
> semi-splaying"). I implemented classic splay trees because I could not
> find a semi-splay tree implementation in any of the usual text books
> while classic splay tree implementations were given in all of those
> books ;-)

I think the classic splay tree mimics best the existing behavior
of the linked-list implementation (in find_element).  Another
thing to note would be that the ->current cache is basically
unused for the tree representation (it should always equal ->first),
but we do neither document that fact nor exploit it by not updating
->indx or ->current.

Anyway, I'll give the paper a read.

Overall it's clear that there are places in GCC and testcases that
make it necessary to address the linar-complexity of our bitmap
implementation...

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-19  8:44         ` Richard Sandiford
@ 2018-10-19 13:33           ` Richard Biener
  2018-10-19 14:32             ` Richard Biener
  2018-10-20 12:38             ` Richard Sandiford
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-19 13:33 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, stevenb.gcc

[-- Attachment #1: Type: text/plain, Size: 10730 bytes --]

On Fri, 19 Oct 2018, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>Richard Biener <rguenther@suse.de> writes:
> >>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
> >>>
> >>>> Richard Biener <rguenther@suse.de> writes:
> >>>> > PR63155 made me pick up this old work from Steven, it turns our
> >>>> > linked-list implementation to a two-mode one with one being a
> >>>> > splay tree featuring O(log N) complexity for find/remove.
> >>>> >
> >>>> > Over Stevens original patch I added a bitmap_tree_to_vec helper
> >>>> > that I use from the debug/print methods to avoid changing view
> >>>> > there.  In theory the bitmap iterator could get a "stack"
> >>>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> >>>> >
> >>>> > This can be used to fix the two biggest bottlenecks in the PRs
> >>>> > testcase, namely SSA propagator worklist handling and out-of-SSA
> >>>> > coalesce list building.  perf shows the following data, first
> >>>> > unpatched, second patched - also watch the thrid coulumn (samples)
> >>>> > when comparing percentages.
> >>>> >
> >>>> > -O0
> >>>> > -   18.19%    17.35%           407  cc1      cc1               [.]
> >>bitmap_set_bâ–’
> >>>> >    - bitmap_set_bit                                               
> >>            â–’
> >>>> >       + 8.77% create_coalesce_list_for_region                     
> >>            â–’
> >>>> >       + 4.21% calculate_live_ranges                               
> >>            â–’
> >>>> >       + 2.02% build_ssa_conflict_graph                            
> >>            â–’
> >>>> >       + 1.66% insert_phi_nodes_for                                
> >>            â–’
> >>>> >       + 0.86% coalesce_ssa_name                      
> >>>> > patched:
> >>>> > -   12.39%    10.48%           129  cc1      cc1               [.]
> >>bitmap_set_bâ–’
> >>>> >    - bitmap_set_bit                                               
> >>            â–’
> >>>> >       + 5.27% calculate_live_ranges                               
> >>            â–’
> >>>> >       + 2.76% insert_phi_nodes_for                                
> >>            â–’
> >>>> >       + 1.90% create_coalesce_list_for_region                     
> >>            â–’
> >>>> >       + 1.63% build_ssa_conflict_graph                            
> >>            â–’
> >>>> >       + 0.35% coalesce_ssa_name                           
> >>>> >
> >>>> > -O1
> >>>> > -   17.53%    17.53%           842  cc1      cc1               [.]
> >>bitmap_set_bâ–’
> >>>> >    - bitmap_set_bit                                               
> >>            â–’
> >>>> >       + 12.39% add_ssa_edge                                       
> >>            â–’
> >>>> >       + 1.48% create_coalesce_list_for_region                     
> >>            â–’
> >>>> >       + 0.82% solve_constraints                                   
> >>            â–’
> >>>> >       + 0.71% calculate_live_ranges                               
> >>            â–’
> >>>> >       + 0.64% add_implicit_graph_edge                             
> >>            â–’
> >>>> >       + 0.41% insert_phi_nodes_for                                
> >>            â–’
> >>>> >       + 0.34% build_ssa_conflict_graph                      
> >>>> > patched:
> >>>> > -    5.79%     5.00%           167  cc1      cc1               [.]
> >>bitmap_set_bâ–’
> >>>> >    - bitmap_set_bit                                               
> >>            â–’
> >>>> >       + 1.41% add_ssa_edge                                        
> >>            â–’
> >>>> >       + 0.88% calculate_live_ranges                               
> >>            â–’
> >>>> >       + 0.75% add_implicit_graph_edge                             
> >>            â–’
> >>>> >       + 0.68% solve_constraints                                   
> >>            â–’
> >>>> >       + 0.48% insert_phi_nodes_for                                
> >>            â–’
> >>>> >       + 0.45% build_ssa_conflict_graph                   
> >>>> >
> >>>> > -O3
> >>>> > -   12.37%    12.34%          1145  cc1      cc1               [.]
> >>bitmap_set_bâ–’
> >>>> >    - bitmap_set_bit                                               
> >>            â–’
> >>>> >       + 9.14% add_ssa_edge                                        
> >>            â–’
> >>>> >       + 0.80% create_coalesce_list_for_region                     
> >>            â–’
> >>>> >       + 0.69% add_implicit_graph_edge                             
> >>            â–’
> >>>> >       + 0.54% solve_constraints                                   
> >>            â–’
> >>>> >       + 0.34% calculate_live_ranges                               
> >>            â–’
> >>>> >       + 0.27% insert_phi_nodes_for                                
> >>            â–’
> >>>> >       + 0.21% build_ssa_conflict_graph                 
> >>>> > -    4.36%     3.86%           227  cc1      cc1               [.]
> >>bitmap_set_bâ–’
> >>>> >    - bitmap_set_bit                                               
> >>            â–’
> >>>> >       + 0.98% add_ssa_edge                                        
> >>            â–’
> >>>> >       + 0.86% add_implicit_graph_edge                             
> >>            â–’
> >>>> >       + 0.64% solve_constraints                                   
> >>            â–’
> >>>> >       + 0.57% calculate_live_ranges                               
> >>            â–’
> >>>> >       + 0.32% build_ssa_conflict_graph                            
> >>            â–’
> >>>> >       + 0.29% mark_all_vars_used_1                                
> >>            â–’
> >>>> >       + 0.20% insert_phi_nodes_for                                
> >>            â–’
> >>>> >       + 0.16% create_coalesce_list_for_region             
> >>>> >
> >>>> >
> >>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >>>> 
> >>>> What's the performance like for more reasonable testcases?  E.g. is
> >>there
> >>>> a measurable change either way in --enable-checking=release for some
> >>gcc
> >>>> .iis at -g or -O2 -g?
> >>>
> >>> I did a quick check on my set of cc1files (still .i, .ii ones tend to
> >>> be unusable quite quickly...).  Most of them compile too quickly
> >>> to make any difference appear other than noise.  Multi-second
> >>differences
> >>> like for PR63155 should be the exception but our O(n) bitmap
> >>> implementation really makes some parts of GCC quadratic where it
> >>> doesn't appear so.
> >>>
> >>> Is there a reason you expect it to be ever slower?
> >>
> >>During recent stage3s I've tried to look at profiles of cc1plus
> >>to see whether there was something easy we could do to improve
> >>compile times.  And bitmap operations always showed up near the
> >>top of the profile.  There were always some pathological queries
> >>in which the linear search really hurt, but whenever I tried "simple"
> >>ways to avoid the obvious cases, they made those queries quicker
> >>but slowed things down overall.  It seemed that adding any extra logic
> >>to the queries hurted because even a small slowdown in common lookups
> >>overwhelmed a big saving in less common lookups.
> >
> > Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> > I guess in the end well predicted branches in the out of line code are
> > important...
> >
> >>
> >>But there again I was looking to speed up more "normal" cases, not ones
> >>like the PR.
> >>
> >>FWIW I've tried it on a local x86_64 box and it slows down current
> >>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
> >>for the other files I tried.  But that's hardly a huge amount, and
> >>probably a price worth paying for the speed-up in the PR.
> >
> > Just to make sure what to reproduce - this is with checking disabled?
> > And with or without the hunks actually making use of the splay tree
> > path?
> 
> Yeah, with an --enable-checking=release stage3:
> 
>    ./cc1plus optabs.ii -o /dev/null -O2 -g
> 
> using the optabs.ii from the unpatched --enable-checking=release build.
> 
> It was the whole patch vs. without the patch.

OK, so there are almost no hits from the SSA propagator or out-of-SSA
but only from "unchanged" paths:

-    2.90%     2.90%            23  cc1plus  cc1plus           [.] 
bitmap_set_bâ–’
   - bitmap_set_bit                                                            
â–’
      + 0.79% df_lr_bb_local_compute                                           
â–’
      + 0.38% insert_updated_phi_nodes_for                                     
â–’
      + 0.27% sched_analyze_reg                                                
â–’
      + 0.23% walk_aliased_vdefs_1                                             
â–’
      + 0.13% get_continuation_for_phi                                         
â–’
      + 0.13% add_control_edge                                                 
â–’
      + 0.13% df_md_bb_local_compute_process_def                               
â–’
      + 0.13% mark_for_renaming                                                
â–’
      + 0.13% (anonymous namespace)::pass_rtl_cprop::execute                   
â–’
      + 0.13% compute_dominance_frontiers                                      
â–’
      + 0.12% df_simulate_initialize_backwards      

it's also interesting that most branch misses (for bitmap_set_bit)
are from

      bool res = (ptr->bits[word_num] & bit_val) == 0;
      if (res)
        ptr->bits[word_num] |= bit_val;
      return res;

I'm not sure how "bad" a mispredicted branch is here.  I guess
if it is predicted to be taken (skip the write) then it's bad
but if it is predicted the other way around it should be a matter
of not retiring the store...  But I am not a CPU guy.  I guess
unconditionally updating the memory wouldn't be so bad after all
and it might also help combine in using architecture specific
optimizations like using some CC flags of the OR operation
to get at the comparison result.  Can you benchmark a difference
for this?

Individual compiles of optabs.ii are too fast (<4s for me) to
make out overall performance changes and noise is quite high
for me as well (+-6%).

It looks like you cannot even use perfs precise counters (Haswell here)
of retired instructions to measure differences reliably.

That is, I cannot really reproduce your findings on optabs.ii...
for me, the best runtime of both patched and unpatched is the
same 3.22s.

Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-18 15:53 ` David Malcolm
@ 2018-10-19 13:37   ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-19 13:37 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches, stevenb.gcc

On Thu, 18 Oct 2018, David Malcolm wrote:

> On Thu, 2018-10-18 at 15:09 +0200, Richard Biener wrote:
> > PR63155 made me pick up this old work from Steven, it turns our
> > linked-list implementation to a two-mode one with one being a
> > splay tree featuring O(log N) complexity for find/remove.
> > 
> > Over Stevens original patch I added a bitmap_tree_to_vec helper
> > that I use from the debug/print methods to avoid changing view
> > there.  In theory the bitmap iterator could get a "stack"
> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> > 
> > This can be used to fix the two biggest bottlenecks in the PRs
> > testcase, namely SSA propagator worklist handling and out-of-SSA
> > coalesce list building.  perf shows the following data, first
> > unpatched, second patched - also watch the thrid coulumn (samples)
> > when comparing percentages.
> > 
> [...snip...]
> 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > Any objections?
> > 
> > Thanks,
> > Richard.
> > 
> > 2018-10-18  Steven Bosscher <steven@gcc.gnu.org>
> > 	Richard Biener  <rguenther@suse.de>
> > 
> > 	* bitmap.h: Update data structure documentation, including a
> > 	description of bitmap views as either linked-lists or splay
> > trees.
> 
> [...snip...]
> 
> From a "correctness" perspective, we have some existing unit-test
> coverage for bitmap via selftests in bitmap.c.  Perhaps those tests
> could be generalized to verify that the two different implementations
> work, and that the conversions work correctly?
> 
> e.g. currently we have:
> 
> static void
> test_clear_bit_in_middle ()
> {
>   bitmap b = bitmap_gc_alloc ();
> 
>   /* Set b to [100..200].  */
>   bitmap_set_range (b, 100, 100);
>   ASSERT_EQ (100, bitmap_count_bits (b));
> 
>   /* Clear a bit in the middle.  */
>   bool changed = bitmap_clear_bit (b, 150);
>   ASSERT_TRUE (changed);
>   ASSERT_EQ (99, bitmap_count_bits (b));
>   ASSERT_TRUE (bitmap_bit_p (b, 149));
>   ASSERT_FALSE (bitmap_bit_p (b, 150));
>   ASSERT_TRUE (bitmap_bit_p (b, 151));
> }
> 
> Maybe this could change to:
> 
> static void
> test_clear_bit_in_middle ()
> {
>   bitmap b = bitmap_gc_alloc ();
> 
>   FOR_EACH_BITMAP_IMPL (b)
>     {
>       /* Set b to [100..200].  */
>       bitmap_set_range (b, 100, 100);
>       ASSERT_EQ (100, bitmap_count_bits (b));
>     }
> 
>   bool first_time = true;
>   /* Clear a bit in the middle.  */
>   FOR_EACH_BITMAP_IMPL (b)
>     {
>       if (first_time)
>         {
>           bool changed = bitmap_clear_bit (b, 150);
>           ASSERT_TRUE (changed);
>           first_time = false;
>         }
>       ASSERT_EQ (99, bitmap_count_bits (b));
>       ASSERT_TRUE (bitmap_bit_p (b, 149));
>       ASSERT_FALSE (bitmap_bit_p (b, 150));
>       ASSERT_TRUE (bitmap_bit_p (b, 151));
>     }
> }
> 
> ...or somesuch, where maybe FOR_EACH_BITMAP_IMPL (b) could try linked-
> list, then splay tree, then linked-list, converting "b" as it goes.  
> This would hopefully give us a lot of test coverage for the various
> operations in both modes, and for the conversion routines (in both
> directions, assuming that both directions are supported).

Hmm, unfortunately the splay-tree variant doesn't implement
bitmap_count_bits or bitmap_set_range.

Note some of the missing functionality might need implementation
of a bitmap element iterator (maybe I should transform my
bitmap_tree_to_vec to that instead).

But I'm quite sure bitmaps are extensively tested with GCC
bootstrap ;)

Richard.

> Hope this is constructive
> Dave
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-19 13:33           ` Richard Biener
@ 2018-10-19 14:32             ` Richard Biener
  2018-10-20 12:38             ` Richard Sandiford
  1 sibling, 0 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-19 14:32 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, stevenb.gcc

[-- Attachment #1: Type: text/plain, Size: 5239 bytes --]

On Fri, 19 Oct 2018, Richard Biener wrote:

> On Fri, 19 Oct 2018, Richard Sandiford wrote:
> 
> > Richard Biener <rguenther@suse.de> writes:
> > > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > >>Richard Biener <rguenther@suse.de> writes:
> > >>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
> > >>>> What's the performance like for more reasonable testcases?  E.g. is
> > >>there
> > >>>> a measurable change either way in --enable-checking=release for some
> > >>gcc
> > >>>> .iis at -g or -O2 -g?
> > >>>
> > >>> I did a quick check on my set of cc1files (still .i, .ii ones tend to
> > >>> be unusable quite quickly...).  Most of them compile too quickly
> > >>> to make any difference appear other than noise.  Multi-second
> > >>differences
> > >>> like for PR63155 should be the exception but our O(n) bitmap
> > >>> implementation really makes some parts of GCC quadratic where it
> > >>> doesn't appear so.
> > >>>
> > >>> Is there a reason you expect it to be ever slower?
> > >>
> > >>During recent stage3s I've tried to look at profiles of cc1plus
> > >>to see whether there was something easy we could do to improve
> > >>compile times.  And bitmap operations always showed up near the
> > >>top of the profile.  There were always some pathological queries
> > >>in which the linear search really hurt, but whenever I tried "simple"
> > >>ways to avoid the obvious cases, they made those queries quicker
> > >>but slowed things down overall.  It seemed that adding any extra logic
> > >>to the queries hurted because even a small slowdown in common lookups
> > >>overwhelmed a big saving in less common lookups.
> > >
> > > Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> > > I guess in the end well predicted branches in the out of line code are
> > > important...
> > >
> > >>
> > >>But there again I was looking to speed up more "normal" cases, not ones
> > >>like the PR.
> > >>
> > >>FWIW I've tried it on a local x86_64 box and it slows down current
> > >>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
> > >>for the other files I tried.  But that's hardly a huge amount, and
> > >>probably a price worth paying for the speed-up in the PR.
> > >
> > > Just to make sure what to reproduce - this is with checking disabled?
> > > And with or without the hunks actually making use of the splay tree
> > > path?
> > 
> > Yeah, with an --enable-checking=release stage3:
> > 
> >    ./cc1plus optabs.ii -o /dev/null -O2 -g
> > 
> > using the optabs.ii from the unpatched --enable-checking=release build.
> > 
> > It was the whole patch vs. without the patch.
> 
> OK, so there are almost no hits from the SSA propagator or out-of-SSA
> but only from "unchanged" paths:
> 
> -    2.90%     2.90%            23  cc1plus  cc1plus           [.] 
> bitmap_set_bâ–’
>    - bitmap_set_bit                                                            
> â–’
>       + 0.79% df_lr_bb_local_compute                                           
> â–’
>       + 0.38% insert_updated_phi_nodes_for                                     
> â–’
>       + 0.27% sched_analyze_reg                                                
> â–’
>       + 0.23% walk_aliased_vdefs_1                                             
> â–’
>       + 0.13% get_continuation_for_phi                                         
> â–’
>       + 0.13% add_control_edge                                                 
> â–’
>       + 0.13% df_md_bb_local_compute_process_def                               
> â–’
>       + 0.13% mark_for_renaming                                                
> â–’
>       + 0.13% (anonymous namespace)::pass_rtl_cprop::execute                   
> â–’
>       + 0.13% compute_dominance_frontiers                                      
> â–’
>       + 0.12% df_simulate_initialize_backwards      
> 
> it's also interesting that most branch misses (for bitmap_set_bit)
> are from
> 
>       bool res = (ptr->bits[word_num] & bit_val) == 0;
>       if (res)
>         ptr->bits[word_num] |= bit_val;
>       return res;
> 
> I'm not sure how "bad" a mispredicted branch is here.  I guess
> if it is predicted to be taken (skip the write) then it's bad
> but if it is predicted the other way around it should be a matter
> of not retiring the store...  But I am not a CPU guy.  I guess
> unconditionally updating the memory wouldn't be so bad after all
> and it might also help combine in using architecture specific
> optimizations like using some CC flags of the OR operation
> to get at the comparison result.  Can you benchmark a difference
> for this?

So I can also see mispredicts on

  if (!head->tree_form)
    element = bitmap_list_find_element (head, indx, usage);
  else
    element = bitmap_tree_find_element (head, indx);

which I could significantly reduce via re-ordering (thus
w/o any branch history this currently gets predicted to the
tree variant).

That suggests to instead of the above runtime checks do
bitmap_tree_bit_p / bitmap_tree_set_bit routines (or do tricks
with C++ and overloading via a tbitmap type).

Would you be more happy with that approach?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-19 13:33           ` Richard Biener
  2018-10-19 14:32             ` Richard Biener
@ 2018-10-20 12:38             ` Richard Sandiford
  2018-10-22 12:17               ` Richard Biener
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-10-20 12:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc

Richard Biener <rguenther@suse.de> writes:
> On Fri, 19 Oct 2018, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>During recent stage3s I've tried to look at profiles of cc1plus
>> >>to see whether there was something easy we could do to improve
>> >>compile times.  And bitmap operations always showed up near the
>> >>top of the profile.  There were always some pathological queries
>> >>in which the linear search really hurt, but whenever I tried "simple"
>> >>ways to avoid the obvious cases, they made those queries quicker
>> >>but slowed things down overall.  It seemed that adding any extra logic
>> >>to the queries hurted because even a small slowdown in common lookups
>> >>overwhelmed a big saving in less common lookups.
>> >
>> > Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
>> > I guess in the end well predicted branches in the out of line code are
>> > important...
>> >
>> >>
>> >>But there again I was looking to speed up more "normal" cases, not ones
>> >>like the PR.
>> >>
>> >>FWIW I've tried it on a local x86_64 box and it slows down current
>> >>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
>> >>for the other files I tried.  But that's hardly a huge amount, and
>> >>probably a price worth paying for the speed-up in the PR.
>> >
>> > Just to make sure what to reproduce - this is with checking disabled?
>> > And with or without the hunks actually making use of the splay tree
>> > path?
>> 
>> Yeah, with an --enable-checking=release stage3:
>> 
>>    ./cc1plus optabs.ii -o /dev/null -O2 -g
>> 
>> using the optabs.ii from the unpatched --enable-checking=release build.
>> 
>> It was the whole patch vs. without the patch.
>
> OK, so there are almost no hits from the SSA propagator or out-of-SSA
> but only from "unchanged" paths:
>
> -    2.90%     2.90%            23  cc1plus  cc1plus           [.] 
> bitmap_set_b▒
>    - bitmap_set_bit                                                            
> ▒
>       + 0.79% df_lr_bb_local_compute                                           
> ▒
>       + 0.38% insert_updated_phi_nodes_for                                     
> ▒
>       + 0.27% sched_analyze_reg                                                
> ▒
>       + 0.23% walk_aliased_vdefs_1                                             
> ▒
>       + 0.13% get_continuation_for_phi                                         
> ▒
>       + 0.13% add_control_edge                                                 
> ▒
>       + 0.13% df_md_bb_local_compute_process_def                               
> ▒
>       + 0.13% mark_for_renaming                                                
> ▒
>       + 0.13% (anonymous namespace)::pass_rtl_cprop::execute                   
> ▒
>       + 0.13% compute_dominance_frontiers                                      
> ▒
>       + 0.12% df_simulate_initialize_backwards      
>
> it's also interesting that most branch misses (for bitmap_set_bit)
> are from
>
>       bool res = (ptr->bits[word_num] & bit_val) == 0;
>       if (res)
>         ptr->bits[word_num] |= bit_val;
>       return res;
>
> I'm not sure how "bad" a mispredicted branch is here.  I guess
> if it is predicted to be taken (skip the write) then it's bad
> but if it is predicted the other way around it should be a matter
> of not retiring the store...  But I am not a CPU guy.  I guess
> unconditionally updating the memory wouldn't be so bad after all
> and it might also help combine in using architecture specific
> optimizations like using some CC flags of the OR operation
> to get at the comparison result.  Can you benchmark a difference
> for this?
>
> Individual compiles of optabs.ii are too fast (<4s for me) to
> make out overall performance changes and noise is quite high
> for me as well (+-6%).

The noise is much lower on my box.  Lucky me I guess :-)

> It looks like you cannot even use perfs precise counters (Haswell here)
> of retired instructions to measure differences reliably.
>
> That is, I cannot really reproduce your findings on optabs.ii...
> for me, the best runtime of both patched and unpatched is the
> same 3.22s.

At least part of the difference I'm seeing seems to come from extra
asserts like:

  gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form
	      && !kill->tree_form);

in which each test needs a separate test and branch, since it isn't valid
to dereference "a" when dst->tree_form is true, etc.  I see a measurable
difference from changing all of the "foo->tree_form && bar->tree_form"
tests to use "&", or making them all gcc_checking_asserts instead of
gcc_asserts.

Didn't really sound likely at first, but there are ~2.4 million
calls to functions with those kinds of chained test (mostly with
just 2 arguments), so it mounts up.

BTW, in case it sounds otherwise, I'm not objecting to the patch.
Looks like a nice thing to have and I don't think this should hold it up.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add splay-tree "view" for bitmap
  2018-10-20 12:38             ` Richard Sandiford
@ 2018-10-22 12:17               ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2018-10-22 12:17 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, stevenb.gcc

[-- Attachment #1: Type: text/plain, Size: 67718 bytes --]

On Sat, 20 Oct 2018, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Fri, 19 Oct 2018, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> >>During recent stage3s I've tried to look at profiles of cc1plus
> >> >>to see whether there was something easy we could do to improve
> >> >>compile times.  And bitmap operations always showed up near the
> >> >>top of the profile.  There were always some pathological queries
> >> >>in which the linear search really hurt, but whenever I tried "simple"
> >> >>ways to avoid the obvious cases, they made those queries quicker
> >> >>but slowed things down overall.  It seemed that adding any extra logic
> >> >>to the queries hurted because even a small slowdown in common lookups
> >> >>overwhelmed a big saving in less common lookups.
> >> >
> >> > Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> >> > I guess in the end well predicted branches in the out of line code are
> >> > important...
> >> >
> >> >>
> >> >>But there again I was looking to speed up more "normal" cases, not ones
> >> >>like the PR.
> >> >>
> >> >>FWIW I've tried it on a local x86_64 box and it slows down current
> >> >>optabs.ii at -O2 -g by ~0.35% (reproducable).  I see similar slowdowns
> >> >>for the other files I tried.  But that's hardly a huge amount, and
> >> >>probably a price worth paying for the speed-up in the PR.
> >> >
> >> > Just to make sure what to reproduce - this is with checking disabled?
> >> > And with or without the hunks actually making use of the splay tree
> >> > path?
> >> 
> >> Yeah, with an --enable-checking=release stage3:
> >> 
> >>    ./cc1plus optabs.ii -o /dev/null -O2 -g
> >> 
> >> using the optabs.ii from the unpatched --enable-checking=release build.
> >> 
> >> It was the whole patch vs. without the patch.
> >
> > OK, so there are almost no hits from the SSA propagator or out-of-SSA
> > but only from "unchanged" paths:
> >
> > -    2.90%     2.90%            23  cc1plus  cc1plus           [.] 
> > bitmap_set_bâ–’
> >    - bitmap_set_bit                                                            
> > â–’
> >       + 0.79% df_lr_bb_local_compute                                           
> > â–’
> >       + 0.38% insert_updated_phi_nodes_for                                     
> > â–’
> >       + 0.27% sched_analyze_reg                                                
> > â–’
> >       + 0.23% walk_aliased_vdefs_1                                             
> > â–’
> >       + 0.13% get_continuation_for_phi                                         
> > â–’
> >       + 0.13% add_control_edge                                                 
> > â–’
> >       + 0.13% df_md_bb_local_compute_process_def                               
> > â–’
> >       + 0.13% mark_for_renaming                                                
> > â–’
> >       + 0.13% (anonymous namespace)::pass_rtl_cprop::execute                   
> > â–’
> >       + 0.13% compute_dominance_frontiers                                      
> > â–’
> >       + 0.12% df_simulate_initialize_backwards      
> >
> > it's also interesting that most branch misses (for bitmap_set_bit)
> > are from
> >
> >       bool res = (ptr->bits[word_num] & bit_val) == 0;
> >       if (res)
> >         ptr->bits[word_num] |= bit_val;
> >       return res;
> >
> > I'm not sure how "bad" a mispredicted branch is here.  I guess
> > if it is predicted to be taken (skip the write) then it's bad
> > but if it is predicted the other way around it should be a matter
> > of not retiring the store...  But I am not a CPU guy.  I guess
> > unconditionally updating the memory wouldn't be so bad after all
> > and it might also help combine in using architecture specific
> > optimizations like using some CC flags of the OR operation
> > to get at the comparison result.  Can you benchmark a difference
> > for this?
> >
> > Individual compiles of optabs.ii are too fast (<4s for me) to
> > make out overall performance changes and noise is quite high
> > for me as well (+-6%).
> 
> The noise is much lower on my box.  Lucky me I guess :-)
> 
> > It looks like you cannot even use perfs precise counters (Haswell here)
> > of retired instructions to measure differences reliably.
> >
> > That is, I cannot really reproduce your findings on optabs.ii...
> > for me, the best runtime of both patched and unpatched is the
> > same 3.22s.
> 
> At least part of the difference I'm seeing seems to come from extra
> asserts like:
> 
>   gcc_assert (!dst->tree_form && !a->tree_form && !b->tree_form
> 	      && !kill->tree_form);
> 
> in which each test needs a separate test and branch, since it isn't valid
> to dereference "a" when dst->tree_form is true, etc.  I see a measurable
> difference from changing all of the "foo->tree_form && bar->tree_form"
> tests to use "&", or making them all gcc_checking_asserts instead of
> gcc_asserts.

Ah, ok - I made them gcc_assert()s from originally checking ones.
I've now changed them back.

> Didn't really sound likely at first, but there are ~2.4 million
> calls to functions with those kinds of chained test (mostly with
> just 2 arguments), so it mounts up.
> 
> BTW, in case it sounds otherwise, I'm not objecting to the patch.
> Looks like a nice thing to have and I don't think this should hold it up.

OK.  I've now distributed the choice of tree or list find_element
to the three callers that remain when eliding those with known
representation (together with eliminating bitmap_find_bit in favor
of those two).  That's 95% of the way towards having explicit
overloads which is what I do not have time to work on (or think
about a sane API) right now.

I've also switched the test to favor static prediction towards
the list implementation.

A final bootstrap & regtest is running right now, I will install
the patch once that was successful.

Richard.

2018-10-22  Steven Bosscher <steven@gcc.gnu.org>
	Richard Biener  <rguenther@suse.de>

	* bitmap.h: Update data structure documentation, including a
	description of bitmap views as either linked-lists or splay trees.
	(struct bitmap_element_def): Update comments for splay tree bitmaps.
	(struct bitmap_head_def): Likewise.
	(bitmap_list_view, bitmap_tree_view): New prototypes.
	(bitmap_initialize_stat): Initialize a bitmap_head's indx and
	tree_form fields.
	(bmp_iter_set_init): Assert the iterated bitmaps are in list form.
	(bmp_iter_and_init, bmp_iter_and_compl_init): Likewise.
	* bitmap.c (bitmap_elem_to_freelist): Unregister overhead of a
	released bitmap element here.
	(bitmap_element_free): Remove.
	(bitmap_elt_clear_from): Work on splay tree bitmaps.
	(bitmap_list_link_element): Renamed from bitmap_element_link.  Move
	this function similar ones such that linked-list bitmap implementation
	functions are grouped.
	(bitmap_list_unlink_element): Renamed from bitmap_element_unlink,
	and moved for grouping.
	(bitmap_list_insert_element_after): Renamed from
	bitmap_elt_insert_after, and moved for grouping.
	(bitmap_list_find_element): New function spliced from bitmap_find_bit.
	(bitmap_tree_link_left, bitmap_tree_link_right,
	bitmap_tree_rotate_left, bitmap_tree_rotate_right, bitmap_tree_splay,
	bitmap_tree_link_element, bitmap_tree_unlink_element,
	bitmap_tree_find_element): New functions for splay-tree bitmap
	implementation.
	(bitmap_element_link, bitmap_element_unlink, bitmap_elt_insert_after):
	Renamed and moved, see above entries.
	(bitmap_tree_listify_from): New function to convert part of a splay
	tree bitmap to a linked-list bitmap.
	(bitmap_list_view): Convert a splay tree bitmap to linked-list form.
	(bitmap_tree_view): Convert a linked-list bitmap to splay tree form.
	(bitmap_find_bit): Remove.
	(bitmap_clear, bitmap_clear_bit, bitmap_set_bit,
	bitmap_single_bit_set_p, bitmap_first_set_bit, bitmap_last_set_bit):
	Handle splay tree bitmaps.
	(bitmap_copy, bitmap_count_bits, bitmap_and, bitmap_and_into,
	bitmap_elt_copy, bitmap_and_compl, bitmap_and_compl_into,
	bitmap_compl_and_into, bitmap_elt_ior, bitmap_ior, bitmap_ior_into,
	bitmap_xor, bitmap_xor_into, bitmap_equal_p, bitmap_intersect_p,
	bitmap_intersect_compl_p, bitmap_ior_and_compl,
	bitmap_ior_and_compl_into, bitmap_set_range, bitmap_clear_range,
	bitmap_hash): Reject trying to act on splay tree bitmaps.  Make
	corresponding changes to use linked-list specific bitmap_element
	manipulation functions as applicable for efficiency.
	(bitmap_tree_to_vec): New function.
	(debug_bitmap_elt_file): New function split out from ...
	(debug_bitmap_file): ... here.  Handle splay tree bitmaps.
	(bitmap_print): Likewise.

	PR tree-optimization/63155
	* tree-ssa-propagate.c (ssa_prop_init): Use tree-view for the
	SSA edge worklists.
	* tree-ssa-coalesce.c (coalesce_ssa_name): Populate used_in_copies
	in tree-view.

Index: gcc/bitmap.c
===================================================================
--- gcc/bitmap.c	(revision 265317)
+++ gcc/bitmap.c	(working copy)
@@ -26,6 +26,8 @@ along with GCC; see the file COPYING3.
 /* Memory allocation statistics purpose instance.  */
 mem_alloc_description<bitmap_usage> bitmap_mem_desc;
 
+static bitmap_element *bitmap_tree_listify_from (bitmap, bitmap_element *);
+
 /* Register new bitmap.  */
 void
 bitmap_register (bitmap b MEM_STAT_DECL)
@@ -49,22 +51,18 @@ static int bitmap_default_obstack_depth;
 static GTY((deletable)) bitmap_element *bitmap_ggc_free; /* Freelist of
 							    GC'd elements.  */
 
-static void bitmap_elem_to_freelist (bitmap, bitmap_element *);
-static void bitmap_element_free (bitmap, bitmap_element *);
-static bitmap_element *bitmap_element_allocate (bitmap);
-static int bitmap_element_zerop (const bitmap_element *);
-static void bitmap_element_link (bitmap, bitmap_element *);
-static bitmap_element *bitmap_elt_insert_after (bitmap, bitmap_element *, unsigned int);
-static void bitmap_elt_clear_from (bitmap, bitmap_element *);
-static bitmap_element *bitmap_find_bit (bitmap, unsigned int);
 \f
+/* Bitmap memory management.  */
 
-/* Add ELEM to the appropriate freelist.  */
+/* Add ELT to the appropriate freelist.  */
 static inline void
 bitmap_elem_to_freelist (bitmap head, bitmap_element *elt)
 {
   bitmap_obstack *bit_obstack = head->obstack;
 
+  if (GATHER_STATISTICS)
+    register_overhead (head, -((int)sizeof (bitmap_element)));
+
   elt->next = NULL;
   elt->indx = -1;
   if (bit_obstack)
@@ -79,41 +77,6 @@ bitmap_elem_to_freelist (bitmap head, bi
     }
 }
 
-/* Free a bitmap element.  Since these are allocated off the
-   bitmap_obstack, "free" actually means "put onto the freelist".  */
-
-static inline void
-bitmap_element_free (bitmap head, bitmap_element *elt)
-{
-  bitmap_element *next = elt->next;
-  bitmap_element *prev = elt->prev;
-
-  if (prev)
-    prev->next = next;
-
-  if (next)
-    next->prev = prev;
-
-  if (head->first == elt)
-    head->first = next;
-
-  /* Since the first thing we try is to insert before current,
-     make current the next entry in preference to the previous.  */
-  if (head->current == elt)
-    {
-      head->current = next != 0 ? next : prev;
-      if (head->current)
-	head->indx = head->current->indx;
-      else
-	head->indx = 0;
-    }
-
-  if (GATHER_STATISTICS)
-    register_overhead (head, -((int)sizeof (bitmap_element)));
-
-  bitmap_elem_to_freelist (head, elt);
-}
-\f
 /* Allocate a bitmap element.  The bits are cleared, but nothing else is.  */
 
 static inline bitmap_element *
@@ -166,7 +129,8 @@ bitmap_element_allocate (bitmap head)
   return element;
 }
 
-/* Remove ELT and all following elements from bitmap HEAD.  */
+/* Remove ELT and all following elements from bitmap HEAD.
+   Put the released elements in the freelist for HEAD.  */
 
 void
 bitmap_elt_clear_from (bitmap head, bitmap_element *elt)
@@ -174,7 +138,11 @@ bitmap_elt_clear_from (bitmap head, bitm
   bitmap_element *prev;
   bitmap_obstack *bit_obstack = head->obstack;
 
-  if (!elt) return;
+  if (!elt)
+    return;
+
+  if (head->tree_form)
+    elt = bitmap_tree_listify_from (head, elt);
 
   if (GATHER_STATISTICS)
     {
@@ -201,7 +169,7 @@ bitmap_elt_clear_from (bitmap head, bitm
       head->indx = 0;
     }
 
-  /* Put the entire list onto the free list in one operation. */
+  /* Put the entire list onto the freelist in one operation. */
   if (bit_obstack)
     {
       elt->prev = bit_obstack->elements;
@@ -209,18 +177,525 @@ bitmap_elt_clear_from (bitmap head, bitm
     }
   else
     {
-      elt->prev = bitmap_ggc_free;
-      bitmap_ggc_free = elt;
+      elt->prev = bitmap_ggc_free;
+      bitmap_ggc_free = elt;
+    }
+}
+\f
+/* Linked-list view of bitmaps.
+
+   In this representation, the bitmap elements form a double-linked list
+   with elements sorted by increasing index.  */
+
+/* Link the bitmap element into the current bitmap linked list.  */
+
+static inline void
+bitmap_list_link_element (bitmap head, bitmap_element *element)
+{
+  unsigned int indx = element->indx;
+  bitmap_element *ptr;
+
+  gcc_checking_assert (!head->tree_form);
+
+  /* If this is the first and only element, set it in.  */
+  if (head->first == 0)
+    {
+      element->next = element->prev = 0;
+      head->first = element;
+    }
+
+  /* If this index is less than that of the current element, it goes someplace
+     before the current element.  */
+  else if (indx < head->indx)
+    {
+      for (ptr = head->current;
+	   ptr->prev != 0 && ptr->prev->indx > indx;
+	   ptr = ptr->prev)
+	;
+
+      if (ptr->prev)
+	ptr->prev->next = element;
+      else
+	head->first = element;
+
+      element->prev = ptr->prev;
+      element->next = ptr;
+      ptr->prev = element;
+    }
+
+  /* Otherwise, it must go someplace after the current element.  */
+  else
+    {
+      for (ptr = head->current;
+	   ptr->next != 0 && ptr->next->indx < indx;
+	   ptr = ptr->next)
+	;
+
+      if (ptr->next)
+	ptr->next->prev = element;
+
+      element->next = ptr->next;
+      element->prev = ptr;
+      ptr->next = element;
+    }
+
+  /* Set up so this is the first element searched.  */
+  head->current = element;
+  head->indx = indx;
+}
+
+/* Unlink the bitmap element from the current bitmap linked list,
+   and return it to the freelist.  */
+
+static inline void
+bitmap_list_unlink_element (bitmap head, bitmap_element *element)
+{
+  bitmap_element *next = element->next;
+  bitmap_element *prev = element->prev;
+
+  gcc_checking_assert (!head->tree_form);
+
+  if (prev)
+    prev->next = next;
+
+  if (next)
+    next->prev = prev;
+
+  if (head->first == element)
+    head->first = next;
+
+  /* Since the first thing we try is to insert before current,
+     make current the next entry in preference to the previous.  */
+  if (head->current == element)
+    {
+      head->current = next != 0 ? next : prev;
+      if (head->current)
+	head->indx = head->current->indx;
+      else
+	head->indx = 0;
+    }
+
+  bitmap_elem_to_freelist (head, element);
+}
+
+/* Insert a new uninitialized element into bitmap HEAD after element
+   ELT.  If ELT is NULL, insert the element at the start.  Return the
+   new element.  */
+
+static bitmap_element *
+bitmap_list_insert_element_after (bitmap head,
+				  bitmap_element *elt, unsigned int indx)
+{
+  bitmap_element *node = bitmap_element_allocate (head);
+  node->indx = indx;
+
+  gcc_checking_assert (!head->tree_form);
+
+  if (!elt)
+    {
+      if (!head->current)
+	{
+	  head->current = node;
+	  head->indx = indx;
+	}
+      node->next = head->first;
+      if (node->next)
+	node->next->prev = node;
+      head->first = node;
+      node->prev = NULL;
+    }
+  else
+    {
+      gcc_checking_assert (head->current);
+      node->next = elt->next;
+      if (node->next)
+	node->next->prev = node;
+      elt->next = node;
+      node->prev = elt;
+    }
+  return node;
+}
+
+/* Return the element for INDX, or NULL if the element doesn't exist.
+   Update the `current' field even if we can't find an element that  
+   would hold the bitmap's bit to make eventual allocation
+   faster.  */
+
+static inline bitmap_element *
+bitmap_list_find_element (bitmap head, unsigned int indx)
+{
+  bitmap_element *element;
+
+  if (head->current == NULL
+      || head->indx == indx)
+    return head->current;
+
+  if (head->current == head->first
+      && head->first->next == NULL)
+    return NULL;
+
+  /* Usage can be NULL due to allocated bitmaps for which we do not
+     call initialize function.  */
+  bitmap_usage *usage = NULL;
+  if (GATHER_STATISTICS)
+    usage = bitmap_mem_desc.get_descriptor_for_instance (head);
+
+  /* This bitmap has more than one element, and we're going to look
+     through the elements list.  Count that as a search.  */
+  if (GATHER_STATISTICS && usage)
+    usage->m_nsearches++;
+
+  if (head->indx < indx)
+    /* INDX is beyond head->indx.  Search from head->current
+       forward.  */
+    for (element = head->current;
+	 element->next != 0 && element->indx < indx;
+	 element = element->next)
+      {
+	if (GATHER_STATISTICS && usage)
+	  usage->m_search_iter++;
+      }
+
+  else if (head->indx / 2 < indx)
+    /* INDX is less than head->indx and closer to head->indx than to
+       0.  Search from head->current backward.  */
+    for (element = head->current;
+	 element->prev != 0 && element->indx > indx;
+	 element = element->prev)
+      {
+	if (GATHER_STATISTICS && usage)
+	  usage->m_search_iter++;
+      }
+
+  else
+    /* INDX is less than head->indx and closer to 0 than to
+       head->indx.  Search from head->first forward.  */
+    for (element = head->first;
+	 element->next != 0 && element->indx < indx;
+	 element = element->next)
+      {
+	if (GATHER_STATISTICS && usage)
+	  usage->m_search_iter++;
+      }
+
+  /* `element' is the nearest to the one we want.  If it's not the one we
+     want, the one we want doesn't exist.  */
+  gcc_checking_assert (element != NULL);
+  head->current = element;
+  head->indx = element->indx;
+  if (element->indx != indx)
+    element = 0;
+  return element;
+}
+
+\f
+/* Splay-tree view of bitmaps.
+
+   This is an almost one-to-one the implementatin of the simple top-down
+   splay tree in Sleator and Tarjan's "Self-adjusting Binary Search Trees".
+   It is probably not the most efficient form of splay trees, but it should
+   be good enough to experiment with this idea of bitmaps-as-trees.
+   
+   For all functions below, the variable or function argument "t" is a node
+   in the tree, and "e" is a temporary or new node in the tree.  The rest
+   is sufficiently straigh-forward (and very well explained in the paper)
+   that comment would only clutter things.  */
+
+static inline void
+bitmap_tree_link_left (bitmap_element * &t, bitmap_element * &l)
+{
+  l->next = t;
+  l = t;
+  t = t->next;
+}
+
+static inline void
+bitmap_tree_link_right (bitmap_element * &t, bitmap_element * &r)
+{
+  r->prev = t;
+  r = t;
+  t = t->prev;
+}
+
+static inline void
+bitmap_tree_rotate_left (bitmap_element * &t)
+{
+  bitmap_element *e = t->next;
+  t->next = t->next->prev;
+  e->prev = t;
+  t = e;
+}
+
+static inline void
+bitmap_tree_rotate_right (bitmap_element * &t)
+{
+  bitmap_element *e = t->prev;
+  t->prev = t->prev->next;
+  e->next = t;
+  t = e;
+}
+
+static bitmap_element *
+bitmap_tree_splay (bitmap head, bitmap_element *t, unsigned int indx)
+{
+  bitmap_element N, *l, *r;
+
+  if (t == NULL)
+    return NULL;
+
+  bitmap_usage *usage = NULL;
+  if (GATHER_STATISTICS)
+    usage = bitmap_mem_desc.get_descriptor_for_instance (head);
+
+  N.prev = N.next = NULL;
+  l = r = &N;
+
+  while (indx != t->indx)
+    {
+      if (GATHER_STATISTICS && usage)
+	usage->m_search_iter++;
+
+      if (indx < t->indx)
+	{
+	  if (t->prev != NULL && indx < t->prev->indx)
+	    bitmap_tree_rotate_right (t);
+	  if (t->prev == NULL)
+	    break;
+	  bitmap_tree_link_right (t, r);
+	}
+      else if (indx > t->indx)
+	{
+	  if (t->next != NULL && indx > t->next->indx)
+	    bitmap_tree_rotate_left (t);
+	  if (t->next == NULL)
+	    break;
+	  bitmap_tree_link_left (t, l);
+	}
+    }
+
+  l->next = t->prev;
+  r->prev = t->next;
+  t->prev = N.next;
+  t->next = N.prev;
+  return t;
+}
+
+/* Link bitmap element E into the current bitmap splay tree.  */
+
+static inline void
+bitmap_tree_link_element (bitmap head, bitmap_element *e)
+{
+  if (head->first == NULL)
+    e->prev = e->next = NULL;
+  else
+    {
+      bitmap_element *t = bitmap_tree_splay (head, head->first, e->indx);
+      if (e->indx < t->indx)
+	{
+	  e->prev = t->prev;
+	  e->next = t;
+	  t->prev = NULL;
+	}
+      else if (e->indx > t->indx)
+	{
+	  e->next = t->next;
+	  e->prev = t;
+	  t->next = NULL;
+	}
+      else
+	gcc_unreachable ();
+    }
+  head->first = e;
+  head->current = e;
+  head->indx = e->indx;
+}
+
+/* Unlink bitmap element E from the current bitmap splay tree,
+   and return it to the freelist.  */
+
+static void
+bitmap_tree_unlink_element (bitmap head, bitmap_element *e)
+{
+  bitmap_element *t = bitmap_tree_splay (head, head->first, e->indx);
+
+  gcc_checking_assert (t == e);
+
+  if (e->prev == NULL)
+    t = e->next;
+  else
+    {
+      t = bitmap_tree_splay (head, e->prev, e->indx);
+      t->next = e->next;
+    }
+  head->first = t;
+  head->current = t;
+  head->indx = (t != NULL) ? t->indx : 0;
+
+  bitmap_elem_to_freelist (head, e);
+}
+
+/* Return the element for INDX, or NULL if the element doesn't exist.  */
+
+static inline bitmap_element *
+bitmap_tree_find_element (bitmap head, unsigned int indx)
+{
+  if (head->current == NULL
+      || head->indx == indx)
+    return head->current;
+
+  /* Usage can be NULL due to allocated bitmaps for which we do not
+     call initialize function.  */
+  bitmap_usage *usage = NULL;
+  if (GATHER_STATISTICS)
+    usage = bitmap_mem_desc.get_descriptor_for_instance (head);
+
+  /* This bitmap has more than one element, and we're going to look
+     through the elements list.  Count that as a search.  */
+  if (GATHER_STATISTICS && usage)
+    usage->m_nsearches++;
+
+  bitmap_element *element = bitmap_tree_splay (head, head->first, indx);
+  gcc_checking_assert (element != NULL);
+  head->first = element;
+  head->current = element;
+  head->indx = element->indx;
+  if (element->indx != indx)
+    element = 0;
+  return element;
+}
+\f
+/* Converting bitmap views from linked-list to tree and vice versa.  */
+
+/* Splice element E and all elements with a larger index from
+   bitmap HEAD, convert the spliced elements to the linked-list
+   view, and return the head of the list (which should be E again),  */
+
+static bitmap_element *
+bitmap_tree_listify_from (bitmap head, bitmap_element *e)
+{
+  bitmap_element *t, *erb;
+
+  /* Detach the right branch from E (all elements with indx > E->indx),
+     and splay E to the root.  */
+  erb = e->next;
+  e->next = NULL;
+  t = bitmap_tree_splay (head, head->first, e->indx);
+  gcc_checking_assert (t == e);
+
+  /* Because E has no right branch, and we rotated it to the root,
+     the left branch is the new root.  */
+  t = e->prev;
+  head->first = t;
+  head->current = t;
+  head->indx = (t != NULL) ? t->indx : 0;
+
+  /* Detach the tree from E, and re-attach the right branch of E.  */
+  e->prev = NULL;
+  e->next = erb;
+
+  /* The tree is now valid again.  Now we need to "un-tree" E.
+     It is imperative that a non-recursive implementation is used
+     for this, because splay trees have a worst case depth of O(N)
+     for a tree with N nodes.  A recursive implementation could
+     result in a stack overflow for a sufficiently large, unbalanced
+     bitmap tree.  */
+
+  auto_vec<bitmap_element *, 32> stack;
+  auto_vec<bitmap_element *, 32> sorted_elements;
+  bitmap_element *n = e;
+
+  while (true)
+    {
+      while (n != NULL)
+	{
+	  stack.safe_push (n);
+	  n = n->prev;
+	}
+
+      if (stack.is_empty ())
+	break;
+
+      n = stack.pop ();
+      sorted_elements.safe_push (n);
+      n = n->next;
+    }
+
+  gcc_assert (sorted_elements[0] == e);
+
+  bitmap_element *prev = NULL;
+  unsigned ix;
+  FOR_EACH_VEC_ELT (sorted_elements, ix, n)
+    {
+      if (prev != NULL)
+        prev->next = n;
+      n->prev = prev;
+      n->next = NULL;
+      prev = n;
+    }
+
+  return e;
+}
+
+/* Convert bitmap HEAD from splay-tree view to linked-list view.  */
+
+void
+bitmap_list_view (bitmap head)
+{
+  bitmap_element *ptr;
+
+  gcc_assert (head->tree_form);
+
+  ptr = head->first;
+  if (ptr)
+    {
+      while (ptr->prev)
+	bitmap_tree_rotate_right (ptr);
+      head->first = ptr;
+      head->first = bitmap_tree_listify_from (head, ptr);
     }
+
+  head->tree_form = false;
 }
 
-/* Clear a bitmap by freeing the linked list.  */
+/* Convert bitmap HEAD from linked-list view to splay-tree view.
+   This is simply a matter of dropping the prev or next pointers
+   and setting the tree_form flag.  The tree will balance itself
+   if and when it is used.  */
+
+void
+bitmap_tree_view (bitmap head)
+{
+  bitmap_element *ptr;
+
+  gcc_assert (! head->tree_form);
+
+  ptr = head->first;
+  while (ptr)
+    {
+      ptr->prev = NULL;
+      ptr = ptr->next;
+    }
+
+  head->tree_form = true;
+}
+\f
+/* Clear a bitmap by freeing all its elements.  */
 
 void
 bitmap_clear (bitmap head)
 {
-  if (head->first)
-    bitmap_elt_clear_from (head, head->first);
+  if (head->first == NULL)
+    return;
+  if (head->tree_form)
+    {
+      bitmap_element *e, *t;
+      for (e = head->first; e->prev; e = e->prev)
+	/* Loop to find the element with the smallest index.  */ ;
+      t = bitmap_tree_splay (head, head->first, e->indx);
+      gcc_checking_assert (t == e);
+      head->first = t;
+    }
+  bitmap_elt_clear_from (head, head->first);
 }
 \f
 /* Initialize a bitmap obstack.  If BIT_OBSTACK is NULL, initialize
@@ -344,96 +819,6 @@ bitmap_element_zerop (const bitmap_eleme
 #endif
 }
 \f
-/* Link the bitmap element into the current bitmap linked list.  */
-
-static inline void
-bitmap_element_link (bitmap head, bitmap_element *element)
-{
-  unsigned int indx = element->indx;
-  bitmap_element *ptr;
-
-  /* If this is the first and only element, set it in.  */
-  if (head->first == 0)
-    {
-      element->next = element->prev = 0;
-      head->first = element;
-    }
-
-  /* If this index is less than that of the current element, it goes someplace
-     before the current element.  */
-  else if (indx < head->indx)
-    {
-      for (ptr = head->current;
-	   ptr->prev != 0 && ptr->prev->indx > indx;
-	   ptr = ptr->prev)
-	;
-
-      if (ptr->prev)
-	ptr->prev->next = element;
-      else
-	head->first = element;
-
-      element->prev = ptr->prev;
-      element->next = ptr;
-      ptr->prev = element;
-    }
-
-  /* Otherwise, it must go someplace after the current element.  */
-  else
-    {
-      for (ptr = head->current;
-	   ptr->next != 0 && ptr->next->indx < indx;
-	   ptr = ptr->next)
-	;
-
-      if (ptr->next)
-	ptr->next->prev = element;
-
-      element->next = ptr->next;
-      element->prev = ptr;
-      ptr->next = element;
-    }
-
-  /* Set up so this is the first element searched.  */
-  head->current = element;
-  head->indx = indx;
-}
-
-/* Insert a new uninitialized element into bitmap HEAD after element
-   ELT.  If ELT is NULL, insert the element at the start.  Return the
-   new element.  */
-
-static bitmap_element *
-bitmap_elt_insert_after (bitmap head, bitmap_element *elt, unsigned int indx)
-{
-  bitmap_element *node = bitmap_element_allocate (head);
-  node->indx = indx;
-
-  if (!elt)
-    {
-      if (!head->current)
-	{
-	  head->current = node;
-	  head->indx = indx;
-	}
-      node->next = head->first;
-      if (node->next)
-	node->next->prev = node;
-      head->first = node;
-      node->prev = NULL;
-    }
-  else
-    {
-      gcc_checking_assert (head->current);
-      node->next = elt->next;
-      if (node->next)
-	node->next->prev = node;
-      elt->next = node;
-      node->prev = elt;
-    }
-  return node;
-}
-\f
 /* Copy a bitmap to another bitmap.  */
 
 void
@@ -442,6 +827,8 @@ bitmap_copy (bitmap to, const_bitmap fro
   const bitmap_element *from_ptr;
   bitmap_element *to_ptr = 0;
 
+  gcc_checking_assert (!to->tree_form && !from->tree_form);
+
   bitmap_clear (to);
 
   /* Copy elements in forward direction one at a time.  */
@@ -452,8 +839,9 @@ bitmap_copy (bitmap to, const_bitmap fro
       to_elt->indx = from_ptr->indx;
       memcpy (to_elt->bits, from_ptr->bits, sizeof (to_elt->bits));
 
-      /* Here we have a special case of bitmap_element_link, for the case
-	 where we know the links are being entered in sequence.  */
+      /* Here we have a special case of bitmap_list_link_element,
+         for the case where we know the links are being entered
+	 in sequence.  */
       if (to_ptr == 0)
 	{
 	  to->first = to->current = to_elt;
@@ -492,85 +880,18 @@ bitmap_move (bitmap to, bitmap from)
     }
 }
 \f
-/* Find a bitmap element that would hold a bitmap's bit.
-   Update the `current' field even if we can't find an element that
-   would hold the bitmap's bit to make eventual allocation
-   faster.  */
-
-static inline bitmap_element *
-bitmap_find_bit (bitmap head, unsigned int bit)
-{
-  bitmap_element *element;
-  unsigned int indx = bit / BITMAP_ELEMENT_ALL_BITS;
-
-  if (head->current == NULL
-      || head->indx == indx)
-    return head->current;
-  if (head->current == head->first
-      && head->first->next == NULL)
-    return NULL;
-
-  /* Usage can be NULL due to allocated bitmaps for which we do not
-     call initialize function.  */
-  bitmap_usage *usage = NULL;
-  if (GATHER_STATISTICS)
-    usage = bitmap_mem_desc.get_descriptor_for_instance (head);
-
-  /* This bitmap has more than one element, and we're going to look
-     through the elements list.  Count that as a search.  */
-  if (GATHER_STATISTICS && usage)
-    usage->m_nsearches++;
-
-  if (head->indx < indx)
-    /* INDX is beyond head->indx.  Search from head->current
-       forward.  */
-    for (element = head->current;
-	 element->next != 0 && element->indx < indx;
-	 element = element->next)
-      {
-	if (GATHER_STATISTICS && usage)
-	  usage->m_search_iter++;
-      }
-
-  else if (head->indx / 2 < indx)
-    /* INDX is less than head->indx and closer to head->indx than to
-       0.  Search from head->current backward.  */
-    for (element = head->current;
-	 element->prev != 0 && element->indx > indx;
-	 element = element->prev)
-      {
-	if (GATHER_STATISTICS && usage)
-	  usage->m_search_iter++;
-      }
-
-  else
-    /* INDX is less than head->indx and closer to 0 than to
-       head->indx.  Search from head->first forward.  */
-    for (element = head->first;
-	 element->next != 0 && element->indx < indx;
-	 element = element->next)
-      if (GATHER_STATISTICS && usage)
-	{
-	  usage->m_search_iter++;
-	}
-
-  /* `element' is the nearest to the one we want.  If it's not the one we
-     want, the one we want doesn't exist.  */
-  head->current = element;
-  head->indx = element->indx;
-  if (element->indx != indx)
-    element = 0;
-
-  return element;
-}
-\f
 /* Clear a single bit in a bitmap.  Return true if the bit changed.  */
 
 bool
 bitmap_clear_bit (bitmap head, int bit)
 {
-  bitmap_element *const ptr = bitmap_find_bit (head, bit);
+  unsigned int indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  bitmap_element *ptr;
 
+  if (!head->tree_form)
+    ptr = bitmap_list_find_element (head, indx);
+  else
+    ptr = bitmap_tree_find_element (head, indx);
   if (ptr != 0)
     {
       unsigned bit_num  = bit % BITMAP_WORD_BITS;
@@ -583,7 +904,12 @@ bitmap_clear_bit (bitmap head, int bit)
 	  /* If we cleared the entire word, free up the element.  */
 	  if (!ptr->bits[word_num]
 	      && bitmap_element_zerop (ptr))
-	    bitmap_element_free (head, ptr);
+	    {
+	      if (!head->tree_form)
+		bitmap_list_unlink_element (head, ptr);
+	      else
+		bitmap_tree_unlink_element (head, ptr);
+	    }
 	}
 
       return res;
@@ -597,26 +923,32 @@ bitmap_clear_bit (bitmap head, int bit)
 bool
 bitmap_set_bit (bitmap head, int bit)
 {
-  bitmap_element *ptr = bitmap_find_bit (head, bit);
+  unsigned indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  bitmap_element *ptr;
+  if (!head->tree_form)
+    ptr = bitmap_list_find_element (head, indx);
+  else
+    ptr = bitmap_tree_find_element (head, indx);
   unsigned word_num = bit / BITMAP_WORD_BITS % BITMAP_ELEMENT_WORDS;
   unsigned bit_num  = bit % BITMAP_WORD_BITS;
   BITMAP_WORD bit_val = ((BITMAP_WORD) 1) << bit_num;
 
-  if (ptr == 0)
-    {
-      ptr = bitmap_element_allocate (head);
-      ptr->indx = bit / BITMAP_ELEMENT_ALL_BITS;
-      ptr->bits[word_num] = bit_val;
-      bitmap_element_link (head, ptr);
-      return true;
-    }
-  else
+  if (ptr != 0)
     {
       bool res = (ptr->bits[word_num] & bit_val) == 0;
       if (res)
 	ptr->bits[word_num] |= bit_val;
       return res;
     }
+
+  ptr = bitmap_element_allocate (head);
+  ptr->indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  ptr->bits[word_num] = bit_val;
+  if (!head->tree_form)
+    bitmap_list_link_element (head, ptr);
+  else
+    bitmap_tree_link_element (head, ptr);
+  return true;
 }
 
 /* Return whether a bit is set within a bitmap.  */
@@ -624,11 +956,15 @@ bitmap_set_bit (bitmap head, int bit)
 int
 bitmap_bit_p (bitmap head, int bit)
 {
+  unsigned int indx = bit / BITMAP_ELEMENT_ALL_BITS;
   bitmap_element *ptr;
   unsigned bit_num;
   unsigned word_num;
 
-  ptr = bitmap_find_bit (head, bit);
+  if (!head->tree_form)
+    ptr = bitmap_list_find_element (head, indx);
+  else
+    ptr = bitmap_tree_find_element (head, indx);
   if (ptr == 0)
     return 0;
 
@@ -692,6 +1028,7 @@ bitmap_count_bits (const_bitmap a)
   unsigned long count = 0;
   const bitmap_element *elt;
 
+  gcc_checking_assert (!a->tree_form);
   for (elt = a->first; elt; elt = elt->next)
     count += bitmap_count_bits_in_word (elt->bits);
 
@@ -748,9 +1085,11 @@ bitmap_single_bit_set_p (const_bitmap a)
     return false;
 
   elt = a->first;
+
   /* As there are no completely empty bitmap elements, a second one
      means we have more than one bit set.  */
-  if (elt->next != NULL)
+  if (elt->next != NULL
+      && (!a->tree_form || elt->prev != NULL))
     return false;
 
   for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
@@ -782,6 +1121,11 @@ bitmap_first_set_bit (const_bitmap a)
   unsigned ix;
 
   gcc_checking_assert (elt);
+
+  if (a->tree_form)
+    while (elt->prev)
+      elt = elt->prev;
+
   bit_no = elt->indx * BITMAP_ELEMENT_ALL_BITS;
   for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
     {
@@ -827,14 +1171,20 @@ bitmap_first_set_bit (const_bitmap a)
 unsigned
 bitmap_last_set_bit (const_bitmap a)
 {
-  const bitmap_element *elt = a->current ? a->current : a->first;
+  const bitmap_element *elt;
   unsigned bit_no;
   BITMAP_WORD word;
   int ix;
 
+  if (a->tree_form)
+    elt = a->first;
+  else
+    elt = a->current ? a->current : a->first;
   gcc_checking_assert (elt);
+
   while (elt->next)
     elt = elt->next;
+
   bit_no = elt->indx * BITMAP_ELEMENT_ALL_BITS;
   for (ix = BITMAP_ELEMENT_WORDS - 1; ix >= 0; ix--)
     {
@@ -876,6 +1226,7 @@ bitmap_and (bitmap dst, const_bitmap a,
   const bitmap_element *b_elt = b->first;
   bitmap_element *dst_prev = NULL;
 
+  gcc_checking_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
 
   if (a == b)
@@ -897,7 +1248,8 @@ bitmap_and (bitmap dst, const_bitmap a,
 	  BITMAP_WORD ior = 0;
 
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							a_elt->indx);
 	  else
 	    dst_elt->indx = a_elt->indx;
 	  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
@@ -934,6 +1286,8 @@ bitmap_and_into (bitmap a, const_bitmap
   bitmap_element *next;
   bool changed = false;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+
   if (a == b)
     return false;
 
@@ -942,7 +1296,7 @@ bitmap_and_into (bitmap a, const_bitmap
       if (a_elt->indx < b_elt->indx)
 	{
 	  next = a_elt->next;
-	  bitmap_element_free (a, a_elt);
+	  bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	  changed = true;
 	}
@@ -964,7 +1318,7 @@ bitmap_and_into (bitmap a, const_bitmap
 	    }
 	  next = a_elt->next;
 	  if (!ior)
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	  b_elt = b_elt->next;
 	}
@@ -1006,7 +1360,8 @@ bitmap_elt_copy (bitmap dst, bitmap_elem
     {
       changed = true;
       if (!dst_elt)
-	dst_elt = bitmap_elt_insert_after (dst, dst_prev, src_elt->indx);
+	dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+						    src_elt->indx);
       else
 	dst_elt->indx = src_elt->indx;
       memcpy (dst_elt->bits, src_elt->bits, sizeof (dst_elt->bits));
@@ -1028,6 +1383,7 @@ bitmap_and_compl (bitmap dst, const_bitm
   bitmap_element **dst_prev_pnext = &dst->first;
   bool changed = false;
 
+  gcc_checking_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
 
   if (a == b)
@@ -1076,7 +1432,8 @@ bitmap_and_compl (bitmap dst, const_bitm
 	      bool new_element;
 	      if (!dst_elt || dst_elt->indx > a_elt->indx)
 		{
-		  dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+		  dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							      a_elt->indx);
 		  new_element = true;
 		}
 	      else
@@ -1098,7 +1455,7 @@ bitmap_and_compl (bitmap dst, const_bitm
 	      else
 	        {
 	          changed |= !new_element;
-		  bitmap_element_free (dst, dst_elt);
+		  bitmap_list_unlink_element (dst, dst_elt);
 		  dst_elt = *dst_prev_pnext;
 		}
 	    }
@@ -1139,6 +1496,8 @@ bitmap_and_compl_into (bitmap a, const_b
   bitmap_element *next;
   BITMAP_WORD changed = 0;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+
   if (a == b)
     {
       if (bitmap_empty_p (a))
@@ -1173,7 +1532,7 @@ bitmap_and_compl_into (bitmap a, const_b
 	    }
 	  next = a_elt->next;
 	  if (!ior)
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	  b_elt = b_elt->next;
 	}
@@ -1191,6 +1550,8 @@ bitmap_set_range (bitmap head, unsigned
   bitmap_element *elt, *elt_prev;
   unsigned int i;
 
+  gcc_checking_assert (!head->tree_form);
+
   if (!count)
     return;
 
@@ -1203,9 +1564,9 @@ bitmap_set_range (bitmap head, unsigned
   first_index = start / BITMAP_ELEMENT_ALL_BITS;
   end_bit_plus1 = start + count;
   last_index = (end_bit_plus1 - 1) / BITMAP_ELEMENT_ALL_BITS;
-  elt = bitmap_find_bit (head, start);
+  elt = bitmap_list_find_element (head, first_index);
 
-  /* If bitmap_find_bit returns zero, the current is the closest block
+  /* If bitmap_list_find_element returns zero, the current is the closest block
      to the result.  Otherwise, just use bitmap_element_allocate to
      ensure ELT is set; in the loop below, ELT == NULL means "insert
      at the end of the bitmap".  */
@@ -1213,7 +1574,7 @@ bitmap_set_range (bitmap head, unsigned
     {
       elt = bitmap_element_allocate (head);
       elt->indx = first_index;
-      bitmap_element_link (head, elt);
+      bitmap_list_link_element (head, elt);
     }
 
   gcc_checking_assert (elt->indx == first_index);
@@ -1230,7 +1591,7 @@ bitmap_set_range (bitmap head, unsigned
       unsigned int ix;
 
       if (!elt || elt->indx != i)
-	elt = bitmap_elt_insert_after (head, elt_prev, i);
+	elt = bitmap_list_insert_element_after (head, elt_prev, i);
 
       if (elt_start_bit <= start)
 	{
@@ -1296,6 +1657,8 @@ bitmap_clear_range (bitmap head, unsigne
   unsigned int first_index, end_bit_plus1, last_index;
   bitmap_element *elt;
 
+  gcc_checking_assert (!head->tree_form);
+
   if (!count)
     return;
 
@@ -1308,9 +1671,9 @@ bitmap_clear_range (bitmap head, unsigne
   first_index = start / BITMAP_ELEMENT_ALL_BITS;
   end_bit_plus1 = start + count;
   last_index = (end_bit_plus1 - 1) / BITMAP_ELEMENT_ALL_BITS;
-  elt = bitmap_find_bit (head, start);
+  elt = bitmap_list_find_element (head, first_index);
 
-  /* If bitmap_find_bit returns zero, the current is the closest block
+  /* If bitmap_list_find_element returns zero, the current is the closest block
      to the result.  If the current is less than first index, find the
      next one.  Otherwise, just set elt to be current.  */
   if (!elt)
@@ -1339,7 +1702,7 @@ bitmap_clear_range (bitmap head, unsigne
 
       if (elt_start_bit >= start && elt_end_bit_plus1 <= end_bit_plus1)
 	/* Get rid of the entire elt and go to the next one.  */
-	bitmap_element_free (head, elt);
+	bitmap_list_unlink_element (head, elt);
       else
 	{
 	  /* Going to have to knock out some bits in this elt.  */
@@ -1409,7 +1772,7 @@ bitmap_clear_range (bitmap head, unsigne
 	      }
 	  /* Check to see if there are any bits left.  */
 	  if (clear)
-	    bitmap_element_free (head, elt);
+	    bitmap_list_unlink_element (head, elt);
 	}
       elt = next_elt;
     }
@@ -1431,6 +1794,7 @@ bitmap_compl_and_into (bitmap a, const_b
   bitmap_element *a_prev = NULL;
   bitmap_element *next;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
   gcc_assert (a != b);
 
   if (bitmap_empty_p (a))
@@ -1451,13 +1815,13 @@ bitmap_compl_and_into (bitmap a, const_b
 	  /* A is before B.  Remove A */
 	  next = a_elt->next;
 	  a_prev = a_elt->prev;
-	  bitmap_element_free (a, a_elt);
+	  bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	}
       else if (!a_elt || b_elt->indx < a_elt->indx)
 	{
 	  /* B is before A.  Copy B. */
-	  next = bitmap_elt_insert_after (a, a_prev, b_elt->indx);
+	  next = bitmap_list_insert_element_after (a, a_prev, b_elt->indx);
 	  memcpy (next->bits, b_elt->bits, sizeof (next->bits));
 	  a_prev = next;
 	  b_elt = b_elt->next;
@@ -1478,7 +1842,7 @@ bitmap_compl_and_into (bitmap a, const_b
 	    }
 	  next = a_elt->next;
 	  if (!ior)
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  else
 	    a_prev = a_elt;
 	  a_elt = next;
@@ -1523,7 +1887,8 @@ bitmap_elt_ior (bitmap dst, bitmap_eleme
 	{
 	  changed = true;
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							a_elt->indx);
 	  else
 	    dst_elt->indx = a_elt->indx;
 	  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
@@ -1562,6 +1927,7 @@ bitmap_ior (bitmap dst, const_bitmap a,
   bitmap_element **dst_prev_pnext = &dst->first;
   bool changed = false;
 
+  gcc_checking_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
 
   while (a_elt || b_elt)
@@ -1610,6 +1976,7 @@ bitmap_ior_into (bitmap a, const_bitmap
   bitmap_element **a_prev_pnext = &a->first;
   bool changed = false;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
   if (a == b)
     return false;
 
@@ -1648,7 +2015,9 @@ bitmap_xor (bitmap dst, const_bitmap a,
   const bitmap_element *b_elt = b->first;
   bitmap_element *dst_prev = NULL;
 
+  gcc_checking_assert (!dst->tree_form && !a->tree_form && !b->tree_form);
   gcc_assert (dst != a && dst != b);
+
   if (a == b)
     {
       bitmap_clear (dst);
@@ -1664,7 +2033,8 @@ bitmap_xor (bitmap dst, const_bitmap a,
 	  BITMAP_WORD ior = 0;
 
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, a_elt->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							a_elt->indx);
 	  else
 	    dst_elt->indx = a_elt->indx;
 	  for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
@@ -1699,7 +2069,8 @@ bitmap_xor (bitmap dst, const_bitmap a,
 	    }
 
 	  if (!dst_elt)
-	    dst_elt = bitmap_elt_insert_after (dst, dst_prev, src->indx);
+	    dst_elt = bitmap_list_insert_element_after (dst, dst_prev,
+							src->indx);
 	  else
 	    dst_elt->indx = src->indx;
 	  memcpy (dst_elt->bits, src->bits, sizeof (dst_elt->bits));
@@ -1724,6 +2095,8 @@ bitmap_xor_into (bitmap a, const_bitmap
   const bitmap_element *b_elt = b->first;
   bitmap_element *a_prev = NULL;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+
   if (a == b)
     {
       bitmap_clear (a);
@@ -1735,7 +2108,8 @@ bitmap_xor_into (bitmap a, const_bitmap
       if (!a_elt || b_elt->indx < a_elt->indx)
 	{
 	  /* Copy b_elt.  */
-	  bitmap_element *dst = bitmap_elt_insert_after (a, a_prev, b_elt->indx);
+	  bitmap_element *dst = bitmap_list_insert_element_after (a, a_prev,
+								  b_elt->indx);
 	  memcpy (dst->bits, b_elt->bits, sizeof (dst->bits));
 	  a_prev = dst;
 	  b_elt = b_elt->next;
@@ -1763,7 +2137,7 @@ bitmap_xor_into (bitmap a, const_bitmap
 	  if (ior)
 	    a_prev = a_elt;
 	  else
-	    bitmap_element_free (a, a_elt);
+	    bitmap_list_unlink_element (a, a_elt);
 	  a_elt = next;
 	}
     }
@@ -1783,6 +2157,8 @@ bitmap_equal_p (const_bitmap a, const_bi
   const bitmap_element *b_elt;
   unsigned ix;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+
   for (a_elt = a->first, b_elt = b->first;
        a_elt && b_elt;
        a_elt = a_elt->next, b_elt = b_elt->next)
@@ -1805,6 +2181,8 @@ bitmap_intersect_p (const_bitmap a, cons
   const bitmap_element *b_elt;
   unsigned ix;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+
   for (a_elt = a->first, b_elt = b->first;
        a_elt && b_elt;)
     {
@@ -1832,6 +2210,9 @@ bitmap_intersect_compl_p (const_bitmap a
   const bitmap_element *a_elt;
   const bitmap_element *b_elt;
   unsigned ix;
+
+  gcc_checking_assert (!a->tree_form && !b->tree_form);
+
   for (a_elt = a->first, b_elt = b->first;
        a_elt && b_elt;)
     {
@@ -1866,6 +2247,8 @@ bitmap_ior_and_compl (bitmap dst, const_
   bitmap_element *dst_prev = NULL;
   bitmap_element **dst_prev_pnext = &dst->first;
 
+  gcc_checking_assert (!dst->tree_form && !a->tree_form && !b->tree_form
+		       && !kill->tree_form);
   gcc_assert (dst != a && dst != b && dst != kill);
 
   /* Special cases.  We don't bother checking for bitmap_equal_p (b, kill).  */
@@ -1958,16 +2341,18 @@ bitmap_ior_and_compl (bitmap dst, const_
   return changed;
 }
 
-/* A |= (FROM1 & ~FROM2).  Return true if A changes.  */
+/* A |= (B & ~C).  Return true if A changes.  */
 
 bool
-bitmap_ior_and_compl_into (bitmap a, const_bitmap from1, const_bitmap from2)
+bitmap_ior_and_compl_into (bitmap a, const_bitmap b, const_bitmap c)
 {
   bitmap_head tmp;
   bool changed;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form && !c->tree_form);
+
   bitmap_initialize (&tmp, &bitmap_default_obstack);
-  bitmap_and_compl (&tmp, from1, from2);
+  bitmap_and_compl (&tmp, b, c);
   changed = bitmap_ior_into (a, &tmp);
   bitmap_clear (&tmp);
 
@@ -1988,6 +2373,8 @@ bitmap_ior_and_into (bitmap a, const_bit
   bool changed = false;
   unsigned ix;
 
+  gcc_checking_assert (!a->tree_form && !b->tree_form && !c->tree_form);
+
   if (b == c)
     return bitmap_ior_into (a, b);
   if (bitmap_empty_p (b) || bitmap_empty_p (c))
@@ -2054,6 +2441,7 @@ bitmap_ior_and_into (bitmap a, const_bit
 }
 
 /* Compute hash of bitmap (for purposes of hashing).  */
+
 hashval_t
 bitmap_hash (const_bitmap head)
 {
@@ -2061,6 +2449,8 @@ bitmap_hash (const_bitmap head)
   BITMAP_WORD hash = 0;
   int ix;
 
+  gcc_checking_assert (!head->tree_form);
+
   for (ptr = head->first; ptr; ptr = ptr->next)
     {
       hash ^= ptr->indx;
@@ -2071,6 +2461,61 @@ bitmap_hash (const_bitmap head)
 }
 
 \f
+/* Function to obtain a vector of bitmap elements in bit order from
+   HEAD in tree view.  */
+
+static void
+bitmap_tree_to_vec (vec<bitmap_element *> &elts, const_bitmap head)
+{
+  gcc_checking_assert (head->tree_form);
+  auto_vec<bitmap_element *, 32> stack;
+  bitmap_element *e = head->first;
+  while (true)
+    {
+      while (e != NULL)
+	{
+	  stack.safe_push (e);
+	  e = e->prev;
+	}
+      if (stack.is_empty ())
+	break;
+
+      e = stack.pop ();
+      elts.safe_push (e);
+      e = e->next;
+    }
+}
+
+/* Debugging function to print out the contents of a bitmap element.  */
+
+DEBUG_FUNCTION void
+debug_bitmap_elt_file (FILE *file, const bitmap_element *ptr)
+{
+  unsigned int i, j, col = 26;
+
+  fprintf (file, "\t" HOST_PTR_PRINTF " next = " HOST_PTR_PRINTF
+	   " prev = " HOST_PTR_PRINTF " indx = %u\n\t\tbits = {",
+	   (const void*) ptr, (const void*) ptr->next,
+	   (const void*) ptr->prev, ptr->indx);
+
+  for (i = 0; i < BITMAP_ELEMENT_WORDS; i++)
+    for (j = 0; j < BITMAP_WORD_BITS; j++)
+      if ((ptr->bits[i] >> j) & 1)
+	{
+	  if (col > 70)
+	    {
+	      fprintf (file, "\n\t\t\t");
+	      col = 24;
+	    }
+
+	  fprintf (file, " %u", (ptr->indx * BITMAP_ELEMENT_ALL_BITS
+				 + i * BITMAP_WORD_BITS + j));
+	  col += 4;
+	}
+
+  fprintf (file, " }\n");
+}
+
 /* Debugging function to print out the contents of a bitmap.  */
 
 DEBUG_FUNCTION void
@@ -2082,32 +2527,16 @@ debug_bitmap_file (FILE *file, const_bit
 	   " current = " HOST_PTR_PRINTF " indx = %u\n",
 	   (void *) head->first, (void *) head->current, head->indx);
 
-  for (ptr = head->first; ptr; ptr = ptr->next)
+  if (head->tree_form)
     {
-      unsigned int i, j, col = 26;
-
-      fprintf (file, "\t" HOST_PTR_PRINTF " next = " HOST_PTR_PRINTF
-	       " prev = " HOST_PTR_PRINTF " indx = %u\n\t\tbits = {",
-	       (const void*) ptr, (const void*) ptr->next,
-	       (const void*) ptr->prev, ptr->indx);
-
-      for (i = 0; i < BITMAP_ELEMENT_WORDS; i++)
-	for (j = 0; j < BITMAP_WORD_BITS; j++)
-	  if ((ptr->bits[i] >> j) & 1)
-	    {
-	      if (col > 70)
-		{
-		  fprintf (file, "\n\t\t\t");
-		  col = 24;
-		}
-
-	      fprintf (file, " %u", (ptr->indx * BITMAP_ELEMENT_ALL_BITS
-				     + i * BITMAP_WORD_BITS + j));
-	      col += 4;
-	    }
-
-      fprintf (file, " }\n");
+      auto_vec<bitmap_element *, 32> elts;
+      bitmap_tree_to_vec (elts, head);
+      for (unsigned i = 0; i < elts.length (); ++i)
+	debug_bitmap_elt_file (file, elts[i]);
     }
+  else
+    for (ptr = head->first; ptr; ptr = ptr->next)
+      debug_bitmap_elt_file (file, ptr);
 }
 
 /* Function to be called from the debugger to print the contents
@@ -2128,13 +2557,34 @@ bitmap_print (FILE *file, const_bitmap h
 {
   const char *comma = "";
   unsigned i;
-  bitmap_iterator bi;
 
   fputs (prefix, file);
-  EXECUTE_IF_SET_IN_BITMAP (head, 0, i, bi)
+  if (head->tree_form)
+    {
+      auto_vec<bitmap_element *, 32> elts;
+      bitmap_tree_to_vec (elts, head);
+      for (i = 0; i < elts.length (); ++i)
+	for (unsigned ix = 0; ix != BITMAP_ELEMENT_WORDS; ++ix)
+	  {
+	    BITMAP_WORD word = elts[i]->bits[ix];
+	    for (unsigned bit = 0; bit != BITMAP_WORD_BITS; ++bit)
+	      if (word & ((BITMAP_WORD)1 << bit))
+		{
+		  fprintf (file, "%s%d", comma,
+			   (bit + BITMAP_WORD_BITS * ix
+			    + elts[i]->indx * BITMAP_ELEMENT_ALL_BITS));
+		  comma = ", ";
+		}
+	  }
+    }
+  else
     {
-      fprintf (file, "%s%d", comma, i);
-      comma = ", ";
+      bitmap_iterator bi;
+      EXECUTE_IF_SET_IN_BITMAP (head, 0, i, bi)
+	{
+	  fprintf (file, "%s%d", comma, i);
+	  comma = ", ";
+	}
     }
   fputs (suffix, file);
 }
Index: gcc/bitmap.h
===================================================================
--- gcc/bitmap.h	(revision 265317)
+++ gcc/bitmap.h	(working copy)
@@ -20,16 +20,21 @@ along with GCC; see the file COPYING3.
 #ifndef GCC_BITMAP_H
 #define GCC_BITMAP_H
 
-/* Implementation of sparse integer sets as a linked list.
+/* Implementation of sparse integer sets as a linked list or tree.
 
    This sparse set representation is suitable for sparse sets with an
-   unknown (a priori) universe.  The set is represented as a double-linked
-   list of container nodes (struct bitmap_element).  Each node consists
-   of an index for the first member that could be held in the container,
-   a small array of integers that represent the members in the container,
-   and pointers to the next and previous element in the linked list.  The
-   elements in the list are sorted in ascending order, i.e. the head of
+   unknown (a priori) universe.
+
+   Sets are represented as double-linked lists of container nodes of
+   type "struct bitmap_element" or as a binary trees of the same
+   container nodes.  Each container node consists of an index for the
+   first member that could be held in the container, a small array of
+   integers that represent the members in the container, and pointers
+   to the next and previous element in the linked list, or left and
+   right children in the tree.  In linked-list form, the container
+   nodes in the list are sorted in ascending order, i.e. the head of
    the list holds the element with the smallest member of the set.
+   In tree form, nodes to the left have a smaller container index.
 
    For a given member I in the set:
      - the element for I will have index is I / (bits per element)
@@ -42,34 +47,97 @@ along with GCC; see the file COPYING3.
    high storage overhead *per element*, but a small overall overhead if
    the set is very sparse.
 
-   The downside is that many operations are relatively slow because the
-   linked list has to be traversed to test membership (i.e. member_p/
-   add_member/remove_member).  To improve the performance of this set
-   representation, the last accessed element and its index are cached.
-   For membership tests on members close to recently accessed members,
-   the cached last element improves membership test to a constant-time
-   operation.
+   The storage requirements for linked-list sparse sets are O(E), with E->N
+   in the worst case (a sparse set with large distances between the values
+   of the set members).
 
-   The following operations can always be performed in O(1) time:
+   This representation also works well for data flow problems where the size
+   of the set may grow dynamically, but care must be taken that the member_p,
+   add_member, and remove_member operations occur with a suitable access
+   pattern.
+
+   The linked-list set representation works well for problems involving very
+   sparse sets.  The canonical example in GCC is, of course, the "set of
+   sets" for some CFG-based data flow problems (liveness analysis, dominance
+   frontiers, etc.).
+   
+   For random-access sparse sets of unknown universe, the binary tree
+   representation is likely to be a more suitable choice.  Theoretical
+   access times for the binary tree representation are better than those
+   for the linked-list, but in practice this is only true for truely
+   random access.
+
+   Often the most suitable representation during construction of the set
+   is not the best choice for the usage of the set.  For such cases, the
+   "view" of the set can be changed from one representation to the other.
+   This is an O(E) operation:
+
+     * from list to tree view	: bitmap_tree_view
+     * from tree to list view	: bitmap_list_view
+
+   Traversing linked lists or trees can be cache-unfriendly.  Performance
+   can be improved by keeping container nodes in the set grouped together
+   in  memory, using a dedicated obstack for a set (or group of related
+   sets).  Elements allocated on obstacks are released to a free-list and
+   taken off the free list.  If multiple sets are allocated on the same
+   obstack, elements freed from one set may be re-used for one of the other
+   sets.  This usually helps avoid cache misses.
+
+   A single free-list is used for all sets allocated in GGC space.  This is
+   bad for persistent sets, so persistent sets should be allocated on an
+   obstack whenever possible.
+
+   For random-access sets with a known, relatively small universe size, the
+   SparseSet or simple bitmap representations may be more efficient than a
+   linked-list set.
+
+
+   LINKED LIST FORM
+   ================
+
+   In linked-list form, in-order iterations of the set can be executed
+   efficiently.  The downside is that many random-access operations are
+   relatively slow, because the linked list has to be traversed to test
+   membership (i.e. member_p/ add_member/remove_member).
+   
+   To improve the performance of this set representation, the last
+   accessed element and its index are cached.  For membership tests on
+   members close to recently accessed members, the cached last element
+   improves membership test to a constant-time operation.
+
+   The following operations can always be performed in O(1) time in
+   list view:
 
      * clear			: bitmap_clear
+     * smallest_member		: bitmap_first_set_bit
      * choose_one		: (not implemented, but could be
-				   implemented in constant time)
+				   in constant time)
 
-   The following operations can be performed in O(E) time worst-case (with
-   E the number of elements in the linked list), but in O(1) time with a
-   suitable access patterns:
+   The following operations can be performed in O(E) time worst-case in
+   list view (with E the number of elements in the linked list), but in
+   O(1) time with a suitable access patterns:
 
      * member_p			: bitmap_bit_p
-     * add_member		: bitmap_set_bit
-     * remove_member		: bitmap_clear_bit
+     * add_member		: bitmap_set_bit / bitmap_set_range
+     * remove_member		: bitmap_clear_bit / bitmap_clear_range
 
-   The following operations can be performed in O(E) time:
+   The following operations can be performed in O(E) time in list view:
 
      * cardinality		: bitmap_count_bits
-     * set_size			: bitmap_last_set_bit (but this could
+     * largest_member		: bitmap_last_set_bit (but this could
 				  in constant time with a pointer to
 				  the last element in the chain)
+     * set_size			: bitmap_last_set_bit
+
+   In tree view the following operations can all be performed in O(log E)
+   amortized time with O(E) worst-case behavior.
+
+     * smallest_member
+     * largest_member
+     * set_size
+     * member_p
+     * add_member
+     * remove_member
 
    Additionally, the linked-list sparse set representation supports
    enumeration of the members in O(E) time:
@@ -93,39 +161,53 @@ along with GCC; see the file COPYING3.
      * A | (B & ~C)		: bitmap_ior_and_compl /
 				  bitmap_ior_and_compl_into
 
-   The storage requirements for linked-list sparse sets are O(E), with E->N
-   in the worst case (a sparse set with large distances between the values
-   of the set members).
 
-   The linked-list set representation works well for problems involving very
-   sparse sets.  The canonical example in GCC is, of course, the "set of
-   sets" for some CFG-based data flow problems (liveness analysis, dominance
-   frontiers, etc.).
+   BINARY TREE FORM
+   ================
+   An alternate "view" of a bitmap is its binary tree representation.
+   For this representation, splay trees are used because they can be
+   implemented using the same data structures as the linked list, with
+   no overhead for meta-data (like color, or rank) on the tree nodes.
+
+   In binary tree form, random-access to the set is much more efficient
+   than for the linked-list representation.  Downsides are the high cost
+   of clearing the set, and the relatively large number of operations
+   necessary to balance the tree.  Also, iterating the set members is
+   not supported.
    
-   This representation also works well for data flow problems where the size
-   of the set may grow dynamically, but care must be taken that the member_p,
-   add_member, and remove_member operations occur with a suitable access
-   pattern.
-   
-   For random-access sets with a known, relatively small universe size, the
-   SparseSet or simple bitmap representations may be more efficient than a
-   linked-list set.  For random-access sets of unknown universe, a hash table
-   or a balanced binary tree representation is likely to be a more suitable
-   choice.
+   As for the linked-list representation, the last accessed element and
+   its index are cached, so that membership tests on the latest accessed
+   members is a constant-time operation.  Other lookups take O(logE)
+   time amortized (but O(E) time worst-case).
 
-   Traversing linked lists is usually cache-unfriendly, even with the last
-   accessed element cached.
-   
-   Cache performance can be improved by keeping the elements in the set
-   grouped together in memory, using a dedicated obstack for a set (or group
-   of related sets).  Elements allocated on obstacks are released to a
-   free-list and taken off the free list.  If multiple sets are allocated on
-   the same obstack, elements freed from one set may be re-used for one of
-   the other sets.  This usually helps avoid cache misses.
+   The following operations can always be performed in O(1) time:
 
-   A single free-list is used for all sets allocated in GGC space.  This is
-   bad for persistent sets, so persistent sets should be allocated on an
-   obstack whenever possible.  */
+     * choose_one		: (not implemented, but could be
+				   implemented in constant time)
+
+   The following operations can be performed in O(logE) time amortized
+   but O(E) time worst-case, but in O(1) time if the same element is
+   accessed.
+
+     * member_p			: bitmap_bit_p
+     * add_member		: bitmap_set_bit
+     * remove_member		: bitmap_clear_bit
+
+   The following operations can be performed in O(logE) time amortized
+   but O(E) time worst-case:
+
+     * smallest_member		: bitmap_first_set_bit
+     * largest_member		: bitmap_last_set_bit
+     * set_size			: bitmap_last_set_bit
+
+   The following operations can be performed in O(E) time:
+
+     * clear			: bitmap_clear
+
+   The binary tree sparse set representation does *not* support any form
+   of enumeration, and does also *not* support logical operations on sets.
+   The binary tree representation is only supposed to be used for sets
+   on which many random-access membership tests will happen.  */
 
 #include "obstack.h"
 
@@ -225,24 +307,34 @@ struct GTY (()) bitmap_obstack {
    linear in the number of elements to be freed.  */
 
 struct GTY((chain_next ("%h.next"), chain_prev ("%h.prev"))) bitmap_element {
-  struct bitmap_element *next;	/* Next element.  */
-  struct bitmap_element *prev;	/* Previous element.  */
-  unsigned int indx;			/* regno/BITMAP_ELEMENT_ALL_BITS.  */
-  BITMAP_WORD bits[BITMAP_ELEMENT_WORDS]; /* Bits that are set.  */
+  /* In list form, the next element in the linked list;
+     in tree form, the left child node in the tree.  */
+  struct bitmap_element *next;
+  /* In list form, the previous element in the linked list;
+     in tree form, the right child node in the tree.  */
+  struct bitmap_element *prev;
+  /* regno/BITMAP_ELEMENT_ALL_BITS.  */
+  unsigned int indx;
+  /* Bits that are set, counting from INDX, inclusive  */
+  BITMAP_WORD bits[BITMAP_ELEMENT_WORDS];
 };
 
 /* Head of bitmap linked list.  The 'current' member points to something
    already pointed to by the chain started by first, so GTY((skip)) it.  */
 
 struct GTY(()) bitmap_head {
-  unsigned int indx;			/* Index of last element looked at.  */
-  unsigned int descriptor_id;		/* Unique identifier for the allocation
-					   site of this bitmap, for detailed
-					   statistics gathering.  */
-  bitmap_element *first;		/* First element in linked list.  */
-  bitmap_element * GTY((skip(""))) current; /* Last element looked at.  */
-  bitmap_obstack *obstack;		/* Obstack to allocate elements from.
-					   If NULL, then use GGC allocation.  */
+  /* Index of last element looked at.  */
+  unsigned int indx;
+  /* False if the bitmap is in list form; true if the bitmap is in tree form.
+     Bitmap iterators only work on bitmaps in list form.  */
+  bool tree_form;
+  /* In list form, the first element in the linked list;
+     in tree form, the root of the tree.   */
+  bitmap_element *first;
+  /* Last element looked at.  */
+  bitmap_element * GTY((skip(""))) current;
+  /* Obstack to allocate elements from.  If NULL, then use GGC allocation.  */
+  bitmap_obstack *obstack;
   void dump ();
 };
 
@@ -250,6 +342,10 @@ struct GTY(()) bitmap_head {
 extern bitmap_element bitmap_zero_bits;	/* Zero bitmap element */
 extern bitmap_obstack bitmap_default_obstack;   /* Default bitmap obstack */
 
+/* Change the view of the bitmap to list, or tree.  */
+void bitmap_list_view (bitmap);
+void bitmap_tree_view (bitmap);
+
 /* Clear a bitmap by freeing up the linked list.  */
 extern void bitmap_clear (bitmap);
 
@@ -316,10 +412,10 @@ extern bool bitmap_clear_bit (bitmap, in
 /* Set a single bit in a bitmap.  Return true if the bit changed.  */
 extern bool bitmap_set_bit (bitmap, int);
 
-/* Return true if a register is set in a register set.  */
+/* Return true if a bit is set in a bitmap.  */
 extern int bitmap_bit_p (bitmap, int);
 
-/* Debug functions to print a bitmap linked list.  */
+/* Debug functions to print a bitmap.  */
 extern void debug_bitmap (const_bitmap);
 extern void debug_bitmap_file (FILE *, const_bitmap);
 
@@ -339,6 +435,7 @@ static inline void
 bitmap_initialize (bitmap head, bitmap_obstack *obstack CXX_MEM_STAT_INFO)
 {
   head->first = head->current = NULL;
+  head->indx = head->tree_form = 0;
   head->obstack = obstack;
   if (GATHER_STATISTICS)
     bitmap_register (head PASS_MEM_STAT);
@@ -398,6 +495,8 @@ bmp_iter_set_init (bitmap_iterator *bi,
   bi->elt1 = map->first;
   bi->elt2 = NULL;
 
+  gcc_checking_assert (!map->tree_form);
+
   /* Advance elt1 until it is not before the block containing start_bit.  */
   while (1)
     {
@@ -440,6 +539,8 @@ bmp_iter_and_init (bitmap_iterator *bi,
   bi->elt1 = map1->first;
   bi->elt2 = map2->first;
 
+  gcc_checking_assert (!map1->tree_form && !map2->tree_form);
+
   /* Advance elt1 until it is not before the block containing
      start_bit.  */
   while (1)
@@ -498,8 +599,7 @@ bmp_iter_and_init (bitmap_iterator *bi,
   *bit_no = start_bit;
 }
 
-/* Initialize an iterator to iterate over the bits in MAP1 & ~MAP2.
-   */
+/* Initialize an iterator to iterate over the bits in MAP1 & ~MAP2.  */
 
 static inline void
 bmp_iter_and_compl_init (bitmap_iterator *bi,
@@ -509,6 +609,8 @@ bmp_iter_and_compl_init (bitmap_iterator
   bi->elt1 = map1->first;
   bi->elt2 = map2->first;
 
+  gcc_checking_assert (!map1->tree_form && !map2->tree_form);
+
   /* Advance elt1 until it is not before the block containing start_bit.  */
   while (1)
     {
Index: gcc/tree-ssa-coalesce.c
===================================================================
--- gcc/tree-ssa-coalesce.c	(revision 265317)
+++ gcc/tree-ssa-coalesce.c	(working copy)
@@ -1703,9 +1703,11 @@ coalesce_ssa_name (var_map map)
   coalesce_list *cl;
   auto_bitmap used_in_copies;
 
+  bitmap_tree_view (used_in_copies);
   cl = create_coalesce_list_for_region (map, used_in_copies);
   if (map->outofssa_p)
     populate_coalesce_list_for_outofssa (cl, used_in_copies);
+  bitmap_list_view (used_in_copies);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_var_map (dump_file, map);
Index: gcc/tree-ssa-propagate.c
===================================================================
--- gcc/tree-ssa-propagate.c	(revision 265317)
+++ gcc/tree-ssa-propagate.c	(working copy)
@@ -381,6 +381,8 @@ ssa_prop_init (void)
   /* Worklists of SSA edges.  */
   ssa_edge_worklist = BITMAP_ALLOC (NULL);
   ssa_edge_worklist_back = BITMAP_ALLOC (NULL);
+  bitmap_tree_view (ssa_edge_worklist);
+  bitmap_tree_view (ssa_edge_worklist_back);
 
   /* Worklist of basic-blocks.  */
   bb_to_cfg_order = XNEWVEC (int, last_basic_block_for_fn (cfun) + 1);

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-10-22 10:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-18 13:58 [PATCH] Add splay-tree "view" for bitmap Richard Biener
2018-10-18 15:43 ` Richard Sandiford
2018-10-18 15:46   ` Richard Biener
2018-10-18 22:27     ` Richard Sandiford
2018-10-19  7:20       ` Richard Biener
2018-10-19  8:44         ` Richard Sandiford
2018-10-19 13:33           ` Richard Biener
2018-10-19 14:32             ` Richard Biener
2018-10-20 12:38             ` Richard Sandiford
2018-10-22 12:17               ` Richard Biener
2018-10-19  9:10         ` Steven Bosscher
2018-10-19 13:14           ` Richard Biener
2018-10-18 15:53 ` David Malcolm
2018-10-19 13:37   ` Richard Biener
2018-10-18 18:01 ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).