public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFB: patch to fix PR37534
@ 2008-11-10 17:26 Vladimir Makarov
  2008-11-10 17:53 ` Luis Machado
  0 siblings, 1 reply; 3+ messages in thread
From: Vladimir Makarov @ 2008-11-10 17:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: luisgpm

[-- Attachment #1: Type: text/plain, Size: 1394 bytes --]

Hi, Luis.

Could you test the following patch.  It uses another spill heuristic.  
Usually spill priority is spill cost divided by # of left conflicts for 
the corresponding node in the graph.  Some RA literature mentions 
division by #left conflicts in power 2.  The patch uses a heuristic 
close to the second one.  I don't know why but instead of 17% 
degradation it gives 10% improvement on facerec (for -O2 -mtune=power6).

Actually I tried many things to solve the problem:

o different coalescing algorithms (iterative and optimistic ones)

o usage of union of cover classes for pseudos.  This is one drawback I 
see in IRA with comparison with the old RA.  When pseudo is used only 
for transferring memory value (r<-m ... m<-r), it can be done through 
float or integer registers.  When coloring is not possible for one class 
(e.g. integer registers) we could  use another class (e.g. float registers).

o using coloring with different spill heuristics and choosing the best 
one (Bernstein's approach)

o different spill heuristics in reload.

But the best result is achieved by this patch which is ironically the 
simplest one.

In general the patch does not improve overall SPEC rates on other 
platforms.  Because of NP-complete nature of RA, I don't believe that we 
can achieve better IRA behaviour on all tests but I think we should 
achieve not worse overall SPEC results.




[-- Attachment #2: better-spill-heuristic.patch --]
[-- Type: text/x-patch, Size: 11599 bytes --]

Index: ira-color.c
===================================================================
--- ira-color.c	(revision 141500)
+++ ira-color.c	(working copy)
@@ -84,6 +84,78 @@ static VEC(ira_allocno_t,heap) *removed_
 
 \f
 
+/* Array whose elements contain accumulated live ranges length for the
+   corresponding allocno.  */
+static int *allocno_live_range_lengths;
+
+/* Return accumulated live ranges length of ALLOCNO.  */
+static int
+get_allocno_live_range_length (ira_allocno_t allocno)
+{
+  int allocno_lr = 0;
+  allocno_live_range_t r;
+  
+  for (r = ALLOCNO_LIVE_RANGES (allocno); r != NULL; r = r->next)
+    allocno_lr += r->finish - r->start + 1;
+  ira_assert (allocno_lr >= 0);
+  return allocno_lr;
+}
+
+/* Initiate array allocno_live_range_lengths.  */
+static void
+initiate_allocno_live_range_lengths (void)
+{
+  int i, length;
+  ira_allocno_t a, parent_a;
+  ira_loop_tree_node_t parent;
+
+  allocno_live_range_lengths
+    = (int *) ira_allocate (ira_allocnos_num * sizeof (int));
+  memset (allocno_live_range_lengths, 0, ira_allocnos_num * sizeof (int));
+  for (i = max_reg_num () - 1; i >= FIRST_PSEUDO_REGISTER; i--)
+    for (a = ira_regno_allocno_map[i];
+	 a != NULL;
+	 a = ALLOCNO_NEXT_REGNO_ALLOCNO (a))
+      {
+	length = get_allocno_live_range_length (a);
+	allocno_live_range_lengths[ALLOCNO_NUM (a)] += length;
+	if ((parent_a = ALLOCNO_CAP (a)) != NULL
+	    || ((parent = ALLOCNO_LOOP_TREE_NODE (a)->parent) != NULL
+		&& (parent_a = parent->regno_allocno_map[i]) != NULL))
+	  allocno_live_range_lengths[ALLOCNO_NUM (parent_a)]
+	    += allocno_live_range_lengths[ALLOCNO_NUM (a)];
+      }
+}
+
+/* Free array allocno_live_range_lengths.  */
+static void
+finish_allocno_live_range_lengths (void)
+{
+  ira_free (allocno_live_range_lengths);
+}
+
+\f
+
+/* Cost multiplier to use to calculate allocno spill priority more
+   accurately.  */
+static int cost_multiplier;
+
+/* Return spill priority of allocno A.  */
+static int
+allocno_spill_priority (ira_allocno_t a)
+{
+  int cost;
+
+  cost = ALLOCNO_TEMP (a) * cost_multiplier;
+  return (cost
+	  / (ALLOCNO_LEFT_CONFLICTS_NUM (a)
+	     * ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a)
+	     * ira_reg_class_nregs[ALLOCNO_COVER_CLASS (a)][ALLOCNO_MODE (a)]
+	     + 1));
+}
+
+\f
+
 /* This page contains functions used to choose hard registers for
    allocnos.  */
 
@@ -374,6 +446,9 @@ print_coalesced_allocno (ira_allocno_t a
     }
 }
 
+/* The current cost of the allocation.  */
+static int curr_loop_coloring_cost;
+
 /* Choose a hard register for ALLOCNO (or for all coalesced allocnos
    represented by ALLOCNO).  If RETRY_P is TRUE, it means that the
    function called from function `ira_reassign_conflict_allocnos' and
@@ -532,10 +607,9 @@ assign_hard_reg (ira_allocno_t allocno, 
 	  cost += add_cost;
 	  full_cost += add_cost;
 	}
-      if (min_cost > cost)
-	min_cost = cost;
       if (min_full_cost > full_cost)
 	{
+	  min_cost = cost;
 	  min_full_cost = full_cost;
 	  best_hard_regno = hard_regno;
 	  ira_assert (hard_regno >= 0);
@@ -576,8 +650,13 @@ assign_hard_reg (ira_allocno_t allocno, 
 	}
       return false;
     }
-  if (best_hard_regno >= 0)
-    allocated_hardreg_p[best_hard_regno] = true;
+  if (best_hard_regno < 0)
+    curr_loop_coloring_cost += mem_cost;
+  else
+    {
+      allocated_hardreg_p[best_hard_regno] = true;
+      curr_loop_coloring_cost += min_cost;
+    }
   for (a = ALLOCNO_NEXT_COALESCED_ALLOCNO (allocno);;
        a = ALLOCNO_NEXT_COALESCED_ALLOCNO (a))
     {
@@ -775,6 +854,26 @@ static splay_tree uncolorable_allocnos_s
    into account.  */
 #define USE_SPLAY_P(CLASS) (uncolorable_allocnos_num[CLASS] > 4000)
 
+/* Print the splay tree node NODE to stderr.  */
+static int
+print_splay_tree_node (splay_tree_node node, void *data ATTRIBUTE_UNUSED)
+{
+  ira_allocno_t a = (ira_allocno_t) node->key;
+
+  fprintf (stderr, "a%d(%d)\n", ALLOCNO_NUM (a), allocno_spill_priority (a));
+  return 0;
+}
+
+extern void debug_splay_tree (splay_tree tree);
+
+/* Print the splay tree to stderr.  */
+void
+debug_splay_tree (splay_tree tree)
+{
+  splay_tree_foreach (tree, print_splay_tree_node, NULL);
+  fprintf (stderr, "\n");
+}
+
 /* Put ALLOCNO onto the coloring stack without removing it from its
    bucket.  Pushing allocno to the coloring stack can result in moving
    conflicting allocnos from the uncolorable bucket to the colorable
@@ -995,18 +1094,15 @@ allocno_spill_priority_compare (splay_tr
   int pri1, pri2, diff;
   ira_allocno_t a1 = (ira_allocno_t) k1, a2 = (ira_allocno_t) k2;
   
-  pri1 = (ALLOCNO_TEMP (a1)
-	  / (ALLOCNO_LEFT_CONFLICTS_NUM (a1)
-	     * ira_reg_class_nregs[ALLOCNO_COVER_CLASS (a1)][ALLOCNO_MODE (a1)]
-	     + 1));
-  pri2 = (ALLOCNO_TEMP (a2)
-	  / (ALLOCNO_LEFT_CONFLICTS_NUM (a2)
-	     * ira_reg_class_nregs[ALLOCNO_COVER_CLASS (a2)][ALLOCNO_MODE (a2)]
-	     + 1));
+  pri1 = allocno_spill_priority (a1);
+  pri2 = allocno_spill_priority (a2);
   if ((diff = pri1 - pri2) != 0)
     return diff;
   if ((diff = ALLOCNO_TEMP (a1) - ALLOCNO_TEMP (a2)) != 0)
     return diff;
+  if ((diff = (allocno_live_range_lengths[ALLOCNO_NUM (a2)]
+	       - allocno_live_range_lengths[ALLOCNO_NUM (a1)])) != 0)
+    return diff;
   return ALLOCNO_NUM (a1) - ALLOCNO_NUM (a2);
 }
 
@@ -1049,10 +1145,12 @@ push_allocnos_to_stack (void)
 {
   ira_allocno_t allocno, a, i_allocno, *allocno_vec;
   enum reg_class cover_class, rclass;
-  int allocno_pri, i_allocno_pri, allocno_cost, i_allocno_cost;
+  int allocno_pri, i_allocno_pri;
+  int allocno_cost, i_allocno_cost, allocno_lr, i_allocno_lr;
   int i, j, num, cover_class_allocnos_num[N_REG_CLASSES];
   ira_allocno_t *cover_class_allocnos[N_REG_CLASSES];
-  int cost;
+  int cost, max_cost;
+  bool set_p;
 
   /* Initialize.  */
   VEC_truncate(ira_allocno_t, removed_splay_allocno_vec, 0);
@@ -1063,6 +1161,7 @@ push_allocnos_to_stack (void)
       cover_class_allocnos[cover_class] = NULL;
       uncolorable_allocnos_splay_tree[cover_class] = NULL;
     }
+  max_cost = 0;
   /* Calculate uncolorable allocno spill costs.  */
   for (allocno = uncolorable_allocno_bucket;
        allocno != NULL;
@@ -1081,7 +1180,12 @@ push_allocnos_to_stack (void)
 	/* ??? Remove cost of copies between the coalesced
 	   allocnos.  */
 	ALLOCNO_TEMP (allocno) = cost;
+	if (cost < 0)
+	  cost = -cost;
+	if (cost > max_cost)
+	  max_cost = cost;
       }
+  cost_multiplier = max_cost == 0 ? 1 : INT_MAX / max_cost;
   /* Define place where to put uncolorable allocnos of the same cover
      class.  */
   for (num = i = 0; i < ira_reg_class_cover_size; i++)
@@ -1156,7 +1260,7 @@ push_allocnos_to_stack (void)
 	  ira_assert (num > 0);
 	  allocno_vec = cover_class_allocnos[cover_class];
 	  allocno = NULL;
-	  allocno_pri = allocno_cost = 0;
+	  allocno_pri = allocno_cost = 0; allocno_lr = -1;
 	  /* Sort uncolorable allocno to find the one with the lowest
 	     spill cost.  */
 	  for (i = 0, j = num - 1; i <= j;)
@@ -1172,35 +1276,35 @@ push_allocnos_to_stack (void)
 	      if (ALLOCNO_IN_GRAPH_P (i_allocno))
 		{
 		  i++;
-		  if (ALLOCNO_TEMP (i_allocno) == INT_MAX)
+		  i_allocno_cost = ALLOCNO_TEMP (i_allocno);
+		  i_allocno_pri = allocno_spill_priority (i_allocno);
+		  i_allocno_lr = -1;
+		  if (allocno == NULL)
+		    set_p = true;
+		  else if (allocno_pri > i_allocno_pri)
+		    set_p = true;
+		  else if (allocno_pri != i_allocno_pri)
+		    set_p = false;
+		  else if (allocno_cost > i_allocno_cost)
+		    set_p = true;
+		  else if (allocno_cost != i_allocno_cost)
+		    set_p = false;
+		  else
 		    {
-		      ira_allocno_t a;
-		      int cost = 0;
-		      
-		      for (a = ALLOCNO_NEXT_COALESCED_ALLOCNO (i_allocno);;
-			   a = ALLOCNO_NEXT_COALESCED_ALLOCNO (a))
-			{
-			  cost += calculate_allocno_spill_cost (i_allocno);
-			  if (a == i_allocno)
-			    break;
-			}
-		      /* ??? Remove cost of copies between the coalesced
-			 allocnos.  */
-		      ALLOCNO_TEMP (i_allocno) = cost;
+		      i_allocno_lr
+			= allocno_live_range_lengths[ALLOCNO_NUM (i_allocno)];
+		      if (allocno_lr < 0)
+			allocno_lr
+			  = allocno_live_range_lengths[ALLOCNO_NUM (allocno)];
+		      if (allocno_lr < i_allocno_lr)
+			set_p = true;
+		      else if (allocno_lr != i_allocno_lr)
+			set_p = false;
+		      else
+			set_p
+			  = ALLOCNO_NUM (allocno) > ALLOCNO_NUM (i_allocno);
 		    }
-		  i_allocno_cost = ALLOCNO_TEMP (i_allocno);
-		  i_allocno_pri
-		    = (i_allocno_cost
-		       / (ALLOCNO_LEFT_CONFLICTS_NUM (i_allocno)
-			  * ira_reg_class_nregs[ALLOCNO_COVER_CLASS
-						(i_allocno)]
-			  [ALLOCNO_MODE (i_allocno)] + 1));
-		  if (allocno == NULL || allocno_pri > i_allocno_pri
-		      || (allocno_pri == i_allocno_pri
-			  && (allocno_cost > i_allocno_cost
-			      || (allocno_cost == i_allocno_cost 
-				  && (ALLOCNO_NUM (allocno)
-				      > ALLOCNO_NUM (i_allocno))))))
+		  if (set_p)
 		    {
 		      allocno = i_allocno;
 		      allocno_cost = i_allocno_cost;
@@ -1260,16 +1364,23 @@ pop_allocnos_from_stack (void)
 	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
 	    fprintf (ira_dump_file, "assign memory\n");
 	}
-      else if (assign_hard_reg (allocno, false))
-	{
-	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-	    fprintf (ira_dump_file, "assign reg %d\n",
-		     ALLOCNO_HARD_REGNO (allocno));
-	}
-      else if (ALLOCNO_ASSIGNED_P (allocno))
+      else
 	{
-	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-	    fprintf (ira_dump_file, "spill\n");
+	  int cost_before = curr_loop_coloring_cost;
+
+	  if (assign_hard_reg (allocno, false))
+	    {
+	      if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
+		fprintf (ira_dump_file, "assign reg %d (cost %d)\n",
+			 ALLOCNO_HARD_REGNO (allocno),
+			 curr_loop_coloring_cost - cost_before);
+	    }
+	  else if (ALLOCNO_ASSIGNED_P (allocno))
+	    {
+	      if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
+		fprintf (ira_dump_file, "spill (cost %d)\n",
+			 curr_loop_coloring_cost - cost_before);
+	    }
 	}
       ALLOCNO_IN_GRAPH_P (allocno) = true;
     }
@@ -1619,6 +1730,15 @@ color_allocnos (void)
   /* Put the allocnos into the corresponding buckets.  */
   colorable_allocno_bucket = NULL;
   uncolorable_allocno_bucket = NULL;
+  curr_loop_coloring_cost = 0;
+  EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, i, bi)
+    {
+      a = ira_allocnos[i];
+      ALLOCNO_HARD_REGNO (a) = -1;
+      ALLOCNO_MAY_BE_SPILLED_P (a) = false;
+      ALLOCNO_ASSIGNED_P (a) = false;
+      ALLOCNO_SPLAY_REMOVED_P (a) = false;
+    }
   EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, i, bi)
     {
       a = ira_allocnos[i];
@@ -1722,6 +1842,8 @@ color_pass (ira_loop_tree_node_t loop_tr
     }
   /* Color all mentioned allocnos including transparent ones.  */
   color_allocnos ();
+  if (internal_flag_ira_verbose > 1 && ira_dump_file != NULL)
+    fprintf (ira_dump_file, "  Cost %d\n", curr_loop_coloring_cost);
   /* Process caps.  They are processed just once.  */
   if (flag_ira_algorithm == IRA_ALGORITHM_MIXED
       || flag_ira_algorithm == IRA_ALGORITHM_REGIONAL)
@@ -1863,8 +1985,12 @@ do_coloring (void)
   if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
     fprintf (ira_dump_file, "\n**** Allocnos coloring:\n\n");
   
+  initiate_allocno_live_range_lengths ();
+
   ira_traverse_loop_tree (false, ira_loop_tree_root, color_pass, NULL);
 
+  finish_allocno_live_range_lengths ();
+  
   if (internal_flag_ira_verbose > 1 && ira_dump_file != NULL)
     ira_print_disposition (ira_dump_file);
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFB: patch to fix PR37534
  2008-11-10 17:26 RFB: patch to fix PR37534 Vladimir Makarov
@ 2008-11-10 17:53 ` Luis Machado
  2008-11-11 15:31   ` Luis Machado
  0 siblings, 1 reply; 3+ messages in thread
From: Luis Machado @ 2008-11-10 17:53 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches

Thanks Vladimir. I'll give it a try and will report back.

Regards,
Luis

On Mon, 2008-11-10 at 12:13 -0500, Vladimir Makarov wrote:
> Hi, Luis.
> 
> Could you test the following patch.  It uses another spill heuristic.  
> Usually spill priority is spill cost divided by # of left conflicts for 
> the corresponding node in the graph.  Some RA literature mentions 
> division by #left conflicts in power 2.  The patch uses a heuristic 
> close to the second one.  I don't know why but instead of 17% 
> degradation it gives 10% improvement on facerec (for -O2 -mtune=power6).
> 
> Actually I tried many things to solve the problem:
> 
> o different coalescing algorithms (iterative and optimistic ones)
> 
> o usage of union of cover classes for pseudos.  This is one drawback I 
> see in IRA with comparison with the old RA.  When pseudo is used only 
> for transferring memory value (r<-m ... m<-r), it can be done through 
> float or integer registers.  When coloring is not possible for one class 
> (e.g. integer registers) we could  use another class (e.g. float registers).
> 
> o using coloring with different spill heuristics and choosing the best 
> one (Bernstein's approach)
> 
> o different spill heuristics in reload.
> 
> But the best result is achieved by this patch which is ironically the 
> simplest one.
> 
> In general the patch does not improve overall SPEC rates on other 
> platforms.  Because of NP-complete nature of RA, I don't believe that we 
> can achieve better IRA behaviour on all tests but I think we should 
> achieve not worse overall SPEC results.
> 
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFB: patch to fix PR37534
  2008-11-10 17:53 ` Luis Machado
@ 2008-11-11 15:31   ` Luis Machado
  0 siblings, 0 replies; 3+ messages in thread
From: Luis Machado @ 2008-11-11 15:31 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches

Vladimir,

The results look good. There's overall improvement for CPU2K, and some
small one digit degradations here and there for a few INT benchmarks
(some are probably noise).

Facerec is up around 22% for 32-bit and 23% for 64-bit.

32-bit

INT base: -0.12%
INT peak: -0.56%
FP base: 2.08%
FP peak: 1.89%
Overall INT: -0.34%
Overall FP: 1.99%
Overall: 0.82%

64-bit

INT base: -0.49%
INT peak: 0.05%
FP base: 3.46%
FP peak: 1.39%
Overall INT: -0.22%
Overall FP: 2.42%
Overall: 1.09%


Thanks,
Luis

On Mon, 2008-11-10 at 15:25 -0200, Luis Machado wrote:
> Thanks Vladimir. I'll give it a try and will report back.
> 
> Regards,
> Luis
> 
> On Mon, 2008-11-10 at 12:13 -0500, Vladimir Makarov wrote:
> > Hi, Luis.
> > 
> > Could you test the following patch.  It uses another spill heuristic.  
> > Usually spill priority is spill cost divided by # of left conflicts for 
> > the corresponding node in the graph.  Some RA literature mentions 
> > division by #left conflicts in power 2.  The patch uses a heuristic 
> > close to the second one.  I don't know why but instead of 17% 
> > degradation it gives 10% improvement on facerec (for -O2 -mtune=power6).
> > 
> > Actually I tried many things to solve the problem:
> > 
> > o different coalescing algorithms (iterative and optimistic ones)
> > 
> > o usage of union of cover classes for pseudos.  This is one drawback I 
> > see in IRA with comparison with the old RA.  When pseudo is used only 
> > for transferring memory value (r<-m ... m<-r), it can be done through 
> > float or integer registers.  When coloring is not possible for one class 
> > (e.g. integer registers) we could  use another class (e.g. float registers).
> > 
> > o using coloring with different spill heuristics and choosing the best 
> > one (Bernstein's approach)
> > 
> > o different spill heuristics in reload.
> > 
> > But the best result is achieved by this patch which is ironically the 
> > simplest one.
> > 
> > In general the patch does not improve overall SPEC rates on other 
> > platforms.  Because of NP-complete nature of RA, I don't believe that we 
> > can achieve better IRA behaviour on all tests but I think we should 
> > achieve not worse overall SPEC results.
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-11-11 15:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-10 17:26 RFB: patch to fix PR37534 Vladimir Makarov
2008-11-10 17:53 ` Luis Machado
2008-11-11 15:31   ` Luis Machado

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).