public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC] reg-class algorithm tweek
@ 2002-12-16 12:21 Jan Hubicka
  0 siblings, 0 replies; only message in thread
From: Jan Hubicka @ 2002-12-16 12:21 UTC (permalink / raw)
  To: gcc-patches, rth


Hi,
I tried the idea of working on "atomic" classes for regclass (ie make regclass
to collect array of the smallest classes returned via REGNO_REG_CLASS, compute
costs purely on these and the propagate them back into the "union" classes).

I believe it can solve the problem we are having with multiple alternative
constratins.  IE when the constraint is "r,f" and both alternatives are cheap,
the preferred class "rf".  In the next iteration the alternative 0 gets added
cost of moving rf->r, that is the worst case, ie cost of moving f->r and
alternative1 gets cost of moving f->r.  Whatever cost is cheaper, alternative
wins, so we constrain register allocator into eighter "r" or "f".

I am having patch for this but we never got into conclusion on it as it breaks
the property that union classes have cost of maximum of the subclasses, as
class "r" is cheap, class "f" is cheap, but "rf" is expensive.

With the new scheme, there is no "rf" class to be computed incorrectly and
whole concept gets cleaner as it is close to the regclass computing costs of
individual hard registers, just grouping together those that are truly
equivalent for the regalloc.

It also scales better, as currently we need 2^n (where n in number of atomic
classes) classes to represent all possible unions and when we omit some, the
algorithm starts to behave increasingly crazilly.

Additionally it allows us to make number of tables smaller. Thats about the
positives.

Now the problems.  At first it adds some new constraint on how register classes
should behave.  For instance the algorithm gets confused when class 0 is
refined to contain r0 and r1, while class 2 contains r1 and r2.  There is no
good class to return in REGNO_REG_CLASS for r1 so it is already quite broken,
on the other hand, i386 do have exactly this for rbp, that is only intersection
of NON_Q_REGS and INDEX_REGS.  We would need here "BP_REG" class to solve it.

With such funny definition of classes it is interesting problem how to compute
preffered and alternate classes from the set.  At the moment I am looking for
the smallest class that is supperset of all atomic classes matching the
criteria that is bit too optimistic but should work.  Next we can simply change
the interface to store hard-reg-sets instead of classes (or even invent
reg-class-sets).  It will make some of regalloc more accurate too.

Another problem are the wide registers.  For DImode register I don't consider
class AD_REGS on i386, but class A_REGS and D_REGS individually.  How to behave
here?  We can make register_move_cost and friends to simply deal with A_REGS as
if it was AD_REGS and teach the combining algorithm to realize that it is
really AD_REGS but it is quite confusing.

Alternativly we can have different set of atomic classes for each mode.  That
would make the n^2 algorithms even cheaper but will make things somewhat more
crazy.

Any toughts?  Does this all seem to make sense?

*** regclass.c.old	Mon Dec 16 02:18:07 2002
--- regclass.c.new	Mon Dec 16 22:57:51 2002
*************** static const unsigned int_reg_class_cont
*** 162,167 ****
--- 162,170 ----
  
  unsigned int reg_class_size[N_REG_CLASSES];
  
+ int n_atomic_classes;
+ enum reg_class atomic_class[N_REG_CLASSES];
+ 
  /* For each reg class, table listing all the containing classes.  */
  
  enum reg_class reg_class_superclasses[N_REG_CLASSES][N_REG_CLASSES];
*************** init_reg_sets_1 ()
*** 565,570 ****
--- 568,586 ----
        }
      }
  #endif /* CLASS_CANNOT_CHANGE_MODE */
+ 
+   n_atomic_classes = 0;
+   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+      if (!fixed_regs[i])
+        {
+ 	 enum reg_class class = REGNO_REG_CLASS (i);
+ 	 int y;
+ 	 for (y = 0; y < n_atomic_classes; y++)
+ 	   if (atomic_class[y] == class)
+ 	     break;
+ 	 if (y == n_atomic_classes)
+ 	   atomic_class[n_atomic_classes++] = class;
+        }
  }
  
  /* Compute the table of register modes.
*************** dump_regclass (dump)
*** 942,964 ****
    int i;
    for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
      {
-       int /* enum reg_class */ class;
        if (REG_N_REFS (i))
  	{
  	  fprintf (dump, "  Register %i costs:", i);
! 	  for (class = 0; class < (int) N_REG_CLASSES; class++)
! 	    if (contains_reg_of_mode [(enum reg_class) class][PSEUDO_REGNO_MODE (i)]
  #ifdef FORBIDDEN_INC_DEC_CLASSES
! 		&& (!in_inc_dec[i]
! 		    || !forbidden_inc_dec_class[(enum reg_class) class])
  #endif
  #ifdef CLASS_CANNOT_CHANGE_MODE
! 		&& (!REGNO_REG_SET_P (reg_changes_mode, i)
! 		     || class_can_change_mode [(enum reg_class) class])
  #endif
! 		)
! 	    fprintf (dump, " %s:%i", reg_class_names[class],
! 		     costs[i].cost[(enum reg_class) class]);
  	  fprintf (dump, " MEM:%i\n", costs[i].mem_cost);
  	}
      }
--- 958,984 ----
    int i;
    for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
      {
        if (REG_N_REFS (i))
  	{
+ 	  int n;
  	  fprintf (dump, "  Register %i costs:", i);
! 	  for (n = 0; n < n_atomic_classes; n++)
! 	    {
! 	      enum reg_class class = atomic_class[n];
! 
! 	      if (contains_reg_of_mode [class][PSEUDO_REGNO_MODE (i)]
  #ifdef FORBIDDEN_INC_DEC_CLASSES
! 		  && (!in_inc_dec[i]
! 		      || !forbidden_inc_dec_class[class])
  #endif
  #ifdef CLASS_CANNOT_CHANGE_MODE
! 		  && (!REGNO_REG_SET_P (reg_changes_mode, i)
! 		       || class_can_change_mode [class])
  #endif
! 		  )
! 		fprintf (dump, " %s:%i", reg_class_names[class],
! 			 costs[i].cost[n]);
! 	     }
  	  fprintf (dump, " MEM:%i\n", costs[i].mem_cost);
  	}
      }
*************** regclass (f, nregs, dump)
*** 1310,1328 ****
        for (i = FIRST_PSEUDO_REGISTER; i < nregs; i++)
  	{
  	  int best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
  	  enum reg_class best = ALL_REGS, alt = NO_REGS;
  	  /* This is an enum reg_class, but we call it an int
  	     to save lots of casts.  */
- 	  int class;
  	  struct costs *p = &costs[i];
  
  	  /* In non-optimizing compilation REG_N_REFS is not initialized
  	     yet.  */
  	  if (optimize && !REG_N_REFS (i) && !REG_N_SETS (i))
  	    continue;
  
! 	  for (class = (int) ALL_REGS - 1; class > 0; class--)
  	    {
  	      /* Ignore classes that are too small for this operand or
  		 invalid for an operand that was auto-incremented.  */
  	      if (!contains_reg_of_mode [class][PSEUDO_REGNO_MODE (i)]
--- 1330,1353 ----
        for (i = FIRST_PSEUDO_REGISTER; i < nregs; i++)
  	{
  	  int best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
+ 	  enum reg_class best_atomic_class[N_REG_CLASSES];
+ 	  int n_best_atomic_classes = 0;
+ 	  enum reg_class alt_atomic_class[N_REG_CLASSES];
+ 	  int n_alt_atomic_classes = 0;
  	  enum reg_class best = ALL_REGS, alt = NO_REGS;
  	  /* This is an enum reg_class, but we call it an int
  	     to save lots of casts.  */
  	  struct costs *p = &costs[i];
+ 	  int n;
  
  	  /* In non-optimizing compilation REG_N_REFS is not initialized
  	     yet.  */
  	  if (optimize && !REG_N_REFS (i) && !REG_N_SETS (i))
  	    continue;
  
! 	  for (n = 0; n < n_atomic_classes; n++)
  	    {
+ 	      enum reg_class class = atomic_class [n];
  	      /* Ignore classes that are too small for this operand or
  		 invalid for an operand that was auto-incremented.  */
  	      if (!contains_reg_of_mode [class][PSEUDO_REGNO_MODE (i)]
*************** regclass (f, nregs, dump)
*** 1334,1349 ****
  		      && ! class_can_change_mode [class])
  #endif
  		  )
! 		;
! 	      else if (p->cost[class] < best_cost)
  		{
! 		  best_cost = p->cost[class];
! 		  best = (enum reg_class) class;
  		}
- 	      else if (p->cost[class] == best_cost)
- 		best = reg_class_subunion[(int) best][class];
  	    }
! 
  	  /* Record the alternate register class; i.e., a class for which
  	     every register in it is better than using memory.  If adding a
  	     class would make a smaller class (i.e., no union of just those
--- 1359,1400 ----
  		      && ! class_can_change_mode [class])
  #endif
  		  )
! 		continue;
! 	      if (p->cost[n] == best_cost)
! 		{
! 		  best_atomic_class[n_best_atomic_classes++] = class;
! 		}
! 	      else if (p->cost[n] < best_cost)
! 		{
! 		  best_cost = p->cost[n];
! 		  best_atomic_class[0] = class;
! 		  n_best_atomic_classes = 1;
! 		}
! 	      if (p->cost[n] < p->mem_cost)
  		{
! 		  alt_atomic_class[n_alt_atomic_classes++] = class;
  		}
  	    }
! 	  for (best = 0;; best++)
! 	    {
! 	      /* Ignore classes that are too small for this operand or
! 		 invalid for an operand that was auto-incremented.  */
! 	      if (!contains_reg_of_mode [best][PSEUDO_REGNO_MODE (i)]
! #ifdef FORBIDDEN_INC_DEC_CLASSES
! 		  || (in_inc_dec[i] && forbidden_inc_dec_class[best])
! #endif
! #ifdef CLASS_CANNOT_CHANGE_MODE
! 		  || (REGNO_REG_SET_P (reg_changes_mode, i)
! 		      && ! class_can_change_mode [best])
! #endif
! 		  )
! 		continue;
! 	      for (n = 0; n < n_best_atomic_classes; n++)
! 		if (!reg_class_subset_p (best_atomic_class[n], best))
! 		  break;
! 	      if (n == n_best_atomic_classes)
! 		break;
! 	    }
  	  /* Record the alternate register class; i.e., a class for which
  	     every register in it is better than using memory.  If adding a
  	     class would make a smaller class (i.e., no union of just those
*************** regclass (f, nregs, dump)
*** 1352,1370 ****
  	     will be doing it again later.  */
  
  	  if ((pass == 1  || dump) || ! flag_expensive_optimizations)
! 	    for (class = 0; class < N_REG_CLASSES; class++)
! 	      if (p->cost[class] < p->mem_cost
! 		  && (reg_class_size[(int) reg_class_subunion[(int) alt][class]]
! 		      > reg_class_size[(int) alt])
  #ifdef FORBIDDEN_INC_DEC_CLASSES
! 		  && ! (in_inc_dec[i] && forbidden_inc_dec_class[class])
  #endif
  #ifdef CLASS_CANNOT_CHANGE_MODE
! 		  && ! (REGNO_REG_SET_P (reg_changes_mode, i)
! 			&& ! class_can_change_mode [class])
  #endif
! 		  )
! 		alt = reg_class_subunion[(int) alt][class];
  
  	  /* If we don't add any classes, nothing to try.  */
  	  if (alt == best)
--- 1403,1428 ----
  	     will be doing it again later.  */
  
  	  if ((pass == 1  || dump) || ! flag_expensive_optimizations)
! 	    for (alt = 0;; alt++)
! 	      {
! 		/* Ignore classes that are too small for this operand or
! 		   invalid for an operand that was auto-incremented.  */
! 		if (!contains_reg_of_mode [alt][PSEUDO_REGNO_MODE (i)]
  #ifdef FORBIDDEN_INC_DEC_CLASSES
! 		    || (in_inc_dec[i] && forbidden_inc_dec_class[alt])
  #endif
  #ifdef CLASS_CANNOT_CHANGE_MODE
! 		    || (REGNO_REG_SET_P (reg_changes_mode, i)
! 			&& ! class_can_change_mode [alt])
  #endif
! 		    )
! 		  continue;
! 		for (n = 0; n < n_alt_atomic_classes; n++)
! 		  if (!reg_class_subset_p (alt_atomic_class[n], alt))
! 		    break;
! 		if (n == n_alt_atomic_classes)
! 		  break;
! 	      }
  
  	  /* If we don't add any classes, nothing to try.  */
  	  if (alt == best)
*************** record_reg_classes (n_alts, n_ops, ops, 
*** 1534,1547 ****
  		     copy, which is one instruction.  */
  
  		  struct costs *pp = &this_op_costs[i];
  
! 		  for (class = 0; class < N_REG_CLASSES; class++)
! 		    pp->cost[class]
  		      = ((recog_data.operand_type[i] != OP_OUT
! 			  ? may_move_in_cost[mode][class][(int) classes[i]]
  			  : 0)
  			 + (recog_data.operand_type[i] != OP_IN
! 			    ? may_move_out_cost[mode][(int) classes[i]][class]
  			    : 0));
  
  		  /* If the alternative actually allows memory, make things
--- 1592,1606 ----
  		     copy, which is one instruction.  */
  
  		  struct costs *pp = &this_op_costs[i];
+ 		  int n;
  
! 		  for (n = 0; n < n_atomic_classes; n++)
! 		    pp->cost[n]
  		      = ((recog_data.operand_type[i] != OP_OUT
! 			  ? may_move_in_cost[mode][atomic_class[n]][(int) classes[i]]
  			  : 0)
  			 + (recog_data.operand_type[i] != OP_IN
! 			    ? may_move_out_cost[mode][(int) classes[i]][atomic_class[n]]
  			    : 0));
  
  		  /* If the alternative actually allows memory, make things
*************** record_reg_classes (n_alts, n_ops, ops, 
*** 1759,1772 ****
  	      else
  		{
  		  struct costs *pp = &this_op_costs[i];
  
! 		  for (class = 0; class < N_REG_CLASSES; class++)
! 		    pp->cost[class]
  		      = ((recog_data.operand_type[i] != OP_OUT
! 			  ? may_move_in_cost[mode][class][(int) classes[i]]
  			  : 0)
  			 + (recog_data.operand_type[i] != OP_IN
! 			    ? may_move_out_cost[mode][(int) classes[i]][class]
  			    : 0));
  
  		  /* If the alternative actually allows memory, make things
--- 1818,1832 ----
  	      else
  		{
  		  struct costs *pp = &this_op_costs[i];
+ 		  int n;
  
! 		  for (n = 0; n < N_REG_CLASSES; n++)
! 		    pp->cost[n]
  		      = ((recog_data.operand_type[i] != OP_OUT
! 			  ? may_move_in_cost[mode][atomic_class[n]][(int) classes[i]]
  			  : 0)
  			 + (recog_data.operand_type[i] != OP_IN
! 			    ? may_move_out_cost[mode][(int) classes[i]][atomic_class[n]]
  			    : 0));
  
  		  /* If the alternative actually allows memory, make things
*************** record_reg_classes (n_alts, n_ops, ops, 
*** 1836,1848 ****
  	  {
  	    struct costs *pp = &op_costs[i], *qq = &this_op_costs[i];
  	    int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
  
  	    pp->mem_cost = MIN (pp->mem_cost,
  				(qq->mem_cost + alt_cost) * scale);
  
! 	    for (class = 0; class < N_REG_CLASSES; class++)
! 	      pp->cost[class] = MIN (pp->cost[class],
! 				     (qq->cost[class] + alt_cost) * scale);
  	  }
      }
  
--- 1896,1909 ----
  	  {
  	    struct costs *pp = &op_costs[i], *qq = &this_op_costs[i];
  	    int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
+ 	    int n;
  
  	    pp->mem_cost = MIN (pp->mem_cost,
  				(qq->mem_cost + alt_cost) * scale);
  
! 	    for (n = 0; n < n_atomic_classes; n++)
! 	      pp->cost[n] = MIN (pp->cost[n],
! 				 (qq->cost[n] + alt_cost) * scale);
  	  }
      }
  
*************** record_reg_classes (n_alts, n_ops, ops, 
*** 1860,1865 ****
--- 1921,1927 ----
       arbitrarily expensive).  This avoids losing when the preferred class
       is very expensive as the source of a copy instruction.  */
  
+ #if 0
    if ((set = single_set (insn)) != 0
        && ops[0] == SET_DEST (set) && ops[1] == SET_SRC (set)
        && GET_CODE (ops[0]) == REG && GET_CODE (ops[1]) == REG
*************** record_reg_classes (n_alts, n_ops, ops, 
*** 1902,1907 ****
--- 1964,1970 ----
  		    }
  		}
  	}
+ #endif
  }
  \f
  /* Compute the cost of loading X into (if TO_P is nonzero) or from (if
*************** record_address_regs (x, class, scale)
*** 2137,2144 ****
  
  	pp->mem_cost += (MEMORY_MOVE_COST (Pmode, class, 1) * scale) / 2;
  
! 	for (i = 0; i < N_REG_CLASSES; i++)
! 	  pp->cost[i] += (may_move_in_cost[Pmode][i][(int) class] * scale) / 2;
        }
        break;
  
--- 2200,2207 ----
  
  	pp->mem_cost += (MEMORY_MOVE_COST (Pmode, class, 1) * scale) / 2;
  
! 	for (i = 0; i < n_atomic_classes; i++)
! 	  pp->cost[i] += (may_move_in_cost[Pmode][atomic_class[i]][(int) class] * scale) / 2;
        }
        break;
  

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2002-12-16 20:21 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-16 12:21 [RFC] reg-class algorithm tweek Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).