public inbox for gcc-patches@gcc.gnu.org
* Patch: Revert find_reloads part of the IRA merge
@ 2010-07-09 11:11 Bernd Schmidt
  2010-07-09 13:28 ` Richard Earnshaw
  2010-07-14 11:28 ` Bernd Schmidt
  0 siblings, 2 replies; 6+ messages in thread
From: Bernd Schmidt @ 2010-07-09 11:11 UTC (permalink / raw)
  To: GCC Patches; +Cc: Vladimir N. Makarov, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 3315 bytes --]

For a while now I've been struggling with a few Thumb-1 patches, because
trying to tune the backend's code generation always seemed to produce
some inexplicable results.  I've finally tracked the problem down to a
change in reload.

As part of the big IRA merge, find_reloads was changed to prefer
alternatives for which fewer operands require a
SMALL_REGISTER_CLASS_P class.  (I'll repeat what I said at the time:
changes such as this should have been posted and reviewed separately
from IRA.)

This change causes problems for Thumb-1, because its general register
class, LO_REGS, is CLASS_LIKELY_SPILLED_P, while the high registers
(which are fewer in number and harder to access) aren't.  As a result,
we get bizarre results where reload prefers alternatives using high
registers over alternatives using low registers, or even prefers
alternatives with fewer operands allowing registers.  Another defect is
that the code doesn't take goal_alternative_win into account.

There are other conceivable ways of tuning this heuristic, e.g.
computing the smallest register class size over all operands for an
alternative, and then picking the alternative that has the highest such
minimum.  I tested a patch for this, but thinking about it some more,
this also has the potential for unpleasant surprises: imagine a Thumb-1
insn with one alternative allowing "l" (LO_REGS) and another allowing
"lk" (LO_REGS + the stack pointer); we'd always prefer the second one
for no reason.  At this point I decided that the best thing to do for
now would be to revert this change.
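
To make the pitfall concrete, here is a toy standalone sketch of that
heuristic (all names are invented for illustration; this isn't the
patch that was tested, nor actual reload code):

/* Toy model of the rejected heuristic: for each alternative, take the
   smallest register class size over its operands and prefer the
   alternative whose minimum is largest.  */
#include <limits.h>

#define MAX_OPS 4

/* class_size[a][i] is the number of hard regs in the class allowed for
   operand I by alternative A; 0 means the operand allows no registers
   in that alternative.  N_OPS must be at most MAX_OPS.  */
int
pick_alternative (int n_alts, int n_ops, const int class_size[][MAX_OPS])
{
  int best_alt = -1, best_min = -1;
  int a, i;

  for (a = 0; a < n_alts; a++)
    {
      int min_size = INT_MAX;

      for (i = 0; i < n_ops; i++)
	if (class_size[a][i] > 0 && class_size[a][i] < min_size)
	  min_size = class_size[a][i];
      if (min_size == INT_MAX)
	min_size = 0;
      if (min_size > best_min)
	{
	  best_min = min_size;
	  best_alt = a;
	}
    }
  return best_alt;
}

With a single register operand, an alternative allowing "l" scores 8
and one allowing "lk" scores 9, so the latter always wins even though
the stack pointer buys us nothing.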

It should be possible to do better here.  The thorough, but likely
expensive, way would be to record reloads for all alternatives with the
same cost, then choose the best one (the one requiring the fewest
spills) in the caller using find_reload_regs.  A simpler way, closer to
Vlad's change here, would be to determine whether an operand is an
input, an output, or both, pick the right regset(s) from the
insn_chain, check whether any registers are available, and count the
number of known forced spills for each alternative.
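
As a rough standalone illustration of that simpler idea (again, the
names are made up for this sketch; the real test would have to work
from the insn_chain's live register sets):

#include <stdint.h>

/* class_regs[i] is a bitmask of the hard regs usable for operand I in
   the alternative under consideration (0 if that operand needs no
   reload); LIVE is a bitmask of the hard regs live across the insn
   (an input would use the appropriate live set, likewise an output or
   an in/out operand).  At most 64 hard registers are assumed for
   simplicity.  A reload whose class has no free register left is known
   to force a spill; alternatives with fewer such reloads would be
   preferred.  */
int
count_forced_spills (int n_ops, const uint64_t *class_regs, uint64_t live)
{
  int i, forced = 0;

  for (i = 0; i < n_ops; i++)
    if (class_regs[i] != 0 && (class_regs[i] & ~live) == 0)
      forced++;
  return forced;
}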

It might also be possible to achieve a similar effect by just reordering
alternatives in a port that wishes to prevent reload from using
likely-spilled classes if possible.

The patch below has bootstrapped on i686-linux; regression tests in
progress.  I've also tested completely different patches for the
problem, but I guess no one wants to know that. :-)

Typical code generation differences on Thumb-1 look like this:

-       mov     r2, #192
-       mov     r3, r2
-       add     r3, r3, r9
+       mov     r3, r9
+       add     r3, r3, #192

There's a small ARM-specific bit in the patch, which swaps two
alternatives on the grounds that synthesizing a negative constant in a
register is more expensive than reloading one register into another.
Hence, the alternative using constants should be preferred if it can
match, and thus it should go first:

-       mov     r5, #32
-       neg     r5, r5
-       add     r4, r2, r5
+       mov     r4, r2
+       sub     r4, r4, #32
        bmi     .L2

This was a case where the find_reloads code produced better results for
entirely the wrong reasons.

I'll commit the reload patch if no one objects soonish and if the ARM
part is approved.


Bernd

[-- Attachment #2: smallclass.diff --]
[-- Type: text/plain, Size: 2751 bytes --]

	* reload.c (find_reloads): Revert code to penalize small register
	classes that was brought in with the IRA merge.
	* config/arm/arm.md (addsi3_cbranch): Switch alternatives 0 and 1.

Index: reload.c
===================================================================
--- reload.c	(revision 161987)
+++ reload.c	(working copy)
@@ -2602,7 +2602,6 @@ find_reloads (rtx insn, int replace, int
   char goal_alternative_earlyclobber[MAX_RECOG_OPERANDS];
   int goal_alternative_swapped;
   int best;
-  int best_small_class_operands_num;
   int commutative;
   char operands_match[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   rtx substed_operand[MAX_RECOG_OPERANDS];
@@ -2928,7 +2927,6 @@ find_reloads (rtx insn, int replace, int
      all the operands together against the register constraints.  */
 
   best = MAX_RECOG_OPERANDS * 2 + 600;
-  best_small_class_operands_num = 0;
 
   swapped = 0;
   goal_alternative_swapped = 0;
@@ -3714,27 +3712,7 @@ find_reloads (rtx insn, int replace, int
 	 record it as the chosen goal for reloading.  */
       if (! bad)
 	{
-	  bool change_p = false;
-	  int small_class_operands_num = 0;
-
-	  if (best >= losers)
-	    {
-	      for (i = 0; i < noperands; i++)
-		small_class_operands_num
-		  += SMALL_REGISTER_CLASS_P (this_alternative[i]) ? 1 : 0;
-	      if (best > losers
-		  || (best == losers
-		      /* If the cost of the reloads is the same,
-			 prefer alternative which requires minimal
-			 number of small register classes for the
-			 operands.  This improves chances of reloads
-			 for insn requiring small register
-			 classes.  */
-		      && (small_class_operands_num
-			  < best_small_class_operands_num)))
-		change_p = true;
-	    }
-	  if (change_p)
+	  if (best > losers)
 	    {
 	      for (i = 0; i < noperands; i++)
 		{
@@ -3750,7 +3728,6 @@ find_reloads (rtx insn, int replace, int
 		}
 	      goal_alternative_swapped = swapped;
 	      best = losers;
-	      best_small_class_operands_num = small_class_operands_num;
 	      goal_alternative_number = this_alternative_number;
 	      goal_earlyclobber = this_earlyclobber;
 	    }
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 161987)
+++ config/arm/arm.md	(working copy)
@@ -7459,8 +7459,8 @@ (define_insn "*addsi3_cbranch"
 	(if_then_else
 	 (match_operator 4 "arm_comparison_operator"
 	  [(plus:SI
-	    (match_operand:SI 2 "s_register_operand" "%l,0,*l,1,1,1")
-	    (match_operand:SI 3 "reg_or_int_operand" "lL,IJ,*l,lIJ,lIJ,lIJ"))
+	    (match_operand:SI 2 "s_register_operand" "%0,l,*l,1,1,1")
+	    (match_operand:SI 3 "reg_or_int_operand" "IJ,lL,*l,lIJ,lIJ,lIJ"))
 	   (const_int 0)])
 	 (label_ref (match_operand 5 "" ""))
 	 (pc)))

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch: Revert find_reloads part of the IRA merge
  2010-07-09 11:11 Patch: Revert find_reloads part of the IRA merge Bernd Schmidt
@ 2010-07-09 13:28 ` Richard Earnshaw
  2010-07-14 11:28 ` Bernd Schmidt
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Earnshaw @ 2010-07-09 13:28 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, Vladimir N. Makarov


On Fri, 2010-07-09 at 13:09 +0200, Bernd Schmidt wrote:
> For a while now I've been struggling with a few Thumb-1 patches, because
> trying to tune the backend's code generation always seemed to produce
> some inexplicable results.  I've finally tracked the problem down to a
> change in reload.
> 
> As part of the big IRA merge, find_reloads was changed to prefer
> alternatives for which fewer operands require a
> SMALL_REGISTER_CLASS_P class.  (I'll repeat what I said at the time:
> changes such as this should have been posted and reviewed separately
> from IRA.)
> 
> This change causes problems for Thumb-1, because its general register
> class, LO_REGS, is CLASS_LIKELY_SPILLED_P, while the high registers
> (which are fewer in number and harder to access) aren't.  As a result,
> we get bizarre results where reload prefers alternatives using high
> registers over alternatives using low registers, or even prefers
> alternatives with fewer operands allowing registers.  Another defect is
> that the code doesn't take goal_alternative_win into account.
> 
> There are other conceivable ways of tuning this heuristic, e.g.
> computing the smallest register class size over all operands for an
> alternative, and then picking the alternative that has the highest such
> minimum.  I tested a patch for this, but thinking about it some more,
> this also has the potential for unpleasant surprises: imagine a Thumb-1
> insn with one alternative allowing "l" (LO_REGS) and another allowing
> "lk" (LO_REGS + the stack pointer); we'd always prefer the second one
> for no reason.  At this point I decided that the best thing to do for
> now would be to revert this change.
> 
> It should be possible to do better here.  The thorough, but likely
> expensive, way would be to record reloads for all alternatives with the
> same cost, then choose the best one (the one requiring the fewest
> spills) in the caller using find_reload_regs.  A simpler way, closer to
> Vlad's change here, would be to determine whether an operand is an
> input, an output, or both, pick the right regset(s) from the
> insn_chain, check whether any registers are available, and count the
> number of known forced spills for each alternative.
> 
> It might also be possible to achieve a similar effect by just reordering
> alternatives in a port that wishes to prevent reload from using
> likely-spilled classes if possible.
> 
> The patch below has bootstrapped on i686-linux; regression tests in
> progress.  I've also tested completely different patches for the
> problem, but I guess no one wants to know that. :-)
> 
> Typical code generation differences on Thumb-1 look like this:
> 
> -       mov     r2, #192
> -       mov     r3, r2
> -       add     r3, r3, r9
> +       mov     r3, r9
> +       add     r3, r3, #192
> 
> There's a small ARM-specific bit in the patch, which swaps two
> alternatives on the grounds that synthesizing a negative constant in a
> register is more expensive than reloading one register into another.
> Hence, the alternative using constants should be preferred if it can
> match, and thus it should go first:
> 
> -       mov     r5, #32
> -       neg     r5, r5
> -       add     r4, r2, r5
> +       mov     r4, r2
> +       sub     r4, r4, #32
>         bmi     .L2
> 
> This was a case where the find_reloads code produced better results for
> entirely the wrong reasons.
> 
> I'll commit the reload patch if no one objects soonish and if the ARM
> part is approved.
> 
> 
> Bernd

The ARM bit is fine.

R.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch: Revert find_reloads part of the IRA merge
  2010-07-09 11:11 Patch: Revert find_reloads part of the IRA merge Bernd Schmidt
  2010-07-09 13:28 ` Richard Earnshaw
@ 2010-07-14 11:28 ` Bernd Schmidt
  2010-07-14 19:03   ` Jeff Law
  1 sibling, 1 reply; 6+ messages in thread
From: Bernd Schmidt @ 2010-07-14 11:28 UTC (permalink / raw)
  To: GCC Patches; +Cc: Vladimir N. Makarov, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]

I've now reverted the find_reloads change.

Attached is a patch which tries to solve the problem in a different way;
I fixed some other problems that showed up in testing on ARM.  Specifically:
  * Immediately marking used_spill_regs as ever_live can cause a
    situation where the next iteration will use a different set of
    hard regs, and we end up pushing registers that are never used.
    If we delay settings regs_ever_live until nothing else has
    changed, we can avoid that, at the cost of a little bit of
    complexity, and the risk of requiring more iterations.  Not
    sure this is worth it.
  * When evaluating costs of alternatives, reload isn't taking into
    account whether something is going to need secondary reloads.
  * Use live_throughout sets to determine if a reload will force
    a spill.  This seems to work better than the previous heuristic,
    but it can still fail (dead_or_set isn't taken into account
    since it's quite a useless set, the test isn't run on address
    reloads, etc.)

Not committed.  It would be nice to get reports from some target
maintainers about what this does to code quality.


Bernd

[-- Attachment #2: different-fix.diff --]
[-- Type: text/plain, Size: 12115 bytes --]

Index: gcc/reload1.c
===================================================================
--- gcc.orig/reload1.c
+++ gcc/reload1.c
@@ -197,7 +197,7 @@ static HARD_REG_SET bad_spill_regs;
    insn.  This includes registers used for user variables and registers that
    we can't eliminate.  A register that appears in this set also can't be used
    to retry register allocation.  */
-static HARD_REG_SET bad_spill_regs_global;
+HARD_REG_SET bad_spill_regs_global;
 
 /* Describes order of use of registers for reloading
    of spilled pseudo-registers.  `n_spills' is the number of
@@ -227,6 +227,10 @@ static HARD_REG_SET *pseudo_forbidden_re
    marked in this set.  */
 static HARD_REG_SET used_spill_regs;
 
+/* Keeps track of hard registers used for spills in the current iteration.
+   This set will be merged into used_spill_regs in finish_spills.  */
+static HARD_REG_SET current_used_spill_regs;
+
 /* Index of last register assigned as a spill register.  We allocate in
    a round-robin fashion.  */
 static int last_spill_reg;
@@ -823,6 +827,7 @@ reload (rtx first, int global)
 
   /* Spill any hard regs that we know we can't eliminate.  */
   CLEAR_HARD_REG_SET (used_spill_regs);
+  CLEAR_HARD_REG_SET (current_used_spill_regs);
   /* There can be multiple ways to eliminate a register;
      they should be listed adjacently.
      Elimination for any register fails only if all possible ways fail.  */
@@ -1511,7 +1516,7 @@ calculate_needs_all_insns (int global)
 	    did_elimination = eliminate_regs_in_insn (insn, 0);
 
 	  /* Analyze the instruction.  */
-	  operands_changed = find_reloads (insn, 0, spill_indirect_levels,
+	  operands_changed = find_reloads (chain, 0, spill_indirect_levels,
 					   global, spill_reg_order);
 
 	  /* If a no-op set needs more than one reload, this is likely
@@ -2042,7 +2047,7 @@ find_reload_regs (struct insn_chain *cha
     }
 
   COPY_HARD_REG_SET (chain->used_spill_regs, used_spill_regs_local);
-  IOR_HARD_REG_SET (used_spill_regs, used_spill_regs_local);
+  IOR_HARD_REG_SET (current_used_spill_regs, used_spill_regs_local);
 
   memcpy (chain->rld, rld, n_reloads * sizeof (struct reload));
 }
@@ -2052,6 +2057,7 @@ select_reload_regs (void)
 {
   struct insn_chain *chain;
 
+  CLEAR_HARD_REG_SET (current_used_spill_regs);
   /* Try to satisfy the needs for each insn.  */
   for (chain = insns_need_reload; chain != 0;
        chain = chain->next_need_reload)
@@ -4279,34 +4285,10 @@ finish_spills (int global)
 {
   struct insn_chain *chain;
   int something_changed = 0;
+  int new_regs_used;
   unsigned i;
   reg_set_iterator rsi;
 
-  /* Build the spill_regs array for the function.  */
-  /* If there are some registers still to eliminate and one of the spill regs
-     wasn't ever used before, additional stack space may have to be
-     allocated to store this register.  Thus, we may have changed the offset
-     between the stack and frame pointers, so mark that something has changed.
-
-     One might think that we need only set VAL to 1 if this is a call-used
-     register.  However, the set of registers that must be saved by the
-     prologue is not identical to the call-used set.  For example, the
-     register used by the call insn for the return PC is a call-used register,
-     but must be saved by the prologue.  */
-
-  n_spills = 0;
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    if (TEST_HARD_REG_BIT (used_spill_regs, i))
-      {
-	spill_reg_order[i] = n_spills;
-	spill_regs[n_spills++] = i;
-	if (num_eliminable && ! df_regs_ever_live_p (i))
-	  something_changed = 1;
-	df_set_regs_ever_live (i, true);
-      }
-    else
-      spill_reg_order[i] = -1;
-
   EXECUTE_IF_SET_IN_REG_SET (&spilled_pseudos, FIRST_PSEUDO_REGISTER, i, rsi)
     if (! ira_conflicts_p || reg_renumber[i] >= 0)
       {
@@ -4388,6 +4370,8 @@ finish_spills (int global)
 	 makes inheritance work somewhat better.  */
       if (chain->need_reload)
 	{
+	  HARD_REG_SET all_used_spill_regs;
+
 	  REG_SET_TO_HARD_REG_SET (used_by_pseudos, &chain->live_throughout);
 	  REG_SET_TO_HARD_REG_SET (used_by_pseudos2, &chain->dead_or_set);
 	  IOR_HARD_REG_SET (used_by_pseudos, used_by_pseudos2);
@@ -4398,8 +4382,10 @@ finish_spills (int global)
 	     may be not included in the value calculated here because
 	     of possible removing caller-saves insns (see function
 	     delete_caller_save_insns.  */
+	  COPY_HARD_REG_SET (all_used_spill_regs, used_spill_regs);
+	  IOR_HARD_REG_SET (all_used_spill_regs, current_used_spill_regs);
 	  COMPL_HARD_REG_SET (chain->used_spill_regs, used_by_pseudos);
-	  AND_HARD_REG_SET (chain->used_spill_regs, used_spill_regs);
+	  AND_HARD_REG_SET (chain->used_spill_regs, all_used_spill_regs);
 	}
     }
 
@@ -4425,6 +4411,43 @@ finish_spills (int global)
 	}
     }
 
+  /* Build the spill_regs array for the function.  */
+  /* If there are some registers still to eliminate and one of the spill regs
+     wasn't ever used before, additional stack space may have to be
+     allocated to store this register.  Thus, we may have changed the offset
+     between the stack and frame pointers, so mark that something has changed.
+     However, only do this if nothing else has changed - if we spilled a pseudo,
+     we may not end up using the same spill reg in the next iteration.
+
+     One might think that we need only set VAL to 1 if this is a call-used
+     register.  However, the set of registers that must be saved by the
+     prologue is not identical to the call-used set.  For example, the
+     register used by the call insn for the return PC is a call-used register,
+     but must be saved by the prologue.  */
+
+  n_spills = 0;
+  new_regs_used = 0;
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (!something_changed && TEST_HARD_REG_BIT (current_used_spill_regs, i))
+	{
+	  if (num_eliminable && ! df_regs_ever_live_p (i))
+	    new_regs_used++;
+	  df_set_regs_ever_live (i, true);
+	  SET_HARD_REG_BIT (used_spill_regs, i);
+	}
+
+      if (TEST_HARD_REG_BIT (used_spill_regs, i))      
+	{
+	  spill_reg_order[i] = n_spills;
+	  spill_regs[n_spills++] = i;
+	}
+      else
+	spill_reg_order[i] = -1;
+    }
+  if (new_regs_used)
+    something_changed = 1;
+
   return something_changed;
 }
 \f
@@ -4587,7 +4610,7 @@ reload_as_needed (int live_known)
 	      CLEAR_REG_SET (&reg_has_output_reload);
 	      CLEAR_HARD_REG_SET (reg_is_output_reload);
 
-	      find_reloads (insn, 1, spill_indirect_levels, live_known,
+	      find_reloads (chain, 1, spill_indirect_levels, live_known,
 			    spill_reg_order);
 	    }
 
Index: gcc/reload.c
===================================================================
--- gcc.orig/reload.c
+++ gcc/reload.c
@@ -2533,8 +2533,8 @@ safe_from_earlyclobber (rtx op, rtx clob
   return immune_p (op, clobber, early_data);
 }
 \f
-/* Main entry point of this file: search the body of INSN
-   for values that need reloading and record them with push_reload.
+/* Main entry point of this file: search the body of the insn found in
+   CHAIN for values that need reloading and record them with push_reload.
    REPLACE nonzero means record also where the values occur
    so that subst_reloads can be used.
 
@@ -2556,9 +2556,10 @@ safe_from_earlyclobber (rtx op, rtx clob
    commutative operands, reg_equiv_address substitution, or whatever.  */
 
 int
-find_reloads (rtx insn, int replace, int ind_levels, int live_known,
-	      short *reload_reg_p)
+find_reloads (struct insn_chain *chain, int replace, int ind_levels,
+	      int live_known, short *reload_reg_p)
 {
+  rtx insn = chain->insn;
   int insn_code_number;
   int i, j;
   int noperands;
@@ -2601,7 +2602,7 @@ find_reloads (rtx insn, int replace, int
   char goal_alternative_offmemok[MAX_RECOG_OPERANDS];
   char goal_alternative_earlyclobber[MAX_RECOG_OPERANDS];
   int goal_alternative_swapped;
-  int best;
+  int best, best_forced_reloads;
   int commutative;
   char operands_match[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   rtx substed_operand[MAX_RECOG_OPERANDS];
@@ -2808,7 +2809,7 @@ find_reloads (rtx insn, int replace, int
 		  || GET_CODE (recog_data.operand[i]) == PLUS))
 	    {
 	      INSN_CODE (insn) = -1;
-	      retval = find_reloads (insn, replace, ind_levels, live_known,
+	      retval = find_reloads (chain, replace, ind_levels, live_known,
 				     reload_reg_p);
 	      return retval;
 	    }
@@ -2927,6 +2928,7 @@ find_reloads (rtx insn, int replace, int
      all the operands together against the register constraints.  */
 
   best = MAX_RECOG_OPERANDS * 2 + 600;
+  best_forced_reloads = 0;
 
   swapped = 0;
   goal_alternative_swapped = 0;
@@ -3484,6 +3486,11 @@ find_reloads (rtx insn, int replace, int
 
 	      this_alternative_offmemok[i] = offmemok;
 	      losers++;
+	      if (secondary_reload_class (modified[i] != RELOAD_WRITE,
+					  this_alternative[i],
+					  operand_mode[i], operand) != NO_REGS)
+		reject += 2;
+
 	      if (badop)
 		bad = 1;
 	      /* Alternative loses if it has no regs for a reg operand.  */
@@ -3713,7 +3720,33 @@ find_reloads (rtx insn, int replace, int
 	 record it as the chosen goal for reloading.  */
       if (! bad)
 	{
-	  if (best > losers)
+	  int forced_reloads = 0;
+	  bool change_p = best > losers;
+	  if (best >= losers)
+	    {
+	      for (i = 0; i < noperands; i++)
+		{
+		  HARD_REG_SET tmp, used_by_pseudos;
+		  enum reg_class alt_class = this_alternative[i];
+		  if (this_alternative_win[i]
+		      || this_alternative_match_win[i]
+		      || this_alternative_matches[i] >= 0
+		      || alt_class == NO_REGS)
+		    continue;
+		  COPY_HARD_REG_SET (tmp, reg_class_contents[alt_class]);
+		  AND_COMPL_HARD_REG_SET (tmp, bad_spill_regs_global);
+		  REG_SET_TO_HARD_REG_SET (used_by_pseudos,
+					   &chain->live_throughout);
+		  compute_use_by_pseudos (&used_by_pseudos,
+					  &chain->live_throughout);
+		  AND_COMPL_HARD_REG_SET (tmp, used_by_pseudos);
+		  if (hard_reg_set_empty_p (tmp))
+		    forced_reloads++;
+		}
+	      if (best == losers && forced_reloads < best_forced_reloads)
+		change_p = true;
+	    }
+	  if (change_p)
 	    {
 	      for (i = 0; i < noperands; i++)
 		{
@@ -3729,6 +3762,7 @@ find_reloads (rtx insn, int replace, int
 		}
 	      goal_alternative_swapped = swapped;
 	      best = losers;
+	      best_forced_reloads = forced_reloads;
 	      goal_alternative_number = this_alternative_number;
 	      goal_earlyclobber = this_earlyclobber;
 	    }
Index: gcc/reload.h
===================================================================
--- gcc.orig/reload.h
+++ gcc/reload.h
@@ -241,9 +241,20 @@ extern struct insn_chain *reload_insn_ch
 
 /* Allocate a new insn_chain structure.  */
 extern struct insn_chain *new_insn_chain (void);
+
+/* Search the body of the insn in CHAIN for values that need reloading
+   and record them with push_reload.  REPLACE nonzero means record
+   also where the values occur so that subst_reloads can be used.  */
+extern int find_reloads (struct insn_chain *, int, int, int, short *);
 #endif
 
 #if defined SET_HARD_REG_BIT
+/* These are the hard registers that can't be used as spill register for any
+   insn.  This includes registers used for user variables and registers that
+   we can't eliminate.  A register that appears in this set also can't be used
+   to retry register allocation.  */
+extern HARD_REG_SET bad_spill_regs_global;
+
 extern void compute_use_by_pseudos (HARD_REG_SET *, bitmap);
 #endif
 
@@ -282,11 +293,6 @@ extern int operands_match_p (rtx, rtx);
 /* Return 1 if altering OP will not modify the value of CLOBBER.  */
 extern int safe_from_earlyclobber (rtx, rtx);
 
-/* Search the body of INSN for values that need reloading and record them
-   with push_reload.  REPLACE nonzero means record also where the values occur
-   so that subst_reloads can be used.  */
-extern int find_reloads (rtx, int, int, int, short *);
-
 /* Compute the sum of X and Y, making canonicalizations assumed in an
    address, namely: sum constant integers, surround the sum of two
    constants with a CONST, put the constant as the second operand, and

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch: Revert find_reloads part of the IRA merge
  2010-07-14 11:28 ` Bernd Schmidt
@ 2010-07-14 19:03   ` Jeff Law
  2010-07-14 21:36     ` Bernd Schmidt
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2010-07-14 19:03 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, Vladimir N. Makarov, Richard Earnshaw

On 07/14/10 05:28, Bernd Schmidt wrote:
> I've now reverted the find_reloads change.
>
> Attached is a patch which tries to solve the problem in a different way;
> I fixed some other problems that showed up in testing on ARM.  Specifically:
>    * Immediately marking used_spill_regs as ever_live can cause a
>      situation where the next iteration will use a different set of
>      hard regs, and we end up pushing registers that are never used.
>      If we delay setting regs_ever_live until nothing else has
>      changed, we can avoid that, at the cost of a little bit of
>      complexity, and the risk of requiring more iterations.  Not
>      sure this is worth it.
>    
How often have you seen this occur in practice?  The complexity is 
trivial, so if it's occurring with any regularity, I'd say go for it.


>    * When evaluating costs of alternatives, reload isn't taking into
>      account whether something is going to need secondary reloads.
>    
I'm all for more accurate cost modeling.

>    * Use live_throughout sets to determine if a reload will force
>      a spill.  This seems to work better than the previous heuristic,
>      but it can still fail (dead_or_set isn't taken into account
>      since it's quite a useless set, the test isn't run on address
>      reloads, etc.)
>    
Well, you're detecting one of (many) cases where we know a particular 
reload is going to cause a spill.  While it'd be nice to catch the other 
cases, I don't think detecting all the cases where we can prove that a 
reload is going to cause a spill is necessary.  What you're proposing is 
a clear incremental improvement IMHO.

Jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch: Revert find_reloads part of the IRA merge
  2010-07-14 19:03   ` Jeff Law
@ 2010-07-14 21:36     ` Bernd Schmidt
  2010-07-14 22:26       ` Jeff Law
  0 siblings, 1 reply; 6+ messages in thread
From: Bernd Schmidt @ 2010-07-14 21:36 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches, Vladimir N. Makarov, Richard Earnshaw

On 07/14/2010 09:03 PM, Jeff Law wrote:
> On 07/14/10 05:28, Bernd Schmidt wrote:
>> Attached is a patch which tries to solve the problem in a different way;
>> I fixed some other problems that showed up in testing on ARM. 
>> Specifically:
>>    * Immediately marking used_spill_regs as ever_live can cause a
>>      situation where the next iteration will use a different set of
>>      hard regs, and we end up pushing registers that are never used.
>>      If we delay setting regs_ever_live until nothing else has
>>      changed, we can avoid that, at the cost of a little bit of
>>      complexity, and the risk of requiring more iterations.  Not
>>      sure this is worth it.
>>    
> How often have you seen this occur in practice?  The complexity is
> trivial, so if it's occurring with any regularity, I'd say go for it.

Anecdotally, I seem to recall seeing this every now and then, and it's
been bugging me for a while.  Practically, in the last test run I think
it only affected one testcase out of the many I have collected for
comparing assembly output.  I've added this code because one of the two
other changes made it show up.

>>    * When evaluating costs of alternatives, reload isn't taking into
>>      account whether something is going to need secondary reloads.
>>    
> I'm all for more accurate cost modeling.

Yes.  I think I'll put this in (after a bit more thought and some
testing on its own), it seems like the right thing to do.

>>    * Use live_throughout sets to determine if a reload will force
>>      a spill.  This seems to work better than the previous heuristic,
>>      but it can still fail (dead_or_set isn't taken into account
>>      since it's quite a useless set, the test isn't run on address
>>      reloads, etc.)
>>    
> Well, you're detecting one of (many) cases where we know a particular
> reload is going to cause a spill.  While it'd be nice to catch the other
> cases, I don't think detecting all the cases where we can prove that a
> reload is going to cause a spill is necessary.  What you're proposing is
> a clear incremental improvement IMHO.

Not really, it suffers from the same conceptual problem as the previous
attempt I reverted.  If we can't get completely accurate answers (and at
this point, we can't), it can happen that we prefer an alternative for
the wrong reasons (e.g. both of them will force a spill, but we can
prove it only for one of them here).

In practice it seems to work reasonably well, at least on ARM.  More
testing would be good.


Bernd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Patch: Revert find_reloads part of the IRA merge
  2010-07-14 21:36     ` Bernd Schmidt
@ 2010-07-14 22:26       ` Jeff Law
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Law @ 2010-07-14 22:26 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, Vladimir N. Makarov, Richard Earnshaw

On 07/14/10 15:35, Bernd Schmidt wrote:
> On 07/14/2010 09:03 PM, Jeff Law wrote:
>    
>> On 07/14/10 05:28, Bernd Schmidt wrote:
>>      
>>> Attached is a patch which tries to solve the problem in a different way;
>>> I fixed some other problems that showed up in testing on ARM.
>>> Specifically:
>>>     * Immediately marking used_spill_regs as ever_live can cause a
>>>       situation where the next iteration will use a different set of
>>>       hard regs, and we end up pushing registers that are never used.
>>>       If we delay setting regs_ever_live until nothing else has
>>>       changed, we can avoid that, at the cost of a little bit of
>>>       complexity, and the risk of requiring more iterations.  Not
>>>       sure this is worth it.
>>>
>>>        
>> How often have you seen this occur in practice?  The complexity is
>> trivial, so if it's occurring with any regularity, I'd say go for it.
>>      
> Anecdotally, I seem to recall seeing this every now and then, and it's
> been bugging me for a while.  Practically, in the last test run I think
> it only affected one testcase out of the many I have collected for
> comparing assembly output.  I've added this code because one of the two
> other changes made it show up.
>    
I've had patches like this (accumulating regs in the main reload loop 
and having them change on further iterations, pessimizing code).  They 
didn't trigger often, but fixing it was trivial, so why not? :-)

>>>     * Use live_throughout sets to determine if a reload will force
>>>       a spill.  This seems to work better than the previous heuristic,
>>>       but it can still fail (dead_or_set isn't taken into account
>>>       since it's quite a useless set, the test isn't run on address
>>>       reloads, etc.)
>>>
>>>        
>> Well, you're detecting one of (many) cases where we know a particular
>> reload is going to cause a spill.  While it'd be nice to catch the other
>> cases, I don't think detecting all the cases where we can prove that a
>> reload is going to cause a spill is necessary.  What you're proposing is
>> a clear incremental improvement IMHO.
>>      
> Not really, it suffers from the same conceptual problem as the previous
> attempt I reverted.  If we can't get completely accurate answers (and at
> this point, we can't), it can happen that we prefer an alternative for
> the wrong reasons (e.g. both of them will force a spill, but we can
> prove it only for one of them here).
>    
Good point.

> In practice it seems to work reasonably well, at least on ARM.  More
> testing would be good.
>    
Let's hope others chime in.

jeff


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-07-14 22:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-09 11:11 Patch: Revert find_reloads part of the IRA merge Bernd Schmidt
2010-07-09 13:28 ` Richard Earnshaw
2010-07-14 11:28 ` Bernd Schmidt
2010-07-14 19:03   ` Jeff Law
2010-07-14 21:36     ` Bernd Schmidt
2010-07-14 22:26       ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).