Ping: IRA-based register pressure calculation for RTL loop invariant motion

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Ping: IRA-based register pressure calculation for RTL loop invariant  motion
@ 2009-10-01  3:19 Vladimir Makarov
  2009-10-01  8:49 ` Zdenek Dvorak
  2009-10-13 22:25 ` Ping 2: " Vladimir Makarov
  0 siblings, 2 replies; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-01  3:19 UTC (permalink / raw)
  To: gcc-patches

The patch was posted on

http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01889.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop  invariant  motion
  2009-10-01  3:19 Ping: IRA-based register pressure calculation for RTL loop invariant motion Vladimir Makarov
@ 2009-10-01  8:49 ` Zdenek Dvorak
  2009-10-01 14:34   ` Vladimir Makarov
  2009-10-13 22:25 ` Ping 2: " Vladimir Makarov
  1 sibling, 1 reply; 15+ messages in thread
From: Zdenek Dvorak @ 2009-10-01  8:49 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches

Hi,

> The patch was posted on
>
> http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01889.html

in gain_for_invariant:

> +      for (i = 0; i < ira_reg_class_cover_size; i++)
> +	{
> +	  cover_class = ira_reg_class_cover[i];
> +	  if ((diff = ((int) new_regs[cover_class]
> +		       + (int) regs_needed[cover_class]
> +		       + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
> +		       + IRA_LOOP_RESERVED_REGS
> +		       - ira_available_class_regs[cover_class])) > 0)
> +	    break;
> +	}

diff is not used.

> +      if (i < ira_reg_class_cover_size)
> +	size_cost = comp_cost + 10;
> +      else
> +	size_cost = 0;

Including comp_cost in size_cost makes no sense (this would prevent us from
moving even very costly invariants out of the loop if we run out of registers).

Also, the magic constant 10 requires some explanation (especially since the
costs elsewhere in the invariant motion are expressed in terms of the costs of
basic operations as reported by rtx_cost, so using an absolute constant will
have different effects on different architectures, depending on the arbitrary
choice of the instruction cost scale).

> +	fprintf (dump_file, "Decided to move dependet invariant %d\n",
> +		 invno);

dependent

> @@ -1154,7 +1297,8 @@ find_invariants_to_move (bool speed)
>    /* We do not really do a good job in estimating number of registers used;
>       we put some initial bound here to stand for induction variables etc.
>       that we do not detect.  */
> -  regs_used = 2;
> +  if (! flag_ira_loop_pressure)
> +    regs_used = 2;
>  
>    for (i = 0; i < n_regs; i++)
>      {

If flag_ira_loop_pressure is true, regs_used is used uninitialized in the
following loop (you should be getting uninitialized variable warning on this).
The loop should be guarded by !flag_ira_loop_pressure as well.

> -  new_regs = 0;
> -  while (best_gain_for_invariant (&inv, &regs_needed, new_regs, regs_used, speed) > 0)
> +  if (! flag_ira_loop_pressure)
> +    new_regs[0] = regs_needed[0] = 0;
> +  else
>      {
> -      set_move_mark (inv->invno);
> -      new_regs += regs_needed;
> +      for (i = 0; (int) i < ira_reg_class_cover_size; i++)
> +	{
> +	  new_regs[ira_reg_class_cover[i]] = 0;
> +	  regs_needed[ira_reg_class_cover[i]] = 0;
> +	}
> +    }

It should not be necessary to zero the regs_needed array.

> +DEFPARAM (PARAM_IRA_LOOP_RESERVED_REGS,
> +	  "ira-loop-reserved-regs",
> +	  "The number of registers reserved for loop invariant motion",
> +	  2, 0, 0)

"The number of registers in each class kept unused by loop invariant motion"

would seem like a better description.

Zdenek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop invariant   motion
  2009-10-01  8:49 ` Zdenek Dvorak
@ 2009-10-01 14:34   ` Vladimir Makarov
  2009-10-14 15:21     ` Zdenek Dvorak
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-01 14:34 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4128 bytes --]

Zdenek Dvorak wrote:
> Hi,
>   
Zdenek, thanks for quick response and very useful review.
>   
>> The patch was posted on
>>
>> http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01889.html
>>     
>
> in gain_for_invariant:
>
>   
>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>> +	{
>> +	  cover_class = ira_reg_class_cover[i];
>> +	  if ((diff = ((int) new_regs[cover_class]
>> +		       + (int) regs_needed[cover_class]
>> +		       + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>> +		       + IRA_LOOP_RESERVED_REGS
>> +		       - ira_available_class_regs[cover_class])) > 0)
>> +	    break;
>> +	}
>>     
>
> diff is not used.
>
>   
I missed this.  Diff was used for experimental code (I  tried a  lot of 
heuristics) which was based one spill cost vs invariant cost.  But I've 
rejected this code at some point.
>> +      if (i < ira_reg_class_cover_size)
>> +	size_cost = comp_cost + 10;
>> +      else
>> +	size_cost = 0;
>>     
>
> Including comp_cost in size_cost makes no sense (this would prevent us from
> moving even very costly invariants out of the loop if we run out of registers).
>
>   
That is exactly what I intended.  As I wrote above, I tried a lot of 
heuristics with different parameters which decided to move loop 
invariant depending on spill cost and loop invariant cost.  But they 
don't  work well at least for x86/x86_64 and power6.  I have some 
speculation for this.  x86/x86_64 is OOO processors these days.  And 
costly invariant will be hidden because usually the invariant has a lot 
of freedom to be executed out-of-order.  For power6, long latency is 
hidden by insn scheduling.  It is hard to me find a processor where it 
will be important.  Another reason for this, it is very hard to evaluate 
accurately spill cost at this stage.  So I decided not to use 
combination of register pressure and invariant cost in my approach.

I any case I've changed the code a bit (to rid of the magic number) and 
add the comment (see the attachment).
> Also, the magic constant 10 requires some explanation (especially since the
> costs elsewhere in the invariant motion are expressed in terms of the costs of
> basic operations as reported by rtx_cost, so using an absolute constant will
> have different effects on different architectures, depending on the arbitrary
> choice of the instruction cost scale).
>
>   
>> +	fprintf (dump_file, "Decided to move dependet invariant %d\n",
>> +		 invno);
>>     
>
> dependent
>
>   
Done.
>> @@ -1154,7 +1297,8 @@ find_invariants_to_move (bool speed)
>>    /* We do not really do a good job in estimating number of registers used;
>>       we put some initial bound here to stand for induction variables etc.
>>       that we do not detect.  */
>> -  regs_used = 2;
>> +  if (! flag_ira_loop_pressure)
>> +    regs_used = 2;
>>  
>>    for (i = 0; i < n_regs; i++)
>>      {
>>     
>
> If flag_ira_loop_pressure is true, regs_used is used uninitialized in the
> following loop (you should be getting uninitialized variable warning on this).
> The loop should be guarded by !flag_ira_loop_pressure as well.
>
>   
Done. For some reasons, I did not get this warning neither during the 
build nor during the bootstrap.
>> -  new_regs = 0;
>> -  while (best_gain_for_invariant (&inv, &regs_needed, new_regs, regs_used, speed) > 0)
>> +  if (! flag_ira_loop_pressure)
>> +    new_regs[0] = regs_needed[0] = 0;
>> +  else
>>      {
>> -      set_move_mark (inv->invno);
>> -      new_regs += regs_needed;
>> +      for (i = 0; (int) i < ira_reg_class_cover_size; i++)
>> +	{
>> +	  new_regs[ira_reg_class_cover[i]] = 0;
>> +	  regs_needed[ira_reg_class_cover[i]] = 0;
>> +	}
>> +    }
>>     
>
> It should not be necessary to zero the regs_needed array.
>
>   
I've removed this.
>> +DEFPARAM (PARAM_IRA_LOOP_RESERVED_REGS,
>> +	  "ira-loop-reserved-regs",
>> +	  "The number of registers reserved for loop invariant motion",
>> +	  2, 0, 0)
>>     
>
> "The number of registers in each class kept unused by loop invariant motion"
>
> would seem like a better description.
>   
Done.  Here the new variant of the files mentioned by you.



[-- Attachment #2: Changes --]
[-- Type: text/plain, Size: 5370 bytes --]

Index: loop-invariant.c
===================================================================
--- loop-invariant.c	(revision 152153)
+++ loop-invariant.c	(working copy)
@@ -1066,15 +1168,46 @@ get_inv_cost (struct invariant *inv, int
 
 static int
 gain_for_invariant (struct invariant *inv, unsigned *regs_needed,
-		    unsigned new_regs, unsigned regs_used, bool speed)
+		    unsigned *new_regs, unsigned regs_used, bool speed)
 {
   int comp_cost, size_cost;
 
-  get_inv_cost (inv, &comp_cost, regs_needed);
   actual_stamp++;
 
-  size_cost = (estimate_reg_pressure_cost (new_regs + *regs_needed, regs_used, speed)
-	       - estimate_reg_pressure_cost (new_regs, regs_used, speed));
+  get_inv_cost (inv, &comp_cost, regs_needed);
+
+  if (! flag_ira_loop_pressure)
+    {
+      size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
+					       regs_used, speed)
+		   - estimate_reg_pressure_cost (new_regs[0],
+						 regs_used, speed));
+    }
+  else
+    {
+      int i;
+      enum reg_class cover_class;
+
+      for (i = 0; i < ira_reg_class_cover_size; i++)
+	{
+	  cover_class = ira_reg_class_cover[i];
+	  if ((int) new_regs[cover_class]
+	      + (int) regs_needed[cover_class]
+	      + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
+	      + IRA_LOOP_RESERVED_REGS
+	      - ira_available_class_regs[cover_class] > 0)
+	    break;
+	}
+      if (i < ira_reg_class_cover_size)
+	/* There will be register pressure excess and we want not to
+	   make this loop invariant motion.  All loop invariants with
+	   non-positive gains will be rejected in function
+	   find_invariants_to_move.  Therefore we return the negative
+	   number here.  */
+	return -1;
+      else
+	size_cost = 0;
+    }
 
   return comp_cost - size_cost;
 }
@@ -1118,7 +1258,7 @@ best_gain_for_invariant (struct invarian
 /* Marks invariant INVNO and all its dependencies for moving.  */
 
 static void
-set_move_mark (unsigned invno)
+set_move_mark (unsigned invno, int gain)
 {
   struct invariant *inv = VEC_index (invariant_p, invariants, invno);
   bitmap_iterator bi;
@@ -1131,11 +1271,18 @@ set_move_mark (unsigned invno)
   inv->move = true;
 
   if (dump_file)
-    fprintf (dump_file, "Decided to move invariant %d\n", invno);
+    {
+      if (gain >= 0)
+	fprintf (dump_file, "Decided to move invariant %d -- gain %d\n",
+		 invno, gain);
+      else
+	fprintf (dump_file, "Decided to move dependent invariant %d\n",
+		 invno);
+    };
 
   EXECUTE_IF_SET_IN_BITMAP (inv->depends_on, 0, invno, bi)
     {
-      set_move_mark (invno);
+      set_move_mark (invno, -1);
     }
 }
 
@@ -1144,32 +1291,54 @@ set_move_mark (unsigned invno)
 static void
 find_invariants_to_move (bool speed)
 {
-  unsigned i, regs_used, regs_needed = 0, new_regs;
+  int gain;
+  unsigned i, regs_used, regs_needed[N_REG_CLASSES], new_regs[N_REG_CLASSES];
   struct invariant *inv = NULL;
-  unsigned int n_regs = DF_REG_SIZE (df);
 
   if (!VEC_length (invariant_p, invariants))
     return;
 
-  /* We do not really do a good job in estimating number of registers used;
-     we put some initial bound here to stand for induction variables etc.
-     that we do not detect.  */
-  regs_used = 2;
-
-  for (i = 0; i < n_regs; i++)
-    {
-      if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
+  if (flag_ira_loop_pressure)
+    /* REGS_USED is actually never used when the flag is on.  */ 
+    regs_used = 0;
+  else
+    /* We do not really do a good job in estimating number of
+       registers used; we put some initial bound here to stand for
+       induction variables etc.  that we do not detect.  */
+    {
+      unsigned int n_regs = DF_REG_SIZE (df);
+
+      regs_used = 2;
+      
+      for (i = 0; i < n_regs; i++)
 	{
-	  /* This is a value that is used but not changed inside loop.  */
-	  regs_used++;
+	  if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
+	    {
+	      /* This is a value that is used but not changed inside loop.  */
+	      regs_used++;
+	    }
 	}
     }
 
-  new_regs = 0;
-  while (best_gain_for_invariant (&inv, &regs_needed, new_regs, regs_used, speed) > 0)
+  if (! flag_ira_loop_pressure)
+    new_regs[0] = regs_needed[0] = 0;
+  else
     {
-      set_move_mark (inv->invno);
-      new_regs += regs_needed;
+      for (i = 0; (int) i < ira_reg_class_cover_size; i++)
+	new_regs[ira_reg_class_cover[i]] = 0;
+    }
+  while ((gain = best_gain_for_invariant (&inv, regs_needed,
+					  new_regs, regs_used, speed)) > 0)
+    {
+      set_move_mark (inv->invno, gain);
+      if (! flag_ira_loop_pressure)
+	new_regs[0] += regs_needed[0];
+      else
+	{
+	  for (i = 0; (int) i < ira_reg_class_cover_size; i++)
+	    new_regs[ira_reg_class_cover[i]]
+	      += regs_needed[ira_reg_class_cover[i]];
+	}
     }
 }
 
Index: params.def
===================================================================
--- params.def	(revision 152153)
+++ params.def	(working copy)
@@ -719,6 +719,11 @@ DEFPARAM (PARAM_IRA_MAX_CONFLICT_TABLE_S
 	  "max size of conflict table in MB",
 	  1000, 0, 0)
 
+DEFPARAM (PARAM_IRA_LOOP_RESERVED_REGS,
+	  "ira-loop-reserved-regs",
+	  "The number of registers in each class kept unused by loop invariant motion",
+	  2, 0, 0)
+
 /* Switch initialization conversion will refuse to create arrays that are
    bigger than this parameter times the number of switch branches.  */
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Ping 2: IRA-based register pressure calculation for RTL loop invariant  motion
  2009-10-01  3:19 Ping: IRA-based register pressure calculation for RTL loop invariant motion Vladimir Makarov
  2009-10-01  8:49 ` Zdenek Dvorak
@ 2009-10-13 22:25 ` Vladimir Makarov
  2009-10-14  9:48   ` Richard Guenther
  1 sibling, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-13 22:25 UTC (permalink / raw)
  To: gcc-patches

The patch was posted on

http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01889.html

I've got Zdenek's review for loop-invariant.c (this is the biggest part 
of the patch) and params.def.  I am still waiting an approval for the 
rest files.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping 2: IRA-based register pressure calculation for RTL loop   invariant motion
  2009-10-13 22:25 ` Ping 2: " Vladimir Makarov
@ 2009-10-14  9:48   ` Richard Guenther
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Guenther @ 2009-10-14  9:48 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches

On Wed, Oct 14, 2009 at 12:18 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> The patch was posted on
>
> http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01889.html
>
> I've got Zdenek's review for loop-invariant.c (this is the biggest part of
> the patch) and params.def.  I am still waiting an approval for the rest
> files.

The rest is ok.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop  invariant motion
  2009-10-01 14:34   ` Vladimir Makarov
@ 2009-10-14 15:21     ` Zdenek Dvorak
  2009-10-14 16:36       ` Vladimir Makarov
  0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Dvorak @ 2009-10-14 15:21 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches

Hi,

>>> +      if (i < ira_reg_class_cover_size)
>>> +	size_cost = comp_cost + 10;
>>> +      else
>>> +	size_cost = 0;
>>>     
>>
>> Including comp_cost in size_cost makes no sense (this would prevent us from
>> moving even very costly invariants out of the loop if we run out of registers).
>>
>>   
> That is exactly what I intended.  As I wrote above, I tried a lot of  
> heuristics with different parameters which decided to move loop  
> invariant depending on spill cost and loop invariant cost.  But they  
> don't  work well at least for x86/x86_64 and power6.  I have some  
> speculation for this.  x86/x86_64 is OOO processors these days.  And  
> costly invariant will be hidden because usually the invariant has a lot  
> of freedom to be executed out-of-order.  For power6, long latency is  
> hidden by insn scheduling.  It is hard to me find a processor where it  
> will be important.  Another reason for this, it is very hard to evaluate  
> accurately spill cost at this stage.  So I decided not to use  
> combination of register pressure and invariant cost in my approach.

could you please add this reasoning to the comment?  Another reason why
preventing the invariant motion does not hurt might be that all expensive
invariants were already moved out of the loop by PRE and gimple invariant
motion pass.

> +      for (i = 0; i < ira_reg_class_cover_size; i++)
> +	{
> +	  cover_class = ira_reg_class_cover[i];
> +	  if ((int) new_regs[cover_class]
> +	      + (int) regs_needed[cover_class]
> +	      + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
> +	      + IRA_LOOP_RESERVED_REGS
> +	      - ira_available_class_regs[cover_class] > 0)
> +	    break;
> +	}

It might be clearer to write this as ... > ira_available_class_regs[cover_class] instead
of ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the patch is OK.

Zdenek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop invariant  motion
  2009-10-14 15:21     ` Zdenek Dvorak
@ 2009-10-14 16:36       ` Vladimir Makarov
  2009-10-16 21:58         ` Richard Guenther
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-14 16:36 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches

Zdenek Dvorak wrote:
> Hi,
>
>>>> +      if (i < ira_reg_class_cover_size)
>>>> +	size_cost = comp_cost + 10;
>>>> +      else
>>>> +	size_cost = 0;
>>>>     
>>> Including comp_cost in size_cost makes no sense (this would prevent us from
>>> moving even very costly invariants out of the loop if we run out of registers).
>>>
>>>   
>> That is exactly what I intended.  As I wrote above, I tried a lot of  
>> heuristics with different parameters which decided to move loop  
>> invariant depending on spill cost and loop invariant cost.  But they  
>> don't  work well at least for x86/x86_64 and power6.  I have some  
>> speculation for this.  x86/x86_64 is OOO processors these days.  And  
>> costly invariant will be hidden because usually the invariant has a lot  
>> of freedom to be executed out-of-order.  For power6, long latency is  
>> hidden by insn scheduling.  It is hard to me find a processor where it  
>> will be important.  Another reason for this, it is very hard to evaluate  
>> accurately spill cost at this stage.  So I decided not to use  
>> combination of register pressure and invariant cost in my approach.
>
> could you please add this reasoning to the comment?  Another reason why
> preventing the invariant motion does not hurt might be that all expensive
> invariants were already moved out of the loop by PRE and gimple invariant
> motion pass.
>
>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>> +	{
>> +	  cover_class = ira_reg_class_cover[i];
>> +	  if ((int) new_regs[cover_class]
>> +	      + (int) regs_needed[cover_class]
>> +	      + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>> +	      + IRA_LOOP_RESERVED_REGS
>> +	      - ira_available_class_regs[cover_class] > 0)
>> +	    break;
>> +	}
>
> It might be clearer to write this as ... > ira_available_class_regs[cover_class] instead
> of ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the patch is OK.
>
Zdenek, thanks for the additional comments.  I incorporated them into 
the patch just before committing.  Here is the affected patch part:

@@ -1066,15 +1168,62 @@ get_inv_cost (struct invariant *inv, int
 
 static int
 gain_for_invariant (struct invariant *inv, unsigned *regs_needed,
-            unsigned new_regs, unsigned regs_used, bool speed)
+            unsigned *new_regs, unsigned regs_used, bool speed)
 {
   int comp_cost, size_cost;
 
-  get_inv_cost (inv, &comp_cost, regs_needed);
   actual_stamp++;
 
-  size_cost = (estimate_reg_pressure_cost (new_regs + *regs_needed, 
regs_used, speed)
-           - estimate_reg_pressure_cost (new_regs, regs_used, speed));
+  get_inv_cost (inv, &comp_cost, regs_needed);
+
+  if (! flag_ira_loop_pressure)
+    {
+      size_cost = (estimate_reg_pressure_cost (new_regs[0] + 
regs_needed[0],
+                           regs_used, speed)
+           - estimate_reg_pressure_cost (new_regs[0],
+                         regs_used, speed));
+    }
+  else
+    {
+      int i;
+      enum reg_class cover_class;
+
+      for (i = 0; i < ira_reg_class_cover_size; i++)
+    {
+      cover_class = ira_reg_class_cover[i];
+      if ((int) new_regs[cover_class]
+          + (int) regs_needed[cover_class]
+          + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
+          + IRA_LOOP_RESERVED_REGS
+          > ira_available_class_regs[cover_class])
+        break;
+    }
+      if (i < ira_reg_class_cover_size)
+    /* There will be register pressure excess and we want not to
+       make this loop invariant motion.  All loop invariants with
+       non-positive gains will be rejected in function
+       find_invariants_to_move.  Therefore we return the negative
+       number here.
+
+       One could think that this rejects also expensive loop
+       invariant motions and this will hurt code performance.
+       However numerous experiments with different heuristics
+       taking invariant cost into account did not confirm this
+       assumption.  There are possible explanations for this
+       result:
+           o probably all expensive invariants were already moved out
+             of the loop by PRE and gimple invariant motion pass.
+           o expensive invariant execution will be hidden by insn
+             scheduling or OOO processor hardware because usually such
+             invariants have a lot of freedom to be executed
+             out-of-order.
+       Another reason for ignoring invariant cost vs spilling cost
+       heuristics is also in difficulties to evaluate accurately
+       spill cost at this stage.  */
+    return -1;
+      else
+    size_cost = 0;
+    }
 
   return comp_cost - size_cost;
 }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop   invariant motion
  2009-10-14 16:36       ` Vladimir Makarov
@ 2009-10-16 21:58         ` Richard Guenther
  2009-10-17  5:32           ` Vladimir Makarov
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2009-10-16 21:58 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Zdenek Dvorak, gcc-patches

On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Zdenek Dvorak wrote:
>>
>> Hi,
>>
>>>>> +      if (i < ira_reg_class_cover_size)
>>>>> +       size_cost = comp_cost + 10;
>>>>> +      else
>>>>> +       size_cost = 0;
>>>>>
>>>>
>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>> from
>>>> moving even very costly invariants out of the loop if we run out of
>>>> registers).
>>>>
>>>>
>>>
>>> That is exactly what I intended.  As I wrote above, I tried a lot of
>>>  heuristics with different parameters which decided to move loop  invariant
>>> depending on spill cost and loop invariant cost.  But they  don't  work well
>>> at least for x86/x86_64 and power6.  I have some  speculation for this.
>>>  x86/x86_64 is OOO processors these days.  And  costly invariant will be
>>> hidden because usually the invariant has a lot  of freedom to be executed
>>> out-of-order.  For power6, long latency is  hidden by insn scheduling.  It
>>> is hard to me find a processor where it  will be important.  Another reason
>>> for this, it is very hard to evaluate  accurately spill cost at this stage.
>>>  So I decided not to use  combination of register pressure and invariant
>>> cost in my approach.
>>
>> could you please add this reasoning to the comment?  Another reason why
>> preventing the invariant motion does not hurt might be that all expensive
>> invariants were already moved out of the loop by PRE and gimple invariant
>> motion pass.
>>
>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>> +       {
>>> +         cover_class = ira_reg_class_cover[i];
>>> +         if ((int) new_regs[cover_class]
>>> +             + (int) regs_needed[cover_class]
>>> +             + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>> +             + IRA_LOOP_RESERVED_REGS
>>> +             - ira_available_class_regs[cover_class] > 0)
>>> +           break;
>>> +       }
>>
>> It might be clearer to write this as ... >
>> ira_available_class_regs[cover_class] instead
>> of ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the patch
>> is OK.
>>
> Zdenek, thanks for the additional comments.  I incorporated them into the
> patch just before committing.  Here is the affected patch part:

I think this consistently regressed both compile-time and runtime for
Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
are seen there as well (it's also the only one I see off-noise effects
on SPEC 2000 - significant ups and downs).

Richard.

> @@ -1066,15 +1168,62 @@ get_inv_cost (struct invariant *inv, int
>
> static int
> gain_for_invariant (struct invariant *inv, unsigned *regs_needed,
> -            unsigned new_regs, unsigned regs_used, bool speed)
> +            unsigned *new_regs, unsigned regs_used, bool speed)
> {
>  int comp_cost, size_cost;
>
> -  get_inv_cost (inv, &comp_cost, regs_needed);
>  actual_stamp++;
>
> -  size_cost = (estimate_reg_pressure_cost (new_regs + *regs_needed,
> regs_used, speed)
> -           - estimate_reg_pressure_cost (new_regs, regs_used, speed));
> +  get_inv_cost (inv, &comp_cost, regs_needed);
> +
> +  if (! flag_ira_loop_pressure)
> +    {
> +      size_cost = (estimate_reg_pressure_cost (new_regs[0] +
> regs_needed[0],
> +                           regs_used, speed)
> +           - estimate_reg_pressure_cost (new_regs[0],
> +                         regs_used, speed));
> +    }
> +  else
> +    {
> +      int i;
> +      enum reg_class cover_class;
> +
> +      for (i = 0; i < ira_reg_class_cover_size; i++)
> +    {
> +      cover_class = ira_reg_class_cover[i];
> +      if ((int) new_regs[cover_class]
> +          + (int) regs_needed[cover_class]
> +          + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
> +          + IRA_LOOP_RESERVED_REGS
> +          > ira_available_class_regs[cover_class])
> +        break;
> +    }
> +      if (i < ira_reg_class_cover_size)
> +    /* There will be register pressure excess and we want not to
> +       make this loop invariant motion.  All loop invariants with
> +       non-positive gains will be rejected in function
> +       find_invariants_to_move.  Therefore we return the negative
> +       number here.
> +
> +       One could think that this rejects also expensive loop
> +       invariant motions and this will hurt code performance.
> +       However numerous experiments with different heuristics
> +       taking invariant cost into account did not confirm this
> +       assumption.  There are possible explanations for this
> +       result:
> +           o probably all expensive invariants were already moved out
> +             of the loop by PRE and gimple invariant motion pass.
> +           o expensive invariant execution will be hidden by insn
> +             scheduling or OOO processor hardware because usually such
> +             invariants have a lot of freedom to be executed
> +             out-of-order.
> +       Another reason for ignoring invariant cost vs spilling cost
> +       heuristics is also in difficulties to evaluate accurately
> +       spill cost at this stage.  */
> +    return -1;
> +      else
> +    size_cost = 0;
> +    }
>
>  return comp_cost - size_cost;
> }
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop  invariant  motion
  2009-10-16 21:58         ` Richard Guenther
@ 2009-10-17  5:32           ` Vladimir Makarov
  2009-10-17 11:17             ` Richard Guenther
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-17  5:32 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Zdenek Dvorak, gcc-patches

Richard Guenther wrote:
> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   
>> Zdenek Dvorak wrote:
>>     
>>> Hi,
>>>
>>>       
>>>>>> +      if (i < ira_reg_class_cover_size)
>>>>>> +       size_cost = comp_cost + 10;
>>>>>> +      else
>>>>>> +       size_cost = 0;
>>>>>>
>>>>>>             
>>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>>> from
>>>>> moving even very costly invariants out of the loop if we run out of
>>>>> registers).
>>>>>
>>>>>
>>>>>           
>>>> That is exactly what I intended.  As I wrote above, I tried a lot of
>>>>  heuristics with different parameters which decided to move loop  invariant
>>>> depending on spill cost and loop invariant cost.  But they  don't  work well
>>>> at least for x86/x86_64 and power6.  I have some  speculation for this.
>>>>  x86/x86_64 is OOO processors these days.  And  costly invariant will be
>>>> hidden because usually the invariant has a lot  of freedom to be executed
>>>> out-of-order.  For power6, long latency is  hidden by insn scheduling.  It
>>>> is hard to me find a processor where it  will be important.  Another reason
>>>> for this, it is very hard to evaluate  accurately spill cost at this stage.
>>>>  So I decided not to use  combination of register pressure and invariant
>>>> cost in my approach.
>>>>         
>>> could you please add this reasoning to the comment?  Another reason why
>>> preventing the invariant motion does not hurt might be that all expensive
>>> invariants were already moved out of the loop by PRE and gimple invariant
>>> motion pass.
>>>
>>>       
>>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>>> +       {
>>>> +         cover_class = ira_reg_class_cover[i];
>>>> +         if ((int) new_regs[cover_class]
>>>> +             + (int) regs_needed[cover_class]
>>>> +             + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>> +             + IRA_LOOP_RESERVED_REGS
>>>> +             - ira_available_class_regs[cover_class] > 0)
>>>> +           break;
>>>> +       }
>>>>         
>>> It might be clearer to write this as ... >
>>> ira_available_class_regs[cover_class] instead
>>> of ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the patch
>>> is OK.
>>>
>>>       
>> Zdenek, thanks for the additional comments.  I incorporated them into the
>> patch just before committing.  Here is the affected patch part:
>>     
>
> I think this consistently regressed both compile-time and runtime for
> Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
> are seen there as well (it's also the only one I see off-noise effects
> on SPEC 2000 - significant ups and downs).
>
>   
  Yes, it is expensive optimization (at least 3 additional passes
through RTL insns one for calculating register pressure and two very
expensive passes for finding register classes for pseudos).  It is
clearly seen from SPEC compilation time graphs on

http://vmakarov.fedorapeople.org/spec

for 2 last benchmarking.  Therefore I proposed it only for -O3.

Overall SPEC2000 scores are practically the same on x86/x86_64.

As for Polyhedron benchmarks, here is my results on Core I7:

first:  -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
second: -ffast-math -funroll-loops -O3 -fira-loop-pressure

x86:
Geometric Mean Execution Time =      12.84 seconds
Geometric Mean Execution Time =      12.82 seconds

x86_64:
Geometric Mean Execution Time =       9.89 seconds
Geometric Mean Execution Time =       9.91 seconds

On power6:
first:  -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure

Geometric Mean Execution Time =      19.22 seconds
Geometric Mean Execution Time =      19.04 seconds

  As I wrote earlier the winner of the optimization usage will be
loops with pressure lower (but not too lower) than #registers.  For
x86/x86_64, practically all loops have pressure more than #registers.
For such loops, evaluation of invariant cost vs spill cost would be
important.  But at this stage, spill cost is impossible to evaluate
accurately.  So usage of old and new loop invariant motion criteria on
processors similar x86/x86_64 will give different results for particular
tests (some tests better, some worse) but overall score will be
practically the same.

  Probably, there is no sense to use IRA-based register pressure calculation
for all targets (including x86/x86_64) but for power it is a clear win as it
is seen from polyhedron and as I reported for SPEC2000.

  So we could switch it off by default for -O3.  What do you think about 
this
solution, Richard?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop   invariant motion
  2009-10-17  5:32           ` Vladimir Makarov
@ 2009-10-17 11:17             ` Richard Guenther
  2009-10-19 16:21               ` Vladimir Makarov
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2009-10-17 11:17 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Zdenek Dvorak, gcc-patches

On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Richard Guenther wrote:
>>
>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>
>>>
>>> Zdenek Dvorak wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>>>>
>>>>>>> +      if (i < ira_reg_class_cover_size)
>>>>>>> +       size_cost = comp_cost + 10;
>>>>>>> +      else
>>>>>>> +       size_cost = 0;
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>>>> from
>>>>>> moving even very costly invariants out of the loop if we run out of
>>>>>> registers).
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> That is exactly what I intended.  As I wrote above, I tried a lot of
>>>>>  heuristics with different parameters which decided to move loop
>>>>>  invariant
>>>>> depending on spill cost and loop invariant cost.  But they  don't  work
>>>>> well
>>>>> at least for x86/x86_64 and power6.  I have some  speculation for this.
>>>>>  x86/x86_64 is OOO processors these days.  And  costly invariant will
>>>>> be
>>>>> hidden because usually the invariant has a lot  of freedom to be
>>>>> executed
>>>>> out-of-order.  For power6, long latency is  hidden by insn scheduling.
>>>>>  It
>>>>> is hard to me find a processor where it  will be important.  Another
>>>>> reason
>>>>> for this, it is very hard to evaluate  accurately spill cost at this
>>>>> stage.
>>>>>  So I decided not to use  combination of register pressure and
>>>>> invariant
>>>>> cost in my approach.
>>>>>
>>>>
>>>> could you please add this reasoning to the comment?  Another reason why
>>>> preventing the invariant motion does not hurt might be that all
>>>> expensive
>>>> invariants were already moved out of the loop by PRE and gimple
>>>> invariant
>>>> motion pass.
>>>>
>>>>
>>>>>
>>>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>>>> +       {
>>>>> +         cover_class = ira_reg_class_cover[i];
>>>>> +         if ((int) new_regs[cover_class]
>>>>> +             + (int) regs_needed[cover_class]
>>>>> +             + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>>> +             + IRA_LOOP_RESERVED_REGS
>>>>> +             - ira_available_class_regs[cover_class] > 0)
>>>>> +           break;
>>>>> +       }
>>>>>
>>>>
>>>> It might be clearer to write this as ... >
>>>> ira_available_class_regs[cover_class] instead
>>>> of ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the
>>>> patch
>>>> is OK.
>>>>
>>>>
>>>
>>> Zdenek, thanks for the additional comments.  I incorporated them into the
>>> patch just before committing.  Here is the affected patch part:
>>>
>>
>> I think this consistently regressed both compile-time and runtime for
>> Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
>> are seen there as well (it's also the only one I see off-noise effects
>> on SPEC 2000 - significant ups and downs).
>>
>>
>
>  Yes, it is expensive optimization (at least 3 additional passes
> through RTL insns one for calculating register pressure and two very
> expensive passes for finding register classes for pseudos).  It is
> clearly seen from SPEC compilation time graphs on
>
> http://vmakarov.fedorapeople.org/spec
>
> for 2 last benchmarking.  Therefore I proposed it only for -O3.
>
> Overall SPEC2000 scores are practically the same on x86/x86_64.
>
> As for Polyhedron benchmarks, here is my results on Core I7:
>
> first:  -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
>
> x86:
> Geometric Mean Execution Time =      12.84 seconds
> Geometric Mean Execution Time =      12.82 seconds
>
> x86_64:
> Geometric Mean Execution Time =       9.89 seconds
> Geometric Mean Execution Time =       9.91 seconds
>
> On power6:
> first:  -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
>
> Geometric Mean Execution Time =      19.22 seconds
> Geometric Mean Execution Time =      19.04 seconds
>
>  As I wrote earlier the winner of the optimization usage will be
> loops with pressure lower (but not too lower) than #registers.  For
> x86/x86_64, practically all loops have pressure more than #registers.
> For such loops, evaluation of invariant cost vs spill cost would be
> important.  But at this stage, spill cost is impossible to evaluate
> accurately.  So usage of old and new loop invariant motion criteria on
> processors similar x86/x86_64 will give different results for particular
> tests (some tests better, some worse) but overall score will be
> practically the same.
>
>  Probably, there is no sense to use IRA-based register pressure calculation
> for all targets (including x86/x86_64) but for power it is a clear win as it
> is seen from polyhedron and as I reported for SPEC2000.
>
>  So we could switch it off by default for -O3.  What do you think about this
> solution, Richard?

I think we could switch it on by default at -O3 for a selected group of
targets.  Itanium overall also improves with the new heuristics.  That would
make it power and Itanium.  Did you try restricting the heuristics to certain
register classes, like SSE registers on x86_64?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop  invariant  motion
  2009-10-17 11:17             ` Richard Guenther
@ 2009-10-19 16:21               ` Vladimir Makarov
  2009-10-20  2:54                 ` David Edelsohn
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-19 16:21 UTC (permalink / raw)
  To: Richard Guenther, David Edelsohn, Steve Ellcey; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6020 bytes --]

Richard Guenther wrote:
> On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   
>> Richard Guenther wrote:
>>     
>>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com>
>>> wrote:
>>>
>>>       
>>>> Zdenek Dvorak wrote:
>>>>
>>>>         
>>>>> Hi,
>>>>>
>>>>>
>>>>>           
>>>>>>>> +      if (i < ira_reg_class_cover_size)
>>>>>>>> +       size_cost = comp_cost + 10;
>>>>>>>> +      else
>>>>>>>> +       size_cost = 0;
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>>>>> from
>>>>>>> moving even very costly invariants out of the loop if we run out of
>>>>>>> registers).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> That is exactly what I intended.  As I wrote above, I tried a lot of
>>>>>>  heuristics with different parameters which decided to move loop
>>>>>>  invariant
>>>>>> depending on spill cost and loop invariant cost.  But they  don't  work
>>>>>> well
>>>>>> at least for x86/x86_64 and power6.  I have some  speculation for this.
>>>>>>  x86/x86_64 is OOO processors these days.  And  costly invariant will
>>>>>> be
>>>>>> hidden because usually the invariant has a lot  of freedom to be
>>>>>> executed
>>>>>> out-of-order.  For power6, long latency is  hidden by insn scheduling.
>>>>>>  It
>>>>>> is hard to me find a processor where it  will be important.  Another
>>>>>> reason
>>>>>> for this, it is very hard to evaluate  accurately spill cost at this
>>>>>> stage.
>>>>>>  So I decided not to use  combination of register pressure and
>>>>>> invariant
>>>>>> cost in my approach.
>>>>>>
>>>>>>             
>>>>> could you please add this reasoning to the comment?  Another reason why
>>>>> preventing the invariant motion does not hurt might be that all
>>>>> expensive
>>>>> invariants were already moved out of the loop by PRE and gimple
>>>>> invariant
>>>>> motion pass.
>>>>>
>>>>>
>>>>>           
>>>>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>>>>> +       {
>>>>>> +         cover_class = ira_reg_class_cover[i];
>>>>>> +         if ((int) new_regs[cover_class]
>>>>>> +             + (int) regs_needed[cover_class]
>>>>>> +             + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>>>> +             + IRA_LOOP_RESERVED_REGS
>>>>>> +             - ira_available_class_regs[cover_class] > 0)
>>>>>> +           break;
>>>>>> +       }
>>>>>>
>>>>>>             
>>>>> It might be clearer to write this as ... >
>>>>> ira_available_class_regs[cover_class] instead
>>>>> of ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the
>>>>> patch
>>>>> is OK.
>>>>>
>>>>>
>>>>>           
>>>> Zdenek, thanks for the additional comments.  I incorporated them into the
>>>> patch just before committing.  Here is the affected patch part:
>>>>
>>>>         
>>> I think this consistently regressed both compile-time and runtime for
>>> Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
>>> are seen there as well (it's also the only one I see off-noise effects
>>> on SPEC 2000 - significant ups and downs).
>>>
>>>
>>>       
>>  Yes, it is expensive optimization (at least 3 additional passes
>> through RTL insns one for calculating register pressure and two very
>> expensive passes for finding register classes for pseudos).  It is
>> clearly seen from SPEC compilation time graphs on
>>
>> http://vmakarov.fedorapeople.org/spec
>>
>> for 2 last benchmarking.  Therefore I proposed it only for -O3.
>>
>> Overall SPEC2000 scores are practically the same on x86/x86_64.
>>
>> As for Polyhedron benchmarks, here is my results on Core I7:
>>
>> first:  -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
>> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
>>
>> x86:
>> Geometric Mean Execution Time =      12.84 seconds
>> Geometric Mean Execution Time =      12.82 seconds
>>
>> x86_64:
>> Geometric Mean Execution Time =       9.89 seconds
>> Geometric Mean Execution Time =       9.91 seconds
>>
>> On power6:
>> first:  -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
>> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
>>
>> Geometric Mean Execution Time =      19.22 seconds
>> Geometric Mean Execution Time =      19.04 seconds
>>
>>  As I wrote earlier the winner of the optimization usage will be
>> loops with pressure lower (but not too lower) than #registers.  For
>> x86/x86_64, practically all loops have pressure more than #registers.
>> For such loops, evaluation of invariant cost vs spill cost would be
>> important.  But at this stage, spill cost is impossible to evaluate
>> accurately.  So usage of old and new loop invariant motion criteria on
>> processors similar x86/x86_64 will give different results for particular
>> tests (some tests better, some worse) but overall score will be
>> practically the same.
>>
>>  Probably, there is no sense to use IRA-based register pressure calculation
>> for all targets (including x86/x86_64) but for power it is a clear win as it
>> is seen from polyhedron and as I reported for SPEC2000.
>>
>>  So we could switch it off by default for -O3.  What do you think about this
>> solution, Richard?
>>     
>
> I think we could switch it on by default at -O3 for a selected group of
> targets.  Itanium overall also improves with the new heuristics.  That would
> make it power and Itanium.
The patch is below.  Ok to commit?
>   Did you try restricting the heuristics to certain
> register classes, like SSE registers on x86_64?
>
>   
No, I did not try.  I am not sure it is worth to do  it.


2009-10-19  Vladimir Makarov  <vmakarov@redhat.com>

    * doc/invoke.texi (fira-loop-pressure): Update default value.
    * opts.c (decode_options): Remove default value setting for
    flag_ira_loop_pressure.
    * config/ia64/ia64.c (ia64_override_options): Set
    flag_ira_loop_pressure up for -O3.
    * config/rs6000/rs6000.c (rs6000_override_options): Ditto.
    


[-- Attachment #2: ira-pressure-loop.patch --]
[-- Type: text/plain, Size: 2985 bytes --]

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 152770)
+++ doc/invoke.texi	(working copy)
@@ -5720,8 +5720,7 @@ invoking @option{-O2} on programs that u
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize} and
-@option{-fira-loop-pressure} options.
+@option{-fgcse-after-reload} and @option{-ftree-vectorize} options.
 
 @item -O0
 @opindex O0
@@ -6222,9 +6221,10 @@ architectures with big regular register 
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decision to move
 loop invariants.  Usage of this option usually results in generation
-of faster and smaller code but can slow compiler down.
+of faster and smaller code on machines with big register files (>= 32
+registers) but it can slow compiler down.
 
-This option is enabled at level @option{-O3}.
+This option is enabled at level @option{-O3} for some targets.
 
 @item -fno-ira-share-save-slots
 @opindex fno-ira-share-save-slots
Index: opts.c
===================================================================
--- opts.c	(revision 152770)
+++ opts.c	(working copy)
@@ -917,7 +917,6 @@ decode_options (unsigned int argc, const
   flag_ipa_cp_clone = opt3;
   if (flag_ipa_cp_clone)
     flag_ipa_cp = 1;
-  flag_ira_loop_pressure = opt3;
 
   /* Just -O1/-O0 optimizations.  */
   opt1_max = (optimize <= 1);
Index: config/ia64/ia64.c
===================================================================
--- config/ia64/ia64.c	(revision 152769)
+++ config/ia64/ia64.c	(working copy)
@@ -5496,6 +5496,14 @@ ia64_override_options (void)
   if (TARGET_AUTO_PIC)
     target_flags |= MASK_CONST_GP;
 
+  /* Numerous experiment shows that IRA based loop pressure
+     calculation works better for RTL loop invariant motion on targets
+     with enough (>= 32) registers.  It is an expensive optimization.
+     So it is on only for peak performance.  */
+  if (optimize >= 3)
+    flag_ira_loop_pressure = 1;
+
+
   ia64_flag_schedule_insns2 = flag_schedule_insns_after_reload;
   flag_schedule_insns_after_reload = 0;
 
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 152769)
+++ config/rs6000/rs6000.c	(working copy)
@@ -2266,6 +2266,13 @@ rs6000_override_options (const char *def
 		     | MASK_POPCNTD | MASK_VSX | MASK_ISEL | MASK_NO_UPDATE)
   };
 
+  /* Numerous experiment shows that IRA based loop pressure
+     calculation works better for RTL loop invariant motion on targets
+     with enough (>= 32) registers.  It is an expensive optimization.
+     So it is on only for peak performance.  */
+  if (optimize >= 3)
+    flag_ira_loop_pressure = 1;
+
   /* Set the pointer size.  */
   if (TARGET_64BIT)
     {

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop   invariant motion
  2009-10-19 16:21               ` Vladimir Makarov
@ 2009-10-20  2:54                 ` David Edelsohn
  2009-10-20  3:39                   ` Vladimir Makarov
  0 siblings, 1 reply; 15+ messages in thread
From: David Edelsohn @ 2009-10-20  2:54 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Richard Guenther, Steve Ellcey, gcc-patches

On Mon, Oct 19, 2009 at 12:17 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>> I think we could switch it on by default at -O3 for a selected group of
>> targets.  Itanium overall also improves with the new heuristics.  That
>> would
>> make it power and Itanium.
>
> The patch is below.  Ok to commit?
>>
>>  Did you try restricting the heuristics to certain
>> register classes, like SSE registers on x86_64?
>>
>>
>
> No, I did not try.  I am not sure it is worth to do  it.
>
>
> 2009-10-19  Vladimir Makarov  <vmakarov@redhat.com>
>
>   * doc/invoke.texi (fira-loop-pressure): Update default value.
>   * opts.c (decode_options): Remove default value setting for
>   flag_ira_loop_pressure.
>   * config/ia64/ia64.c (ia64_override_options): Set
>   flag_ira_loop_pressure up for -O3.
>   * config/rs6000/rs6000.c (rs6000_override_options): Ditto.

Tests inside IBM do not show this IRA feature as an overall win for
POWER.  If we figure out and fix why artificially limiting
rs6000_issue_rate to 1 for the first scheduler pass still helps (and
it does), then this would make sense.  Until then, please do not apply
this patch to rs6000.

Thanks, David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop  invariant  motion
  2009-10-20  2:54                 ` David Edelsohn
@ 2009-10-20  3:39                   ` Vladimir Makarov
  2009-10-20  9:29                     ` Richard Guenther
  2009-10-20 14:42                     ` David Edelsohn
  0 siblings, 2 replies; 15+ messages in thread
From: Vladimir Makarov @ 2009-10-20  3:39 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Richard Guenther, Steve Ellcey, gcc-patches

David Edelsohn wrote:
> On Mon, Oct 19, 2009 at 12:17 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>   
>>> I think we could switch it on by default at -O3 for a selected group of
>>> targets.  Itanium overall also improves with the new heuristics.  That
>>> would
>>> make it power and Itanium.
>>>       
>> The patch is below.  Ok to commit?
>>     
>>>  Did you try restricting the heuristics to certain
>>> register classes, like SSE registers on x86_64?
>>>
>>>
>>>       
>> No, I did not try.  I am not sure it is worth to do  it.
>>
>>
>> 2009-10-19  Vladimir Makarov  <vmakarov@redhat.com>
>>
>>   * doc/invoke.texi (fira-loop-pressure): Update default value.
>>   * opts.c (decode_options): Remove default value setting for
>>   flag_ira_loop_pressure.
>>   * config/ia64/ia64.c (ia64_override_options): Set
>>   flag_ira_loop_pressure up for -O3.
>>   * config/rs6000/rs6000.c (rs6000_override_options): Ditto.
>>     
>
> Tests inside IBM do not show this IRA feature as an overall win for
> POWER.  If we figure out and fix why artificially limiting
> rs6000_issue_rate to 1 for the first scheduler pass still helps (and
> it does), then this would make sense.  Until then, please do not apply
> this patch to rs6000.
>
>   
David, I think there is some misunderstanding.  It is a different 
patch.  You are probably talking about register pressure sensitive insn 
scheduling (by the way I am still working on its tuning).  This patch is 
about more accurate register pressure calculation to decide 
profitability to do RTL loop invariant motion.  SPEC2000 and polyhedron 
benchmarks for power6 shows that this patch is a clear win.  You could 
test and benchmark this patch internally in IBM to confirm or deny my 
observation if you wish.
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop   invariant motion
  2009-10-20  3:39                   ` Vladimir Makarov
@ 2009-10-20  9:29                     ` Richard Guenther
  2009-10-20 14:42                     ` David Edelsohn
  1 sibling, 0 replies; 15+ messages in thread
From: Richard Guenther @ 2009-10-20  9:29 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: David Edelsohn, Steve Ellcey, gcc-patches

On Tue, Oct 20, 2009 at 4:49 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> David Edelsohn wrote:
>>
>> On Mon, Oct 19, 2009 at 12:17 PM, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>
>>
>>>>
>>>> I think we could switch it on by default at -O3 for a selected group of
>>>> targets.  Itanium overall also improves with the new heuristics.  That
>>>> would
>>>> make it power and Itanium.
>>>>
>>>
>>> The patch is below.  Ok to commit?
>>>
>>>>
>>>>  Did you try restricting the heuristics to certain
>>>> register classes, like SSE registers on x86_64?
>>>>
>>>>
>>>>
>>>
>>> No, I did not try.  I am not sure it is worth to do  it.
>>>
>>>
>>> 2009-10-19  Vladimir Makarov  <vmakarov@redhat.com>
>>>
>>>  * doc/invoke.texi (fira-loop-pressure): Update default value.
>>>  * opts.c (decode_options): Remove default value setting for
>>>  flag_ira_loop_pressure.
>>>  * config/ia64/ia64.c (ia64_override_options): Set
>>>  flag_ira_loop_pressure up for -O3.
>>>  * config/rs6000/rs6000.c (rs6000_override_options): Ditto.
>>>
>>
>> Tests inside IBM do not show this IRA feature as an overall win for
>> POWER.  If we figure out and fix why artificially limiting
>> rs6000_issue_rate to 1 for the first scheduler pass still helps (and
>> it does), then this would make sense.  Until then, please do not apply
>> this patch to rs6000.
>>
>>
>
> David, I think there is some misunderstanding.  It is a different patch.
>  You are probably talking about register pressure sensitive insn scheduling
> (by the way I am still working on its tuning).  This patch is about more
> accurate register pressure calculation to decide profitability to do RTL
> loop invariant motion.  SPEC2000 and polyhedron benchmarks for power6 shows
> that this patch is a clear win.  You could test and benchmark this patch
> internally in IBM to confirm or deny my observation if you wish.

The non-powerpc pieces of the patch are ok.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Ping: IRA-based register pressure calculation for RTL loop   invariant motion
  2009-10-20  3:39                   ` Vladimir Makarov
  2009-10-20  9:29                     ` Richard Guenther
@ 2009-10-20 14:42                     ` David Edelsohn
  1 sibling, 0 replies; 15+ messages in thread
From: David Edelsohn @ 2009-10-20 14:42 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Richard Guenther, Steve Ellcey, gcc-patches

On Mon, Oct 19, 2009 at 10:49 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

> David, I think there is some misunderstanding.  It is a different patch.
>  You are probably talking about register pressure sensitive insn scheduling
> (by the way I am still working on its tuning).  This patch is about more
> accurate register pressure calculation to decide profitability to do RTL
> loop invariant motion.  SPEC2000 and polyhedron benchmarks for power6 shows
> that this patch is a clear win.  You could test and benchmark this patch
> internally in IBM to confirm or deny my observation if you wish.

Hi, Vlad

You are correct, I thought this was referring to the register-pressure
sensitive scheduling.  The loop-pressure sensitive analysis is fine.
You can apply that part of the patch to enable it for POWER at -O3.

Thanks, David

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-10-20 14:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-01  3:19 Ping: IRA-based register pressure calculation for RTL loop invariant motion Vladimir Makarov
2009-10-01  8:49 ` Zdenek Dvorak
2009-10-01 14:34   ` Vladimir Makarov
2009-10-14 15:21     ` Zdenek Dvorak
2009-10-14 16:36       ` Vladimir Makarov
2009-10-16 21:58         ` Richard Guenther
2009-10-17  5:32           ` Vladimir Makarov
2009-10-17 11:17             ` Richard Guenther
2009-10-19 16:21               ` Vladimir Makarov
2009-10-20  2:54                 ` David Edelsohn
2009-10-20  3:39                   ` Vladimir Makarov
2009-10-20  9:29                     ` Richard Guenther
2009-10-20 14:42                     ` David Edelsohn
2009-10-13 22:25 ` Ping 2: " Vladimir Makarov
2009-10-14  9:48   ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).