Re: Ping: IRA-based register pressure calculation for RTL loop invariant motion

public inbox for gcc-patches@gcc.gnu.org

From: Vladimir Makarov <vmakarov@redhat.com>
To: Richard Guenther <richard.guenther@gmail.com>,
	        David Edelsohn <dje@watson.ibm.com>,
	Steve Ellcey <sje@cup.hp.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: Ping: IRA-based register pressure calculation for RTL loop invariant motion
Date: Mon, 19 Oct 2009 16:21:00 -0000
Message-ID: <4ADC9132.2030000@redhat.com>
In-Reply-To: <84fc9c000910170409r876afe9nf86986ffb1e698d3@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6020 bytes --]

Richard Guenther wrote:
> On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   
>> Richard Guenther wrote:
>>     
>>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com>
>>> wrote:
>>>
>>>       
>>>> Zdenek Dvorak wrote:
>>>>
>>>>         
>>>>> Hi,
>>>>>
>>>>>
>>>>>           
>>>>>>>> +      if (i < ira_reg_class_cover_size)
>>>>>>>> +       size_cost = comp_cost + 10;
>>>>>>>> +      else
>>>>>>>> +       size_cost = 0;
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>>>>> from moving even very costly invariants out of the loop if we run out
>>>>>>> of registers).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> That is exactly what I intended.  As I wrote above, I tried a lot of
>>>>>> heuristics with different parameters that decided whether to move a
>>>>>> loop invariant depending on spill cost and loop invariant cost.  But
>>>>>> they don't work well, at least for x86/x86_64 and power6.  I have some
>>>>>> speculation about this.  x86/x86_64 are OOO processors these days, and
>>>>>> a costly invariant will be hidden because the invariant usually has a
>>>>>> lot of freedom to be executed out-of-order.  For power6, long latency
>>>>>> is hidden by insn scheduling.  It is hard for me to find a processor
>>>>>> where it will be important.  Another reason is that it is very hard to
>>>>>> evaluate spill cost accurately at this stage.  So I decided not to use
>>>>>> a combination of register pressure and invariant cost in my approach.
>>>>>>
>>>>>>             
>>>>> could you please add this reasoning to the comment?  Another reason why
>>>>> preventing the invariant motion does not hurt might be that all
>>>>> expensive invariants were already moved out of the loop by PRE and the
>>>>> gimple invariant motion pass.
>>>>>
>>>>>
>>>>>           
>>>>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>>>>> +       {
>>>>>> +         cover_class = ira_reg_class_cover[i];
>>>>>> +         if ((int) new_regs[cover_class]
>>>>>> +             + (int) regs_needed[cover_class]
>>>>>> +             + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>>>> +             + IRA_LOOP_RESERVED_REGS
>>>>>> +             - ira_available_class_regs[cover_class] > 0)
>>>>>> +           break;
>>>>>> +       }
>>>>>>
>>>>>>             
>>>>> It might be clearer to write this as
>>>>> ... > ira_available_class_regs[cover_class]
>>>>> instead of
>>>>> ... - ira_available_class_regs[cover_class] > 0.
>>>>> Otherwise, the patch is OK.
>>>>>
>>>>>
>>>>>           
>>>> Zdenek, thanks for the additional comments.  I incorporated them into the
>>>> patch just before committing.  Here is the affected patch part:
>>>>
>>>>         
>>> I think this consistently regressed both compile-time and runtime for
>>> Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
>>> are seen there as well (it's also the only one I see off-noise effects
>>> on SPEC 2000 - significant ups and downs).
>>>
>>>
>>>       
>>  Yes, it is an expensive optimization (at least 3 additional passes
>> through RTL insns: one for calculating register pressure and two very
>> expensive passes for finding register classes for pseudos).  This is
>> clearly seen from the SPEC compilation time graphs at
>>
>> http://vmakarov.fedorapeople.org/spec
>>
>> for the last 2 benchmarking runs.  Therefore I proposed it only for -O3.
>>
>> Overall SPEC2000 scores are practically the same on x86/x86_64.
>>
>> As for Polyhedron benchmarks, here are my results on Core i7:
>>
>> first:  -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
>> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
>>
>> x86:
>> Geometric Mean Execution Time =      12.84 seconds
>> Geometric Mean Execution Time =      12.82 seconds
>>
>> x86_64:
>> Geometric Mean Execution Time =       9.89 seconds
>> Geometric Mean Execution Time =       9.91 seconds
>>
>> On power6:
>> first:  -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
>> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
>>
>> Geometric Mean Execution Time =      19.22 seconds
>> Geometric Mean Execution Time =      19.04 seconds
>>
>>  As I wrote earlier, the winners from this optimization will be
>> loops with pressure lower (but not too much lower) than #registers.  For
>> x86/x86_64, practically all loops have pressure higher than #registers.
>> For such loops, evaluation of invariant cost vs spill cost would be
>> important.  But at this stage, spill cost is impossible to evaluate
>> accurately.  So using the old and new loop invariant motion criteria on
>> processors similar to x86/x86_64 will give different results for particular
>> tests (some tests better, some worse), but the overall score will be
>> practically the same.
>>
>>  Probably it makes no sense to use IRA-based register pressure calculation
>> for all targets (including x86/x86_64), but for power it is a clear win, as
>> can be seen from Polyhedron and as I reported for SPEC2000.
>>
>>  So we could switch it off by default for -O3.  What do you think about this
>> solution, Richard?
>>     
>
> I think we could switch it on by default at -O3 for a selected group of
> targets.  Itanium overall also improves with the new heuristics.  That would
> make it power and Itanium.
The patch is below.  Ok to commit?
>   Did you try restricting the heuristics to certain
> register classes, like SSE registers on x86_64?
>
>   
No, I did not try.  I am not sure it is worth doing.
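
For readers following the quoted discussion, here is a minimal sketch of the
pressure check and the size_cost rule under review, with the comparison
written the way Zdenek suggested.  It is an illustration only: the struct,
the two-class setup, and the RESERVED_REGS value are hypothetical stand-ins
for the real IRA data, and the final "gain = comp_cost - size_cost" step is
an assumption about how the surrounding code uses size_cost.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are illustrative stand-ins, not GCC's data structures.  */
enum { N_CLASSES = 2 };      /* hypothetical: e.g. integer and FP classes */
enum { RESERVED_REGS = 2 };  /* stand-in for IRA_LOOP_RESERVED_REGS */

struct loop_pressure
{
  int new_regs[N_CLASSES];     /* regs taken by invariants already moved */
  int regs_needed[N_CLASSES];  /* regs the candidate invariant needs */
  int max_pressure[N_CLASSES]; /* maximal pressure measured in the loop */
  int available[N_CLASSES];    /* allocatable regs in the class */
};

/* Return true if moving the candidate would exceed the available
   registers in at least one class.  */
bool
pressure_exceeded (const struct loop_pressure *p)
{
  for (int i = 0; i < N_CLASSES; i++)
    if (p->new_regs[i] + p->regs_needed[i] + p->max_pressure[i]
        + RESERVED_REGS > p->available[i])
      return true;
  return false;
}

/* Assumed gain formula: comp_cost - size_cost.  When pressure is
   exceeded, size_cost = comp_cost + 10, so the gain is always -10
   no matter how costly the invariant is.  */
int
invariant_gain (const struct loop_pressure *p, int comp_cost)
{
  assert (comp_cost >= 0);
  int size_cost = pressure_exceeded (p) ? comp_cost + 10 : 0;
  return comp_cost - size_cost;
}
```

Under these assumptions the gain no longer depends on comp_cost once any
class runs out of registers, which is the behaviour Zdenek questioned and
which I intended.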


2009-10-19  Vladimir Makarov  <vmakarov@redhat.com>

    * doc/invoke.texi (fira-loop-pressure): Update default value.
    * opts.c (decode_options): Remove default value setting for
    flag_ira_loop_pressure.
    * config/ia64/ia64.c (ia64_override_options): Set
    flag_ira_loop_pressure up for -O3.
    * config/rs6000/rs6000.c (rs6000_override_options): Ditto.
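
A minimal sketch of the net effect of these ChangeLog entries, assuming the
flag is not otherwise set on the command line.  The enum and the function
name are hypothetical and exist only for this illustration; in the patch the
decision lives in each back end's override-options function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target tags used only for this illustration.  */
enum target { TARGET_X86, TARGET_X86_64, TARGET_IA64, TARGET_RS6000 };

/* Model of the default for -fira-loop-pressure after the patch:
   the generic -O3 default is gone, and only ia64 and rs6000 turn
   the flag on at -O3.  */
bool
ira_loop_pressure_default (enum target t, int optimize)
{
  assert (optimize >= 0);
  switch (t)
    {
    case TARGET_IA64:
    case TARGET_RS6000:
      return optimize >= 3;
    default:
      return false;
    }
}
```

For example, ira_loop_pressure_default (TARGET_X86_64, 3) is false in this
model, while the same call for TARGET_RS6000 is true.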
    


[-- Attachment #2: ira-pressure-loop.patch --]
[-- Type: text/plain, Size: 2985 bytes --]

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 152770)
+++ doc/invoke.texi	(working copy)
@@ -5720,8 +5720,7 @@ invoking @option{-O2} on programs that u
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize} and
-@option{-fira-loop-pressure} options.
+@option{-fgcse-after-reload} and @option{-ftree-vectorize} options.
 
 @item -O0
 @opindex O0
@@ -6222,9 +6221,10 @@ architectures with big regular register 
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decision to move
 loop invariants.  Usage of this option usually results in generation
-of faster and smaller code but can slow compiler down.
+of faster and smaller code on machines with big register files (>= 32
+registers) but it can slow the compiler down.
 
-This option is enabled at level @option{-O3}.
+This option is enabled at level @option{-O3} for some targets.
 
 @item -fno-ira-share-save-slots
 @opindex fno-ira-share-save-slots
Index: opts.c
===================================================================
--- opts.c	(revision 152770)
+++ opts.c	(working copy)
@@ -917,7 +917,6 @@ decode_options (unsigned int argc, const
   flag_ipa_cp_clone = opt3;
   if (flag_ipa_cp_clone)
     flag_ipa_cp = 1;
-  flag_ira_loop_pressure = opt3;
 
   /* Just -O1/-O0 optimizations.  */
   opt1_max = (optimize <= 1);
Index: config/ia64/ia64.c
===================================================================
--- config/ia64/ia64.c	(revision 152769)
+++ config/ia64/ia64.c	(working copy)
@@ -5496,6 +5496,14 @@ ia64_override_options (void)
   if (TARGET_AUTO_PIC)
     target_flags |= MASK_CONST_GP;
 
+  /* Numerous experiments show that IRA-based loop pressure
+     calculation works better for RTL loop invariant motion on targets
+     with enough (>= 32) registers.  It is an expensive optimization,
+     so it is on only for peak performance.  */
+  if (optimize >= 3)
+    flag_ira_loop_pressure = 1;
+
+
   ia64_flag_schedule_insns2 = flag_schedule_insns_after_reload;
   flag_schedule_insns_after_reload = 0;
 
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 152769)
+++ config/rs6000/rs6000.c	(working copy)
@@ -2266,6 +2266,13 @@ rs6000_override_options (const char *def
 		     | MASK_POPCNTD | MASK_VSX | MASK_ISEL | MASK_NO_UPDATE)
   };
 
+  /* Numerous experiments show that IRA-based loop pressure
+     calculation works better for RTL loop invariant motion on targets
+     with enough (>= 32) registers.  It is an expensive optimization,
+     so it is on only for peak performance.  */
+  if (optimize >= 3)
+    flag_ira_loop_pressure = 1;
+
   /* Set the pointer size.  */
   if (TARGET_64BIT)
     {

