From: Vladimir Makarov <vmakarov@redhat.com>
To: Richard Guenther <richard.guenther@gmail.com>,
David Edelsohn <dje@watson.ibm.com>,
Steve Ellcey <sje@cup.hp.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: Ping: IRA-based register pressure calculation for RTL loop invariant motion
Date: Mon, 19 Oct 2009 16:21:00 -0000
Message-ID: <4ADC9132.2030000@redhat.com>
In-Reply-To: <84fc9c000910170409r876afe9nf86986ffb1e698d3@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 6020 bytes --]
Richard Guenther wrote:
> On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>> Richard Guenther wrote:
>>
>>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com>
>>> wrote:
>>>
>>>
>>>> Zdenek Dvorak wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>>>>> + if (i < ira_reg_class_cover_size)
>>>>>>>> + size_cost = comp_cost + 10;
>>>>>>>> + else
>>>>>>>> + size_cost = 0;
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>>>>> from
>>>>>>> moving even very costly invariants out of the loop if we run out of
>>>>>>> registers).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> That is exactly what I intended. As I wrote above, I tried a lot of
>>>>>> heuristics with different parameters which decided to move loop
>>>>>> invariant
>>>>>> depending on spill cost and loop invariant cost. But they don't work
>>>>>> well
>>>>>> at least for x86/x86_64 and power6. I have some speculation for this.
>>>>>> x86/x86_64 is OOO processors these days. And costly invariant will
>>>>>> be
>>>>>> hidden because usually the invariant has a lot of freedom to be
>>>>>> executed
>>>>>> out-of-order. For power6, long latency is hidden by insn scheduling.
>>>>>> It
>>>>>> is hard to me find a processor where it will be important. Another
>>>>>> reason
>>>>>> for this, it is very hard to evaluate accurately spill cost at this
>>>>>> stage.
>>>>>> So I decided not to use combination of register pressure and
>>>>>> invariant
>>>>>> cost in my approach.
>>>>>>
>>>>>>
>>>>> could you please add this reasoning to the comment? Another reason why
>>>>> preventing the invariant motion does not hurt might be that all
>>>>> expensive
>>>>> invariants were already moved out of the loop by PRE and gimple
>>>>> invariant
>>>>> motion pass.
>>>>>
>>>>>
>>>>>
>>>>>> + for (i = 0; i < ira_reg_class_cover_size; i++)
>>>>>> + {
>>>>>> + cover_class = ira_reg_class_cover[i];
>>>>>> + if ((int) new_regs[cover_class]
>>>>>> + + (int) regs_needed[cover_class]
>>>>>> + + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>>>> + + IRA_LOOP_RESERVED_REGS
>>>>>> + - ira_available_class_regs[cover_class] > 0)
>>>>>> + break;
>>>>>> + }
>>>>>>
>>>>>>
>>>>> It might be clearer to write this as ... >
>>>>> ira_available_class_regs[cover_class] instead
>>>>> of ... - ira_available_class_regs[cover_class] > 0. Otherwise, the
>>>>> patch
>>>>> is OK.
>>>>>
>>>>>
>>>>>
>>>> Zdenek, thanks for the additional comments. I incorporated them into the
>>>> patch just before committing. Here is the affected patch part:
>>>>
>>>>
>>> I think this consistently regressed both compile-time and runtime for
>>> Polyhedron on x86_64. For Itanium the story isn't clear, but effects
>>> are seen there as well (it's also the only one I see off-noise effects
>>> on SPEC 2000 - significant ups and downs).
>>>
>>>
>>>
>> Yes, it is expensive optimization (at least 3 additional passes
>> through RTL insns one for calculating register pressure and two very
>> expensive passes for finding register classes for pseudos). It is
>> clearly seen from SPEC compilation time graphs on
>>
>> http://vmakarov.fedorapeople.org/spec
>>
>> for 2 last benchmarking. Therefore I proposed it only for -O3.
>>
>> Overall SPEC2000 scores are practically the same on x86/x86_64.
>>
>> As for Polyhedron benchmarks, here is my results on Core I7:
>>
>> first: -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
>> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
>>
>> x86:
>> Geometric Mean Execution Time = 12.84 seconds
>> Geometric Mean Execution Time = 12.82 seconds
>>
>> x86_64:
>> Geometric Mean Execution Time = 9.89 seconds
>> Geometric Mean Execution Time = 9.91 seconds
>>
>> On power6:
>> first: -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
>> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
>>
>> Geometric Mean Execution Time = 19.22 seconds
>> Geometric Mean Execution Time = 19.04 seconds
>>
>> As I wrote earlier the winner of the optimization usage will be
>> loops with pressure lower (but not too lower) than #registers. For
>> x86/x86_64, practically all loops have pressure more than #registers.
>> For such loops, evaluation of invariant cost vs spill cost would be
>> important. But at this stage, spill cost is impossible to evaluate
>> accurately. So usage of old and new loop invariant motion criteria on
>> processors similar x86/x86_64 will give different results for particular
>> tests (some tests better, some worse) but overall score will be
>> practically the same.
>>
>> Probably, there is no sense to use IRA-based register pressure calculation
>> for all targets (including x86/x86_64) but for power it is a clear win as it
>> is seen from polyhedron and as I reported for SPEC2000.
>>
>> So we could switch it off by default for -O3. What do you think about this
>> solution, Richard?
>>
>
> I think we could switch it on by default at -O3 for a selected group of
> targets. Itanium overall also improves with the new heuristics. That would
> make it power and Itanium.
The patch is below. Ok to commit?
> Did you try restricting the heuristics to certain
> register classes, like SSE registers on x86_64?
>
>
No, I did not try that. I am not sure it is worth doing.
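
For reference, a minimal sketch of the pressure test quoted earlier in this message, written in the direct comparison form Zdenek suggested. It is standalone C, not GCC code: N_CLASSES, RESERVED_REGS, struct loop_pressure and exceeds_pressure are hypothetical names standing in for the cover-class loop, IRA_LOOP_RESERVED_REGS, LOOP_DATA (curr_loop)->max_reg_pressure and the break condition, and the reserve value of 2 is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names or values come from GCC.  */
#define N_CLASSES 2       /* number of register classes in this sketch */
#define RESERVED_REGS 2   /* plays the role of IRA_LOOP_RESERVED_REGS (assumed value) */

struct loop_pressure
{
  int max_pressure[N_CLASSES];  /* maximal register pressure seen in the loop */
};

/* Return true if moving the invariant would push any class over its
   number of available registers.  NEW_REGS counts registers already taken
   by invariants chosen so far, REGS_NEEDED those the candidate invariant
   requires, AVAILABLE the allocatable registers of each class.  */
static bool
exceeds_pressure (const int new_regs[], const int regs_needed[],
                  const struct loop_pressure *loop, const int available[])
{
  assert (loop != NULL);
  for (int c = 0; c < N_CLASSES; c++)
    if (new_regs[c] + regs_needed[c] + loop->max_pressure[c]
        + RESERVED_REGS > available[c])
      return true;
  return false;
}
```

With 32 available registers per class, a loop pressure of 20 and one register needed, the sum is 23 and the test does not fire; at a loop pressure of 30 the sum is 33 and it does. This matches the observation above that the optimization pays off mainly for loops whose pressure is below, but not far below, the number of registers.
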

2009-10-19  Vladimir Makarov  <vmakarov@redhat.com>

	* doc/invoke.texi (fira-loop-pressure): Update default value.
	* opts.c (decode_options): Remove default value setting for
	flag_ira_loop_pressure.
	* config/ia64/ia64.c (ia64_override_options): Set
	flag_ira_loop_pressure up for -O3.
	* config/rs6000/rs6000.c (rs6000_override_options): Ditto.

[-- Attachment #2: ira-pressure-loop.patch --]
[-- Type: text/plain, Size: 2985 bytes --]
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 152770)
+++ doc/invoke.texi (working copy)
@@ -5720,8 +5720,7 @@ invoking @option{-O2} on programs that u
Optimize yet more. @option{-O3} turns on all optimizations specified
by @option{-O2} and also turns on the @option{-finline-functions},
@option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize} and
-@option{-fira-loop-pressure} options.
+@option{-fgcse-after-reload} and @option{-ftree-vectorize} options.
@item -O0
@opindex O0
@@ -6222,9 +6221,10 @@ architectures with big regular register
@opindex fira-loop-pressure
Use IRA to evaluate register pressure in loops for decision to move
loop invariants. Usage of this option usually results in generation
-of faster and smaller code but can slow compiler down.
+of faster and smaller code on machines with big register files (>= 32
+registers) but it can slow compiler down.
-This option is enabled at level @option{-O3}.
+This option is enabled at level @option{-O3} for some targets.
@item -fno-ira-share-save-slots
@opindex fno-ira-share-save-slots
Index: opts.c
===================================================================
--- opts.c (revision 152770)
+++ opts.c (working copy)
@@ -917,7 +917,6 @@ decode_options (unsigned int argc, const
flag_ipa_cp_clone = opt3;
if (flag_ipa_cp_clone)
flag_ipa_cp = 1;
- flag_ira_loop_pressure = opt3;
/* Just -O1/-O0 optimizations. */
opt1_max = (optimize <= 1);
Index: config/ia64/ia64.c
===================================================================
--- config/ia64/ia64.c (revision 152769)
+++ config/ia64/ia64.c (working copy)
@@ -5496,6 +5496,14 @@ ia64_override_options (void)
if (TARGET_AUTO_PIC)
target_flags |= MASK_CONST_GP;
+ /* Numerous experiments show that IRA-based loop pressure
+ calculation works better for RTL loop invariant motion on targets
+ with enough (>= 32) registers. It is an expensive optimization,
+ so it is on only for peak performance. */
+ if (optimize >= 3)
+ flag_ira_loop_pressure = 1;
+
+
ia64_flag_schedule_insns2 = flag_schedule_insns_after_reload;
flag_schedule_insns_after_reload = 0;
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c (revision 152769)
+++ config/rs6000/rs6000.c (working copy)
@@ -2266,6 +2266,13 @@ rs6000_override_options (const char *def
| MASK_POPCNTD | MASK_VSX | MASK_ISEL | MASK_NO_UPDATE)
};
+ /* Numerous experiments show that IRA-based loop pressure
+ calculation works better for RTL loop invariant motion on targets
+ with enough (>= 32) registers. It is an expensive optimization,
+ so it is on only for peak performance. */
+ if (optimize >= 3)
+ flag_ira_loop_pressure = 1;
+
/* Set the pointer size. */
if (TARGET_64BIT)
{