public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Do not imply -fweb with -fpeel-loops
@ 2016-05-31 13:09 Jan Hubicka
  2016-05-31 13:40 ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Hubicka @ 2016-05-31 13:09 UTC (permalink / raw)
  To: gcc-patches

Hi,
enabling -fpeel-loops with -O3 and -Ofast had unexpected effect of enabling
-fweb and -frename-registers.  This patch drop -fweb from list of passes that
are implied by -fpeel-loops because it is no longer needed since we do peeling
at tree instead of RTL and the out-of-ssa pass cares about this.

I will look into rename registers separately. My understanding is that rename
registers is not really that specific for -funroll-loops or -fpeel-loops.
Loop unrolling may make it bit more useful because it increases expected basic
block size and thus enables scheduler to do more.

Rather it depends on target whether register renaming is win. On out of order targets
where scheduling is not that imprtant it does not seem to be worth the compile time
and code size effects, while on targets that depends on scheduling it does.
Tonight tests on x86 shows improvmeents in botan which I think are related to renaming
(because they also improved when Bern temporarily enabled it few days back),
measurable code size regression (which is probably fixable if we make rename-registers
to not increase instruction encoding size) and noticeable compile time slowdown
(not sure if it is from regrename itself that is quite easy pass or from the fact
that scheduler has more freedom and thus work harder).

On the other hand both SPECint and SPECfp improved by about 2% on Itanium and
code size reduced.  I think similar effect should happen on superscalar inorder
tarets with constant size of instruction encoding.

I would say we should handle -frename-registers same way as we do -fschedule-insns.
I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
But that is for future patch.  I will commit this and propose patch making -fpeel-loops
independent of rename-registers next

Honza

Bootstrapped/regtested x86_64-linux, will commit it shortly.

	* loop-init.c (gate): Do not enale RTL loop unroller with -fpeel-loops.
	It no longer does that.
	* toplev.c (process_options): Do not enable flag_web with -fpeel-loops.
Index: loop-init.c
===================================================================
--- loop-init.c	(revision 236894)
+++ loop-init.c	(working copy)
@@ -560,7 +560,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
+      return (flag_unroll_loops || flag_unroll_all_loops);
     }
 
   virtual unsigned int execute (function *);
Index: toplev.c
===================================================================
--- toplev.c	(revision 236894)
+++ toplev.c	(working copy)
@@ -1296,7 +1296,7 @@ process_options (void)
 
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
-    flag_web = flag_unroll_loops || flag_peel_loops;
+    flag_web = flag_unroll_loops;
 
   if (flag_rename_registers == AUTODETECT_VALUE)
     flag_rename_registers = flag_unroll_loops || flag_peel_loops;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do not imply -fweb with -fpeel-loops
  2016-05-31 13:09 Do not imply -fweb with -fpeel-loops Jan Hubicka
@ 2016-05-31 13:40 ` Richard Biener
  2016-05-31 13:50   ` Jan Hubicka
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2016-05-31 13:40 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On Tue, May 31, 2016 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> enabling -fpeel-loops with -O3 and -Ofast had unexpected effect of enabling
> -fweb and -frename-registers.  This patch drop -fweb from list of passes that
> are implied by -fpeel-loops because it is no longer needed since we do peeling
> at tree instead of RTL and the out-of-ssa pass cares about this.
>
> I will look into rename registers separately. My understanding is that rename
> registers is not really that specific for -funroll-loops or -fpeel-loops.
> Loop unrolling may make it bit more useful because it increases expected basic
> block size and thus enables scheduler to do more.
>
> Rather it depends on target whether register renaming is win. On out of order targets
> where scheduling is not that imprtant it does not seem to be worth the compile time
> and code size effects, while on targets that depends on scheduling it does.
> Tonight tests on x86 shows improvmeents in botan which I think are related to renaming
> (because they also improved when Bern temporarily enabled it few days back),
> measurable code size regression (which is probably fixable if we make rename-registers
> to not increase instruction encoding size) and noticeable compile time slowdown
> (not sure if it is from regrename itself that is quite easy pass or from the fact
> that scheduler has more freedom and thus work harder).
>
> On the other hand both SPECint and SPECfp improved by about 2% on Itanium and
> code size reduced.  I think similar effect should happen on superscalar inorder
> tarets with constant size of instruction encoding.
>
> I would say we should handle -frename-registers same way as we do -fschedule-insns.
> I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
> But that is for future patch.  I will commit this and propose patch making -fpeel-loops
> independent of rename-registers next

Ok.  Does -fweb still do what its documentation says (enable register
allocation to
work on pseudos directly)?  Given its downside and strong GIMPLE
optimizations maybe
it is time to remove it?

Thanks,
Richard.

> Honza
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
>         * loop-init.c (gate): Do not enale RTL loop unroller with -fpeel-loops.
>         It no longer does that.
>         * toplev.c (process_options): Do not enable flag_web with -fpeel-loops.
> Index: loop-init.c
> ===================================================================
> --- loop-init.c (revision 236894)
> +++ loop-init.c (working copy)
> @@ -560,7 +560,7 @@ public:
>    /* opt_pass methods: */
>    virtual bool gate (function *)
>      {
> -      return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
> +      return (flag_unroll_loops || flag_unroll_all_loops);
>      }
>
>    virtual unsigned int execute (function *);
> Index: toplev.c
> ===================================================================
> --- toplev.c    (revision 236894)
> +++ toplev.c    (working copy)
> @@ -1296,7 +1296,7 @@ process_options (void)
>
>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
> -    flag_web = flag_unroll_loops || flag_peel_loops;
> +    flag_web = flag_unroll_loops;
>
>    if (flag_rename_registers == AUTODETECT_VALUE)
>      flag_rename_registers = flag_unroll_loops || flag_peel_loops;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do not imply -fweb with -fpeel-loops
  2016-05-31 13:40 ` Richard Biener
@ 2016-05-31 13:50   ` Jan Hubicka
  2016-05-31 14:49     ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Hubicka @ 2016-05-31 13:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

> > I would say we should handle -frename-registers same way as we do -fschedule-insns.
> > I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
> > But that is for future patch.  I will commit this and propose patch making -fpeel-loops
> > independent of rename-registers next
> 
> Ok.  Does -fweb still do what its documentation says (enable register
> allocation to
> work on pseudos directly)?  Given its downside and strong GIMPLE
> optimizations maybe
> it is time to remove it?

I will wait with comitting the renaming patch and we will see the effect with
-fno-unroll-loops on ia64 tester (it probably pays back more on in-order
target) and we can test effect with -funroll-loops by patching it tomorrow.

I guess IRA does live range pslitting by itself.  Register allocator is not the
only pass which does not like reuse of pseudo.  Web originally also made
-fschedule-insns (not -fschedule-insns2) and CSE to work harder.

In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
these days is limited to loop unrolling and RTL expansion I think. Without
unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
try to dig into that. With -funroll-all-loops it renames 17309 registers.
Of course it would be possible to write unroller specific renamer, but it would
hardly be any wasier than webizer.

Honza

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do not imply -fweb with -fpeel-loops
  2016-05-31 13:50   ` Jan Hubicka
@ 2016-05-31 14:49     ` Richard Biener
  2016-05-31 16:29       ` Bill Schmidt
  2016-06-01 14:31       ` Jan Hubicka
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Biener @ 2016-05-31 14:49 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On Tue, May 31, 2016 at 1:42 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > I would say we should handle -frename-registers same way as we do -fschedule-insns.
>> > I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
>> > But that is for future patch.  I will commit this and propose patch making -fpeel-loops
>> > independent of rename-registers next
>>
>> Ok.  Does -fweb still do what its documentation says (enable register
>> allocation to
>> work on pseudos directly)?  Given its downside and strong GIMPLE
>> optimizations maybe
>> it is time to remove it?
>
> I will wait with comitting the renaming patch and we will see the effect with
> -fno-unroll-loops on ia64 tester (it probably pays back more on in-order
> target) and we can test effect with -funroll-loops by patching it tomorrow.
>
> I guess IRA does live range pslitting by itself.  Register allocator is not the
> only pass which does not like reuse of pseudo.  Web originally also made
> -fschedule-insns (not -fschedule-insns2) and CSE to work harder.
>
> In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
> these days is limited to loop unrolling and RTL expansion I think. Without
> unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
> try to dig into that. With -funroll-all-loops it renames 17309 registers.
> Of course it would be possible to write unroller specific renamer, but it would
> hardly be any wasier than webizer.

And we have things like -fsplit-ivs-in-unroller that does part of it...

Richard.

> Honza

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do not imply -fweb with -fpeel-loops
  2016-05-31 14:49     ` Richard Biener
@ 2016-05-31 16:29       ` Bill Schmidt
  2016-06-01 14:31       ` Jan Hubicka
  1 sibling, 0 replies; 6+ messages in thread
From: Bill Schmidt @ 2016-05-31 16:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

If the decision is taken to remove -fweb, please give me a heads-up as I have a target-specific pass that shares infrastructure with it.  I can factor that out to facilitate the removal; just let me know.

Thanks!

-- Bill

Bill Schmidt, Ph.D.
GCC for LInux on Power
Linux on Power Toolchain
IBM Linux Technology Center
wschmidt@linux.vnet.ibm.com




> On May 31, 2016, at 8:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, May 31, 2016 at 1:42 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> I would say we should handle -frename-registers same way as we do -fschedule-insns.
>>>> I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
>>>> But that is for future patch.  I will commit this and propose patch making -fpeel-loops
>>>> independent of rename-registers next
>>> 
>>> Ok.  Does -fweb still do what its documentation says (enable register
>>> allocation to
>>> work on pseudos directly)?  Given its downside and strong GIMPLE
>>> optimizations maybe
>>> it is time to remove it?
>> 
>> I will wait with comitting the renaming patch and we will see the effect with
>> -fno-unroll-loops on ia64 tester (it probably pays back more on in-order
>> target) and we can test effect with -funroll-loops by patching it tomorrow.
>> 
>> I guess IRA does live range pslitting by itself.  Register allocator is not the
>> only pass which does not like reuse of pseudo.  Web originally also made
>> -fschedule-insns (not -fschedule-insns2) and CSE to work harder.
>> 
>> In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
>> these days is limited to loop unrolling and RTL expansion I think. Without
>> unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
>> try to dig into that. With -funroll-all-loops it renames 17309 registers.
>> Of course it would be possible to write unroller specific renamer, but it would
>> hardly be any wasier than webizer.
> 
> And we have things like -fsplit-ivs-in-unroller that does part of it...
> 
> Richard.
> 
>> Honza
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do not imply -fweb with -fpeel-loops
  2016-05-31 14:49     ` Richard Biener
  2016-05-31 16:29       ` Bill Schmidt
@ 2016-06-01 14:31       ` Jan Hubicka
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Hubicka @ 2016-06-01 14:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches

> >
> > In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
> > these days is limited to loop unrolling and RTL expansion I think. Without
> > unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
> > try to dig into that. With -funroll-all-loops it renames 17309 registers.
> > Of course it would be possible to write unroller specific renamer, but it would
> > hardly be any wasier than webizer.
> 
> And we have things like -fsplit-ivs-in-unroller that does part of it...

That is orthogonal to splitting done by web pass though. On Terbium I see two perofmrance change
today which is probably for turning off the webizer again. It is Eon 975->955 and Mesa 810->778.
This is w/o unrolling so not that many splits are triggered. I will collect the data with unrolling.

Honza

> 
> Richard.
> 
> > Honza

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-06-01 14:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31 13:09 Do not imply -fweb with -fpeel-loops Jan Hubicka
2016-05-31 13:40 ` Richard Biener
2016-05-31 13:50   ` Jan Hubicka
2016-05-31 14:49     ` Richard Biener
2016-05-31 16:29       ` Bill Schmidt
2016-06-01 14:31       ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).