public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
@ 2019-06-27  3:23 Bill Schmidt
  2019-06-27  9:34 ` Richard Biener
  2019-06-27 11:32 ` Segher Boessenkool
  0 siblings, 2 replies; 7+ messages in thread
From: Bill Schmidt @ 2019-06-27  3:23 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool

Hi,

We've done some experimenting and realized that the subject option almost
always provide improved performance for Power when the loop unroller is
enabled.  So this patch turns that flag on by default for us.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this OK for trunk?

Thanks!
Bill


2019-06-27  Bill Schmidt  <wschmidt@linux.ibm.com>

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
	-fvariable-expansion-in-unroller by default.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 272719)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3616,6 +3616,11 @@ rs6000_option_override_internal (bool global_init_
       && !global_options_set.x_flag_asynchronous_unwind_tables)
     flag_asynchronous_unwind_tables = 1;
 
+  /* -fvariable-expansion-in-unroller is a win for POWER whenever the
+     loop unroller is active.  It is only checked during unrolling, so
+     we can just set it on by default.  */
+  flag_variable_expansion_in_unroller = 1;
+
   /* Set the pointer size.  */
   if (TARGET_64BIT)
     {

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
  2019-06-27  3:23 [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default Bill Schmidt
@ 2019-06-27  9:34 ` Richard Biener
  2019-06-27 11:45   ` Segher Boessenkool
  2019-06-27 11:32 ` Segher Boessenkool
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Biener @ 2019-06-27  9:34 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: GCC Patches, Segher Boessenkool

On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
> Hi,
>
> We've done some experimenting and realized that the subject option almost
> always provide improved performance for Power when the loop unroller is
> enabled.  So this patch turns that flag on by default for us.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this OK for trunk?

I guess it creates more freedom for combine (more single-uses) and register
allocation.  I wonder in which cases this might pessimize things?  I guess
the pre-RA scheduler might make RAs life harder with creating overlapping
life-ranges.

I guess you didn't actually investigate the nature of the improvements you saw?

Do we want to adjust the flags documentation, saying whether this is enabled
by default depends on the target (or even list them)?

Thanks,
Richard.

> Thanks!
> Bill
>
>
> 2019-06-27  Bill Schmidt  <wschmidt@linux.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
>         -fvariable-expansion-in-unroller by default.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 272719)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -3616,6 +3616,11 @@ rs6000_option_override_internal (bool global_init_
>        && !global_options_set.x_flag_asynchronous_unwind_tables)
>      flag_asynchronous_unwind_tables = 1;
>
> +  /* -fvariable-expansion-in-unroller is a win for POWER whenever the
> +     loop unroller is active.  It is only checked during unrolling, so
> +     we can just set it on by default.  */
> +  flag_variable_expansion_in_unroller = 1;
> +
>    /* Set the pointer size.  */
>    if (TARGET_64BIT)
>      {
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
  2019-06-27  3:23 [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default Bill Schmidt
  2019-06-27  9:34 ` Richard Biener
@ 2019-06-27 11:32 ` Segher Boessenkool
  1 sibling, 0 replies; 7+ messages in thread
From: Segher Boessenkool @ 2019-06-27 11:32 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: GCC Patches

Hi Bill,

On Wed, Jun 26, 2019 at 10:23:06PM -0500, Bill Schmidt wrote:
> 2019-06-27  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
> 	-fvariable-expansion-in-unroller by default.
> 
> 
> --- gcc/config/rs6000/rs6000.c	(revision 272719)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -3616,6 +3616,11 @@ rs6000_option_override_internal (bool global_init_
>        && !global_options_set.x_flag_asynchronous_unwind_tables)
>      flag_asynchronous_unwind_tables = 1;
>  
> +  /* -fvariable-expansion-in-unroller is a win for POWER whenever the
> +     loop unroller is active.  It is only checked during unrolling, so
> +     we can just set it on by default.  */
> +  flag_variable_expansion_in_unroller = 1;

You should test global_options_set like the asynch unwind thing above,
so that a user can still turn off variable expansion if he/she so desires.


Okay with that change.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
  2019-06-27  9:34 ` Richard Biener
@ 2019-06-27 11:45   ` Segher Boessenkool
  2019-06-27 12:21     ` Bill Schmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2019-06-27 11:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bill Schmidt, GCC Patches

On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> > We've done some experimenting and realized that the subject option almost
> > always provide improved performance for Power when the loop unroller is
> > enabled.  So this patch turns that flag on by default for us.
> 
> I guess it creates more freedom for combine (more single-uses) and register
> allocation.  I wonder in which cases this might pessimize things?  I guess
> the pre-RA scheduler might make RAs life harder with creating overlapping
> life-ranges.
> 
> I guess you didn't actually investigate the nature of the improvements you saw?

It breaks the length of dependency chains by a factor equal to the unroll
factor.  I do not know why this doesn't help a lot everywhere.  It of
course raises register pressure, maybe that is just it?

> Do we want to adjust the flags documentation, saying whether this is enabled
> by default depends on the target (or even list them)?

Good idea, thanks.


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
  2019-06-27 11:45   ` Segher Boessenkool
@ 2019-06-27 12:21     ` Bill Schmidt
  2019-07-01 10:30       ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Bill Schmidt @ 2019-06-27 12:21 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Biener; +Cc: GCC Patches

On 6/27/19 6:45 AM, Segher Boessenkool wrote:
> On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
>> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>>> We've done some experimenting and realized that the subject option almost
>>> always provide improved performance for Power when the loop unroller is
>>> enabled.  So this patch turns that flag on by default for us.
>> I guess it creates more freedom for combine (more single-uses) and register
>> allocation.  I wonder in which cases this might pessimize things?  I guess
>> the pre-RA scheduler might make RAs life harder with creating overlapping
>> life-ranges.
>>
>> I guess you didn't actually investigate the nature of the improvements you saw?
> It breaks the length of dependency chains by a factor equal to the unroll
> factor.  I do not know why this doesn't help a lot everywhere.  It of
> course raises register pressure, maybe that is just it?

Right, it's all about breaking dependencies to more efficiently exploit
the microarchitecture.  By default, variable expansion in GCC is quite
conservative, creating only two reduction streams out of one, so it's
pretty rare for it to cause spill.  This can be adjusted upwards with
--param max-variable-expansions-in-unroller=n.  Our experiments show
that raising n to 4 starts to cause some minor degradations, which are
almost certainly due to pressure, so the default setting looks appropriate.
>
>> Do we want to adjust the flags documentation, saying whether this is enabled
>> by default depends on the target (or even list them)?
> Good idea, thanks.

OK, I'll update the docs and make the change that Segher requested. 
Thanks for the reviews!

Bill
>
>
> Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
  2019-06-27 12:21     ` Bill Schmidt
@ 2019-07-01 10:30       ` Richard Biener
  2019-07-01 12:12         ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2019-07-01 10:30 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Segher Boessenkool, GCC Patches

On Thu, Jun 27, 2019 at 2:19 PM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
> On 6/27/19 6:45 AM, Segher Boessenkool wrote:
> > On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
> >> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> >>> We've done some experimenting and realized that the subject option almost
> >>> always provide improved performance for Power when the loop unroller is
> >>> enabled.  So this patch turns that flag on by default for us.
> >> I guess it creates more freedom for combine (more single-uses) and register
> >> allocation.  I wonder in which cases this might pessimize things?  I guess
> >> the pre-RA scheduler might make RAs life harder with creating overlapping
> >> life-ranges.
> >>
> >> I guess you didn't actually investigate the nature of the improvements you saw?
> > It breaks the length of dependency chains by a factor equal to the unroll
> > factor.  I do not know why this doesn't help a lot everywhere.  It of
> > course raises register pressure, maybe that is just it?
>
> Right, it's all about breaking dependencies to more efficiently exploit
> the microarchitecture.  By default, variable expansion in GCC is quite
> conservative, creating only two reduction streams out of one, so it's
> pretty rare for it to cause spill.  This can be adjusted upwards with
> --param max-variable-expansions-in-unroller=n.  Our experiments show
> that raising n to 4 starts to cause some minor degradations, which are
> almost certainly due to pressure, so the default setting looks appropriate.

But it's probably only an issue for targets which enable pre-RA scheduling
by default?  It might also increase RA compile-time (more allocnos).

Richard.

> >> Do we want to adjust the flags documentation, saying whether this is enabled
> >> by default depends on the target (or even list them)?
> > Good idea, thanks.
>
> OK, I'll update the docs and make the change that Segher requested.
> Thanks for the reviews!
>
> Bill
> >
> >
> > Segher
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default
  2019-07-01 10:30       ` Richard Biener
@ 2019-07-01 12:12         ` Segher Boessenkool
  0 siblings, 0 replies; 7+ messages in thread
From: Segher Boessenkool @ 2019-07-01 12:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bill Schmidt, GCC Patches

On Mon, Jul 01, 2019 at 12:30:41PM +0200, Richard Biener wrote:
> On Thu, Jun 27, 2019 at 2:19 PM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> >
> > On 6/27/19 6:45 AM, Segher Boessenkool wrote:
> > > On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
> > >> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> > >>> We've done some experimenting and realized that the subject option almost
> > >>> always provide improved performance for Power when the loop unroller is
> > >>> enabled.  So this patch turns that flag on by default for us.
> > >> I guess it creates more freedom for combine (more single-uses) and register
> > >> allocation.  I wonder in which cases this might pessimize things?  I guess
> > >> the pre-RA scheduler might make RAs life harder with creating overlapping
> > >> life-ranges.
> > >>
> > >> I guess you didn't actually investigate the nature of the improvements you saw?
> > > It breaks the length of dependency chains by a factor equal to the unroll
> > > factor.  I do not know why this doesn't help a lot everywhere.  It of
> > > course raises register pressure, maybe that is just it?
> >
> > Right, it's all about breaking dependencies to more efficiently exploit
> > the microarchitecture.  By default, variable expansion in GCC is quite
> > conservative, creating only two reduction streams out of one, so it's
> > pretty rare for it to cause spill.  This can be adjusted upwards with
> > --param max-variable-expansions-in-unroller=n.  Our experiments show
> > that raising n to 4 starts to cause some minor degradations, which are
> > almost certainly due to pressure, so the default setting looks appropriate.
> 
> But it's probably only an issue for targets which enable pre-RA scheduling
> by default?  It might also increase RA compile-time (more allocnos).

It only helps super-scalar CPUs normally.  It doesn't increase register
use much at all, compared to only doing unrolling.  It helps scheduling
a lot, but I don't see where sched1 comes in here?

To see if it helps your arch, just try it out?


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-01 12:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-27  3:23 [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default Bill Schmidt
2019-06-27  9:34 ` Richard Biener
2019-06-27 11:45   ` Segher Boessenkool
2019-06-27 12:21     ` Bill Schmidt
2019-07-01 10:30       ` Richard Biener
2019-07-01 12:12         ` Segher Boessenkool
2019-06-27 11:32 ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).