[PATCH][AArch64] Change FP reassociation width

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][AArch64] Change FP reassociation width
@ 2017-06-12 10:50 Wilco Dijkstra
  2017-06-13  9:07 ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 4+ messages in thread
From: Wilco Dijkstra @ 2017-06-12 10:50 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, James Greenhalgh

Currently the FP reassociation width is set to 4 on AArch64.  On recent
GCCs this has become more aggressive in splitting expressions.  This means
many FMAs are split into FMUL and FADD.  The reassociation increases register
pressure, in some benchmarks so much that inner loops start to spill.
This results in larger, slower code.  Benchmarking FP reassociation width=1
showed a ~0.5% gain on SPECFP2006 and similar gains on other benchmarks,
so change it to 1.

Passes regress & bootstrap, OK for commit?

ChangeLog:
2017-06-12  Wilco Dijkstra  <wdijkstr@arm.com>

	* gcc/config/aarch64/aarch64.c
	(generic_tuning): Set fp_reassoc_width to 1.
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(cortexa73_tunings): Likewise.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 72a758642743025ac8974c8f7ad4c44c31a474d5..0998bf37b2abf399277d2f2a539295506085209f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -541,7 +541,7 @@ static const struct tune_params generic_tunings =
   4,	/* jump_align.  */
   8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
+  1,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
@@ -567,7 +567,7 @@ static const struct tune_params cortexa35_tunings =
   8,	/* jump_align.  */
   8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
+  1,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
@@ -593,7 +593,7 @@ static const struct tune_params cortexa53_tunings =
   8,	/* jump_align.  */
   8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
+  1,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
@@ -619,7 +619,7 @@ static const struct tune_params cortexa57_tunings =
   8,	/* jump_align.  */
   8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
+  1,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
@@ -645,7 +645,7 @@ static const struct tune_params cortexa72_tunings =
   8,	/* jump_align.  */
   8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
+  1,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
@@ -671,7 +671,7 @@ static const struct tune_params cortexa73_tunings =
   8,	/* jump_align.  */
   8,	/* loop_align.  */
   2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
+  1,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][AArch64] Change FP reassociation width
  2017-06-12 10:50 [PATCH][AArch64] Change FP reassociation width Wilco Dijkstra
@ 2017-06-13  9:07 ` Richard Earnshaw (lists)
  2017-06-13  9:43   ` Wilco Dijkstra
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw (lists) @ 2017-06-13  9:07 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, James Greenhalgh

On 12/06/17 11:50, Wilco Dijkstra wrote:
> Currently the FP reassociation width is set to 4 on AArch64.  On recent
> GCCs this has become more aggressive in splitting expressions.  This means
> many FMAs are split into FMUL and FADD.  The reassociation increases register
> pressure, in some benchmarks so much that inner loops start to spill.
> This results in larger, slower code.  Benchmarking FP reassociation width=1
> showed a ~0.5% gain on SPECFP2006 and similar gains on other benchmarks,
> so change it to 1.
> 

Why 1 and not 2?  Many processors have 2 fp pipes and forcing this down
to a sequential stream is not obviously the right thing.

If reassociation is is causing excess spilling, then the right fix for
that is to look at the pressure model, not hammer the problem away.

R.

> Passes regress & bootstrap, OK for commit?
> 
> ChangeLog:
> 2017-06-12  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* gcc/config/aarch64/aarch64.c
> 	(generic_tuning): Set fp_reassoc_width to 1.
> 	(cortexa35_tunings): Likewise.
> 	(cortexa53_tunings): Likewise.
> 	(cortexa57_tunings): Likewise.
> 	(cortexa72_tunings): Likewise.
> 	(cortexa73_tunings): Likewise.
> --
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 72a758642743025ac8974c8f7ad4c44c31a474d5..0998bf37b2abf399277d2f2a539295506085209f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -541,7 +541,7 @@ static const struct tune_params generic_tunings =
>    4,	/* jump_align.  */
>    8,	/* loop_align.  */
>    2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> +  1,	/* fp_reassoc_width.  */
>    1,	/* vec_reassoc_width.  */
>    2,	/* min_div_recip_mul_sf.  */
>    2,	/* min_div_recip_mul_df.  */
> @@ -567,7 +567,7 @@ static const struct tune_params cortexa35_tunings =
>    8,	/* jump_align.  */
>    8,	/* loop_align.  */
>    2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> +  1,	/* fp_reassoc_width.  */
>    1,	/* vec_reassoc_width.  */
>    2,	/* min_div_recip_mul_sf.  */
>    2,	/* min_div_recip_mul_df.  */
> @@ -593,7 +593,7 @@ static const struct tune_params cortexa53_tunings =
>    8,	/* jump_align.  */
>    8,	/* loop_align.  */
>    2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> +  1,	/* fp_reassoc_width.  */
>    1,	/* vec_reassoc_width.  */
>    2,	/* min_div_recip_mul_sf.  */
>    2,	/* min_div_recip_mul_df.  */
> @@ -619,7 +619,7 @@ static const struct tune_params cortexa57_tunings =
>    8,	/* jump_align.  */
>    8,	/* loop_align.  */
>    2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> +  1,	/* fp_reassoc_width.  */
>    1,	/* vec_reassoc_width.  */
>    2,	/* min_div_recip_mul_sf.  */
>    2,	/* min_div_recip_mul_df.  */
> @@ -645,7 +645,7 @@ static const struct tune_params cortexa72_tunings =
>    8,	/* jump_align.  */
>    8,	/* loop_align.  */
>    2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> +  1,	/* fp_reassoc_width.  */
>    1,	/* vec_reassoc_width.  */
>    2,	/* min_div_recip_mul_sf.  */
>    2,	/* min_div_recip_mul_df.  */
> @@ -671,7 +671,7 @@ static const struct tune_params cortexa73_tunings =
>    8,	/* jump_align.  */
>    8,	/* loop_align.  */
>    2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> +  1,	/* fp_reassoc_width.  */
>    1,	/* vec_reassoc_width.  */
>    2,	/* min_div_recip_mul_sf.  */
>    2,	/* min_div_recip_mul_df.  */
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][AArch64] Change FP reassociation width
  2017-06-13  9:07 ` Richard Earnshaw (lists)
@ 2017-06-13  9:43   ` Wilco Dijkstra
  2017-06-14 14:30     ` James Greenhalgh
  0 siblings, 1 reply; 4+ messages in thread
From: Wilco Dijkstra @ 2017-06-13  9:43 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches; +Cc: nd, James Greenhalgh

Richard Earnshaw (lists) wrote:
> 
> Why 1 and not 2?  Many processors have 2 fp pipes and forcing this down
> to a sequential stream is not obviously the right thing.

1 was faster than 2. Like I said, the reassociation is too aggressive and even
splits multiply-add rather than keeping them. Until reassociation is fixed and
able to split a complex expression into 2 independent chains of similar depth
while keeping multiply-accumulate operations, it is best to set it to 1 for now.

> If reassociation is is causing excess spilling, then the right fix for
> that is to look at the pressure model, not hammer the problem away.

The problem of the GCC scheduler hugely increasing register pressure has
existed for many years with no progress being made. Unless we're willing to
start a project improving this I do not believe there will be a solution any time
soon. So changing the association width is the best solution for the time being.

Wilco

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][AArch64] Change FP reassociation width
  2017-06-13  9:43   ` Wilco Dijkstra
@ 2017-06-14 14:30     ` James Greenhalgh
  0 siblings, 0 replies; 4+ messages in thread
From: James Greenhalgh @ 2017-06-14 14:30 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Richard Earnshaw, GCC Patches, nd

On Tue, Jun 13, 2017 at 10:43:05AM +0100, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
> > 
> > Why 1 and not 2?Â  Many processors have 2 fp pipes and forcing this down
> > to a sequential stream is not obviously the right thing.
> 
> 1 was faster than 2. Like I said, the reassociation is too aggressive and even
> splits multiply-add rather than keeping them. Until reassociation is fixed and
> able to split a complex expression into 2 independent chains of similar depth
> while keeping multiply-accumulate operations, it is best to set it to 1 for now.
> 
> > If reassociation is is causing excess spilling, then the right fix for
> > that is to look at the pressure model, not hammer the problem away.
> 
> The problem of the GCC scheduler hugely increasing register pressure has
> existed for many years with no progress being made. Unless we're willing to
> start a project improving this I do not believe there will be a solution any time
> soon. So changing the association width is the best solution for the time being.

Are there bugs with testcases open in bugzilla for this? Was the regression
in "more recent" GCCs that are more aggressive at resassociation noted
somewhere?

It is hard to take this fix with the justification given, and no idea of
what compiler behaviour we're masking with this patch. I don't think I can
approve it with what is presented here, particularly so for "generic".
Richard's point of why "1" rather than "2" also needs a better answer than
given.

Thanks,
James

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-06-14 14:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-12 10:50 [PATCH][AArch64] Change FP reassociation width Wilco Dijkstra
2017-06-13  9:07 ` Richard Earnshaw (lists)
2017-06-13  9:43   ` Wilco Dijkstra
2017-06-14 14:30     ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).