public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][AArch64] PR84114: Avoid reassociating FMA
@ 2018-02-22 11:38 Wilco Dijkstra
  2018-02-26 22:25 ` James Greenhalgh
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2018-02-22 11:38 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, sje

As discussed in the PR, the reassociation phase runs before FMAs are formed
and so can significantly reduce FMA opportunities.  Although reassociation
could be switched off, it helps in many cases, so a better alternative is to
only avoid reassociation of floating point additions.  This fixes the testcase
and gives 1% speedup on SPECFP2017, fixing the performance regression.

OK for commit?

ChangeLog:
2018-02-23  Wilco Dijkstra  <wdijkstr@arm.com>

	PR tree-optimization/84114
	* config/aarch64/aarch64.c (aarch64_reassociation_width)
	Avoid reassociation of FLOAT_MODE addition.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b3d5fde171920e5759046a4bd61cfcf9eb78d7dd..5f9541cf700aaf18c1f1ac73054614e2932781e4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1109,15 +1109,16 @@ aarch64_min_divisions_for_recip_mul (machine_mode mode)
   return aarch64_tune_params.min_div_recip_mul_df;
 }
 
+/* Return the reassociation width of treeop OPC with mode MODE.  */
 static int
-aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
-			     machine_mode mode)
+aarch64_reassociation_width (unsigned opc, machine_mode mode)
 {
   if (VECTOR_MODE_P (mode))
     return aarch64_tune_params.vec_reassoc_width;
   if (INTEGRAL_MODE_P (mode))
     return aarch64_tune_params.int_reassoc_width;
-  if (FLOAT_MODE_P (mode))
+  /* Avoid reassociating floating point addition so we emit more FMAs.  */
+  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
     return aarch64_tune_params.fp_reassoc_width;
   return 1;
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA
  2018-02-22 11:38 [PATCH][AArch64] PR84114: Avoid reassociating FMA Wilco Dijkstra
@ 2018-02-26 22:25 ` James Greenhalgh
  2018-02-27 13:19   ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: James Greenhalgh @ 2018-02-26 22:25 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, nd, sje

On Thu, Feb 22, 2018 at 11:38:03AM +0000, Wilco Dijkstra wrote:
> As discussed in the PR, the reassociation phase runs before FMAs are formed
> and so can significantly reduce FMA opportunities.  Although reassociation
> could be switched off, it helps in many cases, so a better alternative is to
> only avoid reassociation of floating point additions.  This fixes the testcase
> and gives 1% speedup on SPECFP2017, fixing the performance regression.
> 
> OK for commit?

This is OK as a fairly safe fix for stage 4. We should fix reassociation
properly in GCC 9.

Thanks,
James

> 
> ChangeLog:
> 2018-02-23  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	PR tree-optimization/84114
> 	* config/aarch64/aarch64.c (aarch64_reassociation_width)
> 	Avoid reassociation of FLOAT_MODE addition.
> --
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b3d5fde171920e5759046a4bd61cfcf9eb78d7dd..5f9541cf700aaf18c1f1ac73054614e2932781e4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1109,15 +1109,16 @@ aarch64_min_divisions_for_recip_mul (machine_mode mode)
>    return aarch64_tune_params.min_div_recip_mul_df;
>  }
>  
> +/* Return the reassociation width of treeop OPC with mode MODE.  */
>  static int
> -aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
> -			     machine_mode mode)
> +aarch64_reassociation_width (unsigned opc, machine_mode mode)
>  {
>    if (VECTOR_MODE_P (mode))
>      return aarch64_tune_params.vec_reassoc_width;
>    if (INTEGRAL_MODE_P (mode))
>      return aarch64_tune_params.int_reassoc_width;
> -  if (FLOAT_MODE_P (mode))
> +  /* Avoid reassociating floating point addition so we emit more FMAs.  */
> +  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
>      return aarch64_tune_params.fp_reassoc_width;
>    return 1;
>  }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA
  2018-02-26 22:25 ` James Greenhalgh
@ 2018-02-27 13:19   ` Richard Biener
  2018-02-27 14:21     ` Wilco Dijkstra
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2018-02-27 13:19 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Wilco Dijkstra, GCC Patches, nd, sje

On Mon, Feb 26, 2018 at 11:25 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Thu, Feb 22, 2018 at 11:38:03AM +0000, Wilco Dijkstra wrote:
>> As discussed in the PR, the reassociation phase runs before FMAs are formed
>> and so can significantly reduce FMA opportunities.  Although reassociation
>> could be switched off, it helps in many cases, so a better alternative is to
>> only avoid reassociation of floating point additions.  This fixes the testcase
>> and gives 1% speedup on SPECFP2017, fixing the performance regression.
>>
>> OK for commit?
>
> This is OK as a fairly safe fix for stage 4. We should fix reassociation
> properly in GCC 9.

It happens that on some targets doing two FMAs in parallel and one
non-FMA operation merging them is faster than chaining three FMAs...

But yes, somewhere I suggested that FMA detection should/could be
integrated with reassociation.

Richard.

> Thanks,
> James
>
>>
>> ChangeLog:
>> 2018-02-23  Wilco Dijkstra  <wdijkstr@arm.com>
>>
>>       PR tree-optimization/84114
>>       * config/aarch64/aarch64.c (aarch64_reassociation_width)
>>       Avoid reassociation of FLOAT_MODE addition.
>> --
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index b3d5fde171920e5759046a4bd61cfcf9eb78d7dd..5f9541cf700aaf18c1f1ac73054614e2932781e4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1109,15 +1109,16 @@ aarch64_min_divisions_for_recip_mul (machine_mode mode)
>>    return aarch64_tune_params.min_div_recip_mul_df;
>>  }
>>
>> +/* Return the reassociation width of treeop OPC with mode MODE.  */
>>  static int
>> -aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
>> -                          machine_mode mode)
>> +aarch64_reassociation_width (unsigned opc, machine_mode mode)
>>  {
>>    if (VECTOR_MODE_P (mode))
>>      return aarch64_tune_params.vec_reassoc_width;
>>    if (INTEGRAL_MODE_P (mode))
>>      return aarch64_tune_params.int_reassoc_width;
>> -  if (FLOAT_MODE_P (mode))
>> +  /* Avoid reassociating floating point addition so we emit more FMAs.  */
>> +  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
>>      return aarch64_tune_params.fp_reassoc_width;
>>    return 1;
>>  }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA
  2018-02-27 13:19   ` Richard Biener
@ 2018-02-27 14:21     ` Wilco Dijkstra
  2018-02-27 18:31       ` Aaron Sawdey
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2018-02-27 14:21 UTC (permalink / raw)
  To: Richard Biener, James Greenhalgh; +Cc: GCC Patches, nd, sje

Richard Biener <richard.guenther@gmail.com>

> It happens that on some targets doing two FMAs in parallel and one
> non-FMA operation merging them is faster than chaining three FMAs...

Like I mentioned in the PR, long chains should be broken, but for that we need a new parameter to state how long a chain may be before it is split. The issue today is that it splits even very short chains, removing beneficial FMAs.

> But yes, somewhere I suggested that FMA detection should/could be
> integrated with reassociation.

Absolutely.

Wilco

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA
  2018-02-27 14:21     ` Wilco Dijkstra
@ 2018-02-27 18:31       ` Aaron Sawdey
  0 siblings, 0 replies; 5+ messages in thread
From: Aaron Sawdey @ 2018-02-27 18:31 UTC (permalink / raw)
  To: Wilco Dijkstra, Richard Biener, James Greenhalgh; +Cc: GCC Patches, nd, sje

On Tue, 2018-02-27 at 14:21 +0000, Wilco Dijkstra wrote:
> Richard Biener <richard.guenther@gmail.com>
> 
> > It happens that on some targets doing two FMAs in parallel and one
> > non-FMA operation merging them is faster than chaining three
> > FMAs...
> 
> Like I mentioned in the PR, long chains should be broken, but for
> that we need a new parameter to state how long a chain may be before
> it is split. The issue today is that it splits even very short
> chains, removing beneficial FMAs.
> 
> > But yes, somewhere I suggested that FMA detection should/could be
> > integrated with reassociation.

I'd also like to see some work here. 

Doing two FMA in parallel and then a non-FMA merge is faster on ppc,
but it would be nice if the target had some more control of exactly how
this happens.

Also doing parallel reassociation increases register pressure so it
would be nice to be able to avoid causing issues as a result of that.

-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-02-27 18:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-22 11:38 [PATCH][AArch64] PR84114: Avoid reassociating FMA Wilco Dijkstra
2018-02-26 22:25 ` James Greenhalgh
2018-02-27 13:19   ` Richard Biener
2018-02-27 14:21     ` Wilco Dijkstra
2018-02-27 18:31       ` Aaron Sawdey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).