public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* re: [AArch64][Spec2017]Question about mlow-precision-div optimization.
@ 2020-03-09  1:48 bule
  2020-03-09 12:28 ` Richard Sandiford
  0 siblings, 1 reply; 5+ messages in thread
From: bule @ 2020-03-09  1:48 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Wilco Dijkstra, gcc-help, Yangfei (Felix)

Hi

Thanks for the reply.
I am an engineer from Huawei Technologies Co.,Ltd. And my company has signed the copyright assignment.
My huawei email is: bule1@huawei.com
For those points you mentioned in the last email, I have changed accordingly and the attachment is the updated patch.

Thanks,
Bu Le

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: re: [AArch64][Spec2017]Question about mlow-precision-div optimization.
  2020-03-09  1:48 re: [AArch64][Spec2017]Question about mlow-precision-div optimization bule
@ 2020-03-09 12:28 ` Richard Sandiford
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2020-03-09 12:28 UTC (permalink / raw)
  To: bule; +Cc: Wilco Dijkstra, gcc-help, Yangfei (Felix)

Hi,

bule <bule1@huawei.com> writes:
> Thanks for the reply.
>
> I am an engineer from Huawei Technologies Co.,Ltd. And my company has signed
> the copyright assignment.
>
> My huawei email is: bule1@huawei.com

OK, great.

> diff -Nurp gcc-10.0/gcc/config/aarch64/aarch64.c gcc-10.0_opti/gcc/config/aarch64/aarch64.c
> --- gcc-10.0/gcc/config/aarch64/aarch64.c	2020-03-08 18:00:34.581798076 +0800
> +++ gcc-10.0_opti/gcc/config/aarch64/aarch64.c	2020-03-08 17:36:15.400515481 +0800
> @@ -12854,10 +12854,10 @@ aarch64_emit_approx_div (rtx quo, rtx nu
>    /* Iterate over the series twice for SF and thrice for DF.  */
>    int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
>  
> -  /* Optionally iterate over the series once less for faster performance,
> -     while sacrificing the accuracy.  */
> +  /* Optionally iterate over the series less for faster performance,
> +     while sacrificing the accuracy. The default is 2 for DF and 1 for SF.  */
>    if (flag_mlow_precision_div)
> -    iterations--;
> +    iterations = aarch64_double_recp_precision : aarch64_float_recp_precision;

This is missing the "GET_MODE_INNER (mode) == DFmode ?" part of the condition.
Adding that will take it over the 80-char limit, so it should be formatted as:

    iterations = (GET_MODE_INNER (mode) == DFmode
		  ? aarch64_double_recp_precision
		  : aarch64_float_recp_precision);

Looks good otherwise.

Could you try doing a bootstrap with that change and seeing if it
still works for your use case?  If so, could you post the final patch?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: re: [AArch64][Spec2017]Question about mlow-precision-div optimization.
@ 2020-03-13 10:06 bule
  0 siblings, 0 replies; 5+ messages in thread
From: bule @ 2020-03-13 10:06 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Yangfei (Felix), Wilco Dijkstra, gcc-help

Sure, thanks a lot for the comments and all these fix-ups.
I have closed the PR.

Sorry for the typos.

Regards,
Bu Le

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: re: [AArch64][Spec2017]Question about mlow-precision-div optimization.
  2020-03-12 12:22 bule
@ 2020-03-13  9:30 ` Richard Sandiford
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2020-03-13  9:30 UTC (permalink / raw)
  To: bule; +Cc: Yangfei (Felix), Wilco Dijkstra, gcc-help

bule <bule1@huawei.com> writes:
> Hi,
>
>> Could you try doing a bootstrap with that change and seeing if it still works for your use case?  If so, could you post the final patch?
>
> The bootstrap test and regression dejagnu test has been done on an aarch64 Linux platform for the patch attached.
> The regression test for gcc will have 4 extra expected passes automatically due to the newly added 2 parameters. No new regression witnessed.
> Newly added test cases are: 
> 	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-double-recp-precision=1 (test for excess errors)
> 	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-double-recp-precision=5 (test for excess errors)
> 	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-float-recp-precision=1 (test for excess errors)
> 	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-float-recp-precision=5 (test for excess errors)
>
> gcc/config/aarch64/:
> +2020-03-12  Bu Le  <bule1@huawei.com>
> +
> +       PR target/94154
> +       * aarch64.c (aarch64_emit_approx_div): Add new parameters
> +         Add two parameters to control the precision of the reciprocal division.
> +
> +       * aarch64.opt : Declare new parameters.     
>
> gcc/doc:
> +2020-03-12  Bu Le  <bule1@huawei.com>
> +
> +       PR target/94154
> +       * invoke.texi: New parameters added for reciprocal division precision.
> +
>
> please help commit this if it's OK to go

Thanks, looks good.  Pushed to master with the minor changes below:

The changelog entries are relative to the innermost directory that
contains a ChangeLog file, so changes to gcc/doc and gcc/config/aarch64
have entries relative to gcc/.  I used the following changelog for the
commit:

2020-03-13  Bu Le  <bule1@huawei.com>

        PR target/94154
        * config/aarch64/aarch64.opt (-param=aarch64-float-recp-precision=)
        (-param=aarch64-double-recp-precision=): New options.
        * doc/invoke.texi: Document them.
        * config/aarch64/aarch64.c (aarch64_emit_approx_div): Use them
        instead of hard-coding the choice of 1 for float and 2 for double.

In:

> +    iterations = (GET_MODE_INNER (mode) == DFmode)
> +				   ? aarch64_double_recp_precision
> +				   : aarch64_float_recp_precision;

the formatting should be:

  iterations = (GET_MODE_INNER (mode) == DFmode
                ? aarch64_double_recp_precision
                : aarch64_float_recp_precision);

with "?" and ":" indented below the character after the opening "(".
(GCC has quite strict rules about this kind of thing, sorry.)

In:

> +@item aarch64-float-recp-precision
> +The number of Newton iterations for calculating the reciprocal for float type.
> +The precision of division is propotional to this param
> +when divisionapproximation is enabled.
> +The default value is 1.
> +
> +@item aarch64-float-recp-precision
> +The number of Newton iterations for calculating the reciprocal for double type.
> +The precision of division is propotional to this param
> +when divisionapproximation is enabled.
> +The default value is 2.

there were a couple of typos:
- s/propotional/proportional/
- a missing space after "division"

Also, the second param should be "double" rather than "float".

Thanks,
Richard


^ permalink raw reply	[flat|nested] 5+ messages in thread

* re: re: [AArch64][Spec2017]Question about mlow-precision-div optimization.
@ 2020-03-12 12:22 bule
  2020-03-13  9:30 ` Richard Sandiford
  0 siblings, 1 reply; 5+ messages in thread
From: bule @ 2020-03-12 12:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Yangfei (Felix), Wilco Dijkstra, gcc-help

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

Hi,

> Could you try doing a bootstrap with that change and seeing if it still works for your use case?  If so, could you post the final patch?

The bootstrap test and regression dejagnu test has been done on an aarch64 Linux platform for the patch attached.
The regression test for gcc will have 4 extra expected passes automatically due to the newly added 2 parameters. No new regression witnessed.
Newly added test cases are: 
	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-double-recp-precision=1 (test for excess errors)
	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-double-recp-precision=5 (test for excess errors)
	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-float-recp-precision=1 (test for excess errors)
	PASS: gcc.dg/params/blocksort-part.c -O3 --param aarch64-float-recp-precision=5 (test for excess errors)

gcc/config/aarch64/:
+2020-03-12  Bu Le  <bule1@huawei.com>
+
+       PR target/94154
+       * aarch64.c (aarch64_emit_approx_div): Add new parameters
+         Add two parameters to control the precision of the reciprocal division.
+
+       * aarch64.opt : Declare new parameters.     

gcc/doc:
+2020-03-12  Bu Le  <bule1@huawei.com>
+
+       PR target/94154
+       * invoke.texi: New parameters added for reciprocal division precision.
+

please help commit this if it's OK to go

Thanks,
Bu Le


[-- Attachment #2: div-param2.patch --]
[-- Type: application/octet-stream, Size: 3167 bytes --]

diff -Nurp a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
--- a/gcc/config/aarch64/aarch64.c	2020-03-08 18:00:34.581798076 +0800
+++ b/gcc/config/aarch64/aarch64.c	2020-03-10 11:06:58.946181675 +0800
@@ -12854,10 +12854,12 @@ aarch64_emit_approx_div (rtx quo, rtx nu
   /* Iterate over the series twice for SF and thrice for DF.  */
   int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
 
-  /* Optionally iterate over the series once less for faster performance,
-     while sacrificing the accuracy.  */
+  /* Optionally iterate over the series less for faster performance,
+     while sacrificing the accuracy.  The default is 2 for DF and 1 for SF.  */
   if (flag_mlow_precision_div)
-    iterations--;
+    iterations = (GET_MODE_INNER (mode) == DFmode)
+				   ? aarch64_double_recp_precision
+				   : aarch64_float_recp_precision;
 
   /* Iterate over the series to calculate the approximate reciprocal.  */
   rtx xtmp = gen_reg_rtx (mode);
diff -Nurp a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
--- a/gcc/config/aarch64/aarch64.opt	2020-03-08 18:00:34.589802076 +0800
+++ b/gcc/config/aarch64/aarch64.opt	2020-03-08 17:52:37.556515481 +0800
@@ -262,3 +262,12 @@ Generate local calls to out-of-line atom
 -param=aarch64-sve-compare-costs=
 Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
 When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+
+-param=aarch64-float-recp-precision=
+Target Joined UInteger Var(aarch64_float_recp_precision) Init(1) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for float type.  The precision of division is propotional to this param when division approximation is enabled.  The default value is 1.
+
+-param=aarch64-double-recp-precision=
+Target Joined UInteger Var(aarch64_double_recp_precision) Init(2) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for double type.  The precision of division is proportional to this param when division approximation is enabled.  The default value is 2.
+
diff -Nurp a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
--- a/gcc/doc/invoke.texi	2020-03-08 18:00:32.216616076 +0800
+++ b/gcc/doc/invoke.texi	2020-03-10 11:49:55.366032646 +0800
@@ -13095,6 +13095,19 @@ Also use the cost model to choose betwee
 Using unpacked vectors includes storing smaller elements in larger
 containers and accessing elements with extending loads and truncating
 stores.
+
+@item aarch64-float-recp-precision
+The number of Newton iterations for calculating the reciprocal for float type.
+The precision of division is propotional to this param
+when divisionapproximation is enabled.
+The default value is 1.
+
+@item aarch64-float-recp-precision
+The number of Newton iterations for calculating the reciprocal for double type.
+The precision of division is propotional to this param
+when divisionapproximation is enabled.
+The default value is 2.
+
 @end table
 
 @end table

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-03-13 10:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-09  1:48 re: [AArch64][Spec2017]Question about mlow-precision-div optimization bule
2020-03-09 12:28 ` Richard Sandiford
2020-03-12 12:22 bule
2020-03-13  9:30 ` Richard Sandiford
2020-03-13 10:06 bule

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).