[gcc(refs/vendors/ibm/heads/perf)] aarch64: Add --params to control the number of recip steps [PR94154]

From: Jiu Fu Guo @ 2020-03-19  6:17 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:dbf3dc75888623e9d4bb7cc5e9c30caa9b24ffe7

commit dbf3dc75888623e9d4bb7cc5e9c30caa9b24ffe7
Author: Bu Le <bule1@huawei.com>
Date:   Thu Mar 12 22:39:12 2020 +0000

    aarch64: Add --params to control the number of recip steps [PR94154]
    
    -mlow-precision-div hard-coded the number of iterations to 2 for double
    and 1 for float.  This patch adds a --param to control the number.
    
    2020-03-13  Bu Le  <bule1@huawei.com>
    
    gcc/
            PR target/94154
            * config/aarch64/aarch64.opt (-param=aarch64-float-recp-precision=)
            (-param=aarch64-double-recp-precision=): New options.
            * doc/invoke.texi: Document them.
            * config/aarch64/aarch64.c (aarch64_emit_approx_div): Use them
            instead of hard-coding the choice of 1 for float and 2 for double.

Diff:
---
 gcc/ChangeLog                  |  9 +++++++++
 gcc/config/aarch64/aarch64.c   |  8 +++++---
 gcc/config/aarch64/aarch64.opt |  9 +++++++++
 gcc/doc/invoke.texi            | 11 +++++++++++
 4 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 61982897e11..ac8940a25f7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2020-03-13  Bu Le  <bule1@huawei.com>
+
+	PR target/94154
+	* config/aarch64/aarch64.opt (-param=aarch64-float-recp-precision=)
+	(-param=aarch64-double-recp-precision=): New options.
+	* doc/invoke.texi: Document them.
+	* config/aarch64/aarch64.c (aarch64_emit_approx_div): Use them
+	instead of hard-coding the choice of 1 for float and 2 for double.
+
 2019-03-13  Eric Botcazou  <ebotcazou@adacore.com>
 
 	PR rtl-optimization/94119
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c320d5ba51d..2c81f86dd2a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12911,10 +12911,12 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
   /* Iterate over the series twice for SF and thrice for DF.  */
   int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
 
-  /* Optionally iterate over the series once less for faster performance,
-     while sacrificing the accuracy.  */
+  /* Optionally iterate over the series less for faster performance,
+     while sacrificing the accuracy.  The default is 2 for DF and 1 for SF.  */
   if (flag_mlow_precision_div)
-    iterations--;
+    iterations = (GET_MODE_INNER (mode) == DFmode
+		  ? aarch64_double_recp_precision
+		  : aarch64_float_recp_precision);
 
   /* Iterate over the series to calculate the approximate reciprocal.  */
   rtx xtmp = gen_reg_rtx (mode);
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 77df0b77f8c..37181b5baca 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -262,3 +262,12 @@ Generate local calls to out-of-line atomic operations.
 -param=aarch64-sve-compare-costs=
 Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
 When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+
+-param=aarch64-float-recp-precision=
+Target Joined UInteger Var(aarch64_float_recp_precision) Init(1) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for float type.  The precision of division is proportional to this param when division approximation is enabled.  The default value is 1.
+
+-param=aarch64-double-recp-precision=
+Target Joined UInteger Var(aarch64_double_recp_precision) Init(2) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for double type.  The precision of division is proportional to this param when division approximation is enabled.  The default value is 2.
+
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af28015234c..96a95162696 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13179,6 +13179,17 @@ Also use the cost model to choose between SVE and Advanced SIMD vectorization.
 Using unpacked vectors includes storing smaller elements in larger
 containers and accessing elements with extending loads and truncating
 stores.
+
+@item aarch64-float-recp-precision
+The number of Newton iterations for calculating the reciprocal for float type.
+The precision of division is proportional to this param when division
+approximation is enabled.  The default value is 1.
+
+@item aarch64-double-recp-precision
+The number of Newton iterations for calculating the reciprocal for double type.
+The precision of division is proportional to this param when division
+approximation is enabled.  The default value is 2.
+
 @end table
 
 @end table
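
The effect of the iteration count can be illustrated outside the compiler. The following is a minimal sketch in plain C, not the code GCC emits: it assumes a crude starting estimate (the hypothetical rough_recip_estimate, which stands in for a hardware reciprocal estimate by truncating 1/d to 8 mantissa bits) and refines it with the Newton step x = x * (2 - d * x). All function names here are made up for illustration. In this sketch each step roughly squares the relative error, so accuracy grows faster than linearly with the iteration count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: compute 1/d and
   keep only the top 8 explicit mantissa bits, so the starting relative
   error is below 2^-8.  Assumes d is finite and nonzero.  */
double
rough_recip_estimate (double d)
{
  double r = 1.0 / d;
  uint64_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= ~((UINT64_C (1) << 44) - 1);	/* Clear the low 44 mantissa bits.  */
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* Refine the estimate of 1/d with the given number of Newton steps.
   If x = (1/d) * (1 - e), then x * (2 - d * x) = (1/d) * (1 - e * e).  */
double
approx_recip (double d, int iterations)
{
  assert (iterations >= 0);
  double x = rough_recip_estimate (d);
  for (int i = 0; i < iterations; i++)
    x = x * (2.0 - d * x);
  return x;
}

/* Approximate num / den as num * (1 / den).  */
double
approx_div (double num, double den, int iterations)
{
  return num * approx_recip (den, iterations);
}

/* Relative error of the approximation against ordinary division.  */
double
rel_error (double num, double den, int iterations)
{
  double exact = num / den;
  double diff = approx_div (num, den, iterations) - exact;
  if (diff < 0)
    diff = -diff;
  return diff / (exact < 0 ? -exact : exact);
}
```

Calling rel_error (1.0, 3.0, n) for n = 0, 1, 2, 3 shows the error shrinking sharply at each step until it reaches the rounding limit of double. In the patched compiler the iteration count is read from the new params only when -mlow-precision-div is in effect, as the aarch64.c hunk above shows.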

