public inbox for gcc-patches@gcc.gnu.org
From: Evandro Menezes <e.menezes@samsung.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>,
	Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	James Greenhalgh <james.greenhalgh@arm.com>,
	Andrew Pinski <pinskia@gmail.com>,
	Benedikt Huber <benedikt.huber@theobroma-systems.com>,
	philipp.tomsich@theobroma-systems.com,
	Kyrill Tkachov <kyrylo.tkachov@arm.com>
Subject: Re: [AArch64] Emit square root using the Newton series
Date: Tue, 08 Mar 2016 22:08:00 -0000
Message-ID: <56DF4D50.4060804@samsung.com>
In-Reply-To: <56D8D553.6060902@samsung.com>

[-- Attachment #1: Type: text/plain, Size: 2836 bytes --]

On 02/16/16 14:56, Evandro Menezes wrote:
> On 12/08/15 15:35, Evandro Menezes wrote:
>> Emit square root using the Newton series
>>
>>    2015-12-03  Evandro Menezes  <e.menezes@samsung.com>
>>
>>    gcc/
>>             * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
>>    Declare new
>>             function.
>>             * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
>>    expansion and
>>             insn definitions.
>>             * config/aarch64/aarch64-tuning-flags.def
>>             (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>>             * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
>>    new function.
>>             * config/aarch64/aarch64.md (sqrt<mode>2): New expansion
>>    and insn
>>             definitions.
>>             * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
>>    Expand option
>>             description.
>>             * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>>
>> This patch extends the patch that added support for implementing 
>> x^(-1/2) using the Newton series by adding support for x^(1/2) as well.
>>
>> Is it OK at this point of stage 3?
>>
>> Thank you,
>>
>
> James,
>
> As I was saying, this patch results in some validation errors in 
> CPU2000 benchmarks using DF.  Although the algorithm proved to be 
> pretty solid on a vast set of random values, I'm confused as to why 
> some benchmarks fail to validate with this implementation of the 
> Newton series for square root when they pass with the Newton series 
> for reciprocal square root.
>
> Since I had no problems with the same algorithm on x86-64, I wonder if 
> the initial estimate has something to do with it: it offers just 8 
> bits on AArch64, whereas x86-64 offers 11 bits.  Then again, the 
> algorithm iterated one time fewer on x86-64 than on AArch64.
>
> Since it seems that the initial estimate is sufficient for CPU2000 to 
> validate when using SF, I'm leaning towards restricting the Newton 
> series for square root to SF only.
>
> Your thoughts on the matter are appreciated,

         Add choices for the reciprocal square root approximation

         Allow a target to prefer such an operation depending on the FP
    precision.

         gcc/
             * config/aarch64/aarch64-protos.h
             (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
             * config/aarch64/aarch64-tuning-flags.def
             (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
             (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
             * config/aarch64/aarch64.c
             (use_rsqrt_p): New argument for the mode.
             (aarch64_builtin_reciprocal): Derive mode from builtin.
             (aarch64_optab_supported_p): New argument for the mode.


Now that the patch is attached, feedback is appreciated.

Thank you,


-- 
Evandro Menezes


[-- Attachment #2: 0001-Add-choices-for-the-reciprocal-square-root-approxima.patch --]
[-- Type: text/x-patch, Size: 3848 bytes --]

From 0bb413550e854c81cc5ab180a3afdd43cd4faf0b Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH] Add choices for the reciprocal square root approximation

Allow a target to prefer such an operation depending on the FP precision.

gcc/
	* config/aarch64/aarch64-protos.h
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
	* config/aarch64/aarch64.c
	(use_rsqrt_p): New argument for the mode.
	(aarch64_builtin_reciprocal): Derive mode from builtin.
	(aarch64_optab_supported_p): New argument for the mode.
---
 gcc/config/aarch64/aarch64-protos.h         |  3 +++
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 ++-
 gcc/config/aarch64/aarch64.c                | 23 +++++++++++++++--------
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index acf2062..ee3505c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_RSQRT \
+  (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..57d9588 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,6 @@
      AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrtf", APPROX_RSQRT_SF)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 801f95a..39a1a47 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7464,12 +7464,16 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
    to optimize 1.0/sqrt.  */
 
 static bool
-use_rsqrt_p (void)
+use_rsqrt_p (machine_mode mode)
 {
   return (!flag_trapping_math
 	  && flag_unsafe_math_optimizations
-	  && ((aarch64_tune_params.extra_tuning_flags
-	       & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
+	  && ((GET_MODE_INNER (mode) == SFmode
+	       && (aarch64_tune_params.extra_tuning_flags
+		   & AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF))
+	      || (GET_MODE_INNER (mode) == DFmode
+		  && (aarch64_tune_params.extra_tuning_flags
+		      & AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF))
 	      || flag_mrecip_low_precision_sqrt));
 }
 
@@ -7479,9 +7483,12 @@ use_rsqrt_p (void)
 static tree
 aarch64_builtin_reciprocal (tree fndecl)
 {
-  if (!use_rsqrt_p ())
-    return NULL_TREE;
-  return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (fndecl));
+
+  if (use_rsqrt_p (mode))
+    return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
+
+  return NULL_TREE;
 }
 
 typedef rtx (*rsqrte_type) (rtx, rtx);
@@ -13960,13 +13967,13 @@ aarch64_promoted_type (const_tree t)
 /* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
 
 static bool
-aarch64_optab_supported_p (int op, machine_mode, machine_mode,
+aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 			   optimization_type opt_type)
 {
   switch (op)
     {
     case rsqrt_optab:
-      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1);
 
     default:
       return true;
-- 
2.6.3

