From: Evandro Menezes <e.menezes@samsung.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>,
Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
James Greenhalgh <james.greenhalgh@arm.com>,
Andrew Pinski <pinskia@gmail.com>,
Benedikt Huber <benedikt.huber@theobroma-systems.com>,
philipp.tomsich@theobroma-systems.com,
Kyrill Tkachov <kyrylo.tkachov@arm.com>
Subject: Re: [AArch64] Emit square root using the Newton series
Date: Tue, 08 Mar 2016 22:08:00 -0000
Message-ID: <56DF4D50.4060804@samsung.com>
In-Reply-To: <56D8D553.6060902@samsung.com>
[-- Attachment #1: Type: text/plain, Size: 2836 bytes --]
On 02/16/16 14:56, Evandro Menezes wrote:
> On 12/08/15 15:35, Evandro Menezes wrote:
>> Emit square root using the Newton series
>>
>> 2015-12-03 Evandro Menezes <e.menezes@samsung.com>
>>
>> gcc/
>> * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new
>> function.
>> * config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and
>> insn definitions.
>> * config/aarch64/aarch64-tuning-flags.def
>> (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>> * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function.
>> * config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn
>> definitions.
>> * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand
>> option description.
>> * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>>
>> This patch extends the patch that added support for implementing
>> x^(-1/2) using the Newton series by adding support for x^(1/2) as well.
>>
>> Is it OK at this point in stage 3?
>>
>> Thank you,
>>
>
> James,
>
> As I was saying, this patch results in some validation errors in
> CPU2000 benchmarks using DF. Although testing with a vast set of
> random values shows the algorithm to be pretty solid, I'm confused as
> to why some benchmarks fail to validate with this implementation of
> the Newton series for square root, when they pass with the Newton
> series for reciprocal square root.
>
> Since I had no problems with the same algorithm on x86-64, I wonder
> whether the initial estimate is to blame: on AArch64 it provides just
> 8 bits, whereas on x86-64 it provides 11 bits. Then again, the
> algorithm used one fewer iteration on x86-64 than on AArch64.
>
> Since the initial estimate seems sufficient for CPU2000 to validate
> when using SF, I'm leaning towards restricting the Newton series for
> square root to SF only.
>
> Your thoughts on the matter are appreciated,

Add choices for the reciprocal square root approximation

Allow a target to prefer such an operation depending on the FP
precision.
gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
* config/aarch64/aarch64.c
(use_rsqrt_p): New argument for the mode.
(aarch64_builtin_reciprocal): Derive mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.
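The ChangeLog above splits the old `approx_rsqrt` tuning flag into a DF mask (still spelled `approx_rsqrt`) and a new SF mask (`approx_rsqrtf`). As a rough, self-contained model of the decision the patched `use_rsqrt_p` makes, here is a C sketch. `enum fp_mode`, the `TUNE_APPROX_RSQRT_*` macros, and `use_rsqrt_for_mode` are hypothetical stand-ins for GCC's machine modes, the generated tuning masks, and the real function; the bit values are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals; in GCC the masks are
   generated from aarch64-tuning-flags.def and the mode comes from
   GET_MODE_INNER.  Bit values here are illustrative only.  */
enum fp_mode { MODE_SF, MODE_DF };

#define TUNE_APPROX_RSQRT_DF (1u << 0)
#define TUNE_APPROX_RSQRT_SF (1u << 1)
#define TUNE_APPROX_RSQRT (TUNE_APPROX_RSQRT_DF | TUNE_APPROX_RSQRT_SF)

/* Mirrors the shape of the patched use_rsqrt_p: FAST_MATH stands for
   !flag_trapping_math && flag_unsafe_math_optimizations, and
   LOW_PRECISION for flag_mrecip_low_precision_sqrt.  */
static bool
use_rsqrt_for_mode (unsigned tuning_flags, enum fp_mode mode,
                    bool fast_math, bool low_precision)
{
  /* Pick the mask that matches the floating-point precision.  */
  unsigned wanted = (mode == MODE_SF ? TUNE_APPROX_RSQRT_SF
                                     : TUNE_APPROX_RSQRT_DF);
  return fast_math && ((tuning_flags & wanted) != 0 || low_precision);
}
```

In this model, a tuning that sets only the SF mask gets the approximation for SF but not for DF, unless the low-precision option is also given, which is the per-precision choice the patch is after.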
Now that the patch is attached, feedback is appreciated.
Thank you,
--
Evandro Menezes
[-- Attachment #2: 0001-Add-choices-for-the-reciprocal-square-root-approxima.patch --]
[-- Type: text/x-patch, Size: 3848 bytes --]
From 0bb413550e854c81cc5ab180a3afdd43cd4faf0b Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH] Add choices for the reciprocal square root approximation
Allow a target to prefer such an operation depending on the FP precision.
gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
* config/aarch64/aarch64.c
(use_rsqrt_p): New argument for the mode.
(aarch64_builtin_reciprocal): Derive mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.
---
gcc/config/aarch64/aarch64-protos.h | 3 +++
gcc/config/aarch64/aarch64-tuning-flags.def | 3 ++-
gcc/config/aarch64/aarch64.c | 23 +++++++++++++++--------
3 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index acf2062..ee3505c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_RSQRT \
+  (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..57d9588 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,6 @@
    AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrtf", APPROX_RSQRT_SF)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 801f95a..39a1a47 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7464,12 +7464,16 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
    to optimize 1.0/sqrt. */
 
 static bool
-use_rsqrt_p (void)
+use_rsqrt_p (machine_mode mode)
 {
   return (!flag_trapping_math
           && flag_unsafe_math_optimizations
-          && ((aarch64_tune_params.extra_tuning_flags
-               & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
+          && ((GET_MODE_INNER (mode) == SFmode
+               && (aarch64_tune_params.extra_tuning_flags
+                   & AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF))
+              || (GET_MODE_INNER (mode) == DFmode
+                  && (aarch64_tune_params.extra_tuning_flags
+                      & AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF))
               || flag_mrecip_low_precision_sqrt));
 }
 
@@ -7479,9 +7483,12 @@ use_rsqrt_p (void)
 static tree
 aarch64_builtin_reciprocal (tree fndecl)
 {
-  if (!use_rsqrt_p ())
-    return NULL_TREE;
-  return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (fndecl));
+
+  if (use_rsqrt_p (mode))
+    return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
+
+  return NULL_TREE;
 }
 
 typedef rtx (*rsqrte_type) (rtx, rtx);
@@ -13960,13 +13967,13 @@ aarch64_promoted_type (const_tree t)
 /* Implement the TARGET_OPTAB_SUPPORTED_P hook. */
 
 static bool
-aarch64_optab_supported_p (int op, machine_mode, machine_mode,
+aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
                            optimization_type opt_type)
 {
   switch (op)
     {
     case rsqrt_optab:
-      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1);
 
     default:
       return true;
--
2.6.3