GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Tobias Burnus <tobias@codesourcery.com>
To: Andrew Stubbs <ams@codesourcery.com>
Cc: Kwok Cheung Yeung <kcy@codesourcery.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	Richard Biener <rguenther@suse.de>
Subject: GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations)
Date: Fri, 9 Sep 2022 14:20:11 +0200	[thread overview]
Message-ID: <5d7ca95c-8a61-6d07-dd5b-e7c2d03072ff@codesourcery.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2209091011440.20505@jbgna.fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 2031 bytes --]

On 09.09.22 12:16, Richard Biener wrote:
> On Fri, 9 Sep 2022, Tobias Burnus wrote:
>> -funsafe-math-optimizations implies -fno-signed-zeros, -fno-trapping-math,
>> -fassociative-math,
>> and -freciprocal-math. All of them reduce precision and my violate IEEE or
>> ISO/language standards.
>>
>> However, I think it is rather surprising to have all of the sudden only a
>> precision of the order of 100,000,000 ULP instead of ~4 ULP as to be expected.
>> That's a precision loss of the order of 10^8 or 2^29 which is huge!
>> ...
> I agree - for example powerpc has -mrecip= to control which instructions
> to use (float/double rsqrt or inverse) and -mrecip-precision to
> specify whether further iteration is done or not.
> [...]
> Your suggested huge reduction in precision isn't usually acceptable
> and should be always explicitely enabled.

First, I have to correct myself – Kwok's -munsafe-math-optimizations is
only about single-precision functions, which do not have this problem.

However, the pre-existing 'sqrt' problem still is real. It also applies
to reverse sqrt ("v_rsq"), but that's for whatever reason not used for GCN.

This patch now adds a commandline flag - off by default - to choose
whether this behavior is wanted. I did use the same name as aarch64,
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mlow-precision-sqrt
(the latter also has -mlow-precision-recip-sqrt, which is not (yet)
sensible for GCN.)

This patch was manually tested for all combinations and I also looked at
insn-recog.cc, given that it is my first .md patch – it it seems to work
fine.

OK for mainline – or are there comments or more suggestions? I also
included some word for the release notes.

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: gcn-low-prec-sqrt.diff --]
[-- Type: text/x-patch, Size: 3550 bytes --]

GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246]

GCN's sqrt supports single and double precision; however, for either
the result has 23 bits for the fractional part of the floating-point
number. (For double precision: instead of 52 bits).

This adds now -mlow-precision-sqrt, using the same naming as aarch64.
Before, the hardware builtin sqrt was always used with
unsafe-math-optimiaztions, now only with single precision; for
double precision, the new -mlow-precision-sqrt is explicitly
required in addition. As there is no rsqrt, this flag likewise
applies to 1/sqrt.

	PR target/105246

gcc/ChangeLog:

	* config/gcn/gcn.opt (mlow-precision-sqrt): New, off by default.
	* config/gcn/gcn-valu.md (sqrt, v_sqrt): Require it unless SFmode.
	* doc/invoke.texi (GCN): Add -mlow-precision-sqrt entry.

 gcc/config/gcn/gcn-valu.md |  6 ++++--
 gcc/config/gcn/gcn.opt     |  7 +++++++
 gcc/doc/invoke.texi        | 11 +++++++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c33ae0c717..c7a0b562874 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2276,7 +2276,8 @@ (define_insn "sqrt<mode>2<exec>"
   [(set (match_operand:V_FP 0 "register_operand"  "=  v")
 	(sqrt:V_FP
 	  (match_operand:V_FP 1 "gcn_alu_operand" "vSvB")))]
-  "flag_unsafe_math_optimizations"
+  "(flag_unsafe_math_optimizations
+    && (<MODE>mode == V64SFmode || flag_mlow_precision_sqrt))"
   "v_sqrt%i0\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
@@ -2285,7 +2286,8 @@ (define_insn "sqrt<mode>2"
   [(set (match_operand:FP 0 "register_operand"  "=  v")
 	(sqrt:FP
 	  (match_operand:FP 1 "gcn_alu_operand" "vSvB")))]
-  "flag_unsafe_math_optimizations"
+  "(flag_unsafe_math_optimizations
+    && (<MODE>mode == SFmode || flag_mlow_precision_sqrt))"
   "v_sqrt%i0\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..a3f341f7eb1 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -77,6 +77,13 @@ mgang-private-size=
 Target RejectNegative Joined UInteger Var(gang_private_size_opt) Init(-1)
 Amount of local data-share (LDS) memory to reserve for gang-private variables.
 
+mlow-precision-sqrt
+Target Var(flag_mlow_precision_sqrt) Optimization
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number.
+It also implies low-precision reciprocal sqrt.
+
 Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c066219a7d..fdd6e41cade 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20192,6 +20192,17 @@ compiled code must match the device mode.  The default is @samp{-mno-xnack}.
 At present this option is a placeholder for support that is not yet
 implemented.
 
+@item -mlow-precision-sqrt
+@itemx -mno-low-precision-sqrt
+@opindex mlow-precision-sqrt
+@opindex mno-low-precision-sqrt
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number (2@sup{29} ULP).
+It also implies low-precision reciprocal sqrt.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.
+
 @end table
 
 @node ARC Options

[-- Attachment #3: www-gcn-low-prec.diff --]
[-- Type: text/x-patch, Size: 901 bytes --]

gcc-13/changes.html - GCN: document -mlow-precision-sqrt

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 390193ca..d335eab3 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -179,6 +179,12 @@ a work-in-progress.</p>
   <li>Support for the Instinct MI200 series devices (<a
       href="https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html">
       <code>gfx90a</code></a>) has been added.</li>
+  <li>The <code>-mlow-precision-sqrt</code> option (disabled by default)
+      has been added to use the hardware <code>sqrt</code> also for
+      double-precision floating point arguments; note that the result
+      only has much a reduced accurary of 2<sup>29</sup> ULP. This
+      option requires <code>-funsafe-math-optimizations</code>
+      (implied by <code>-ffast-math</code>) in addition.</li>
 </ul>
 
 <!-- <h3 id="arc">ARC</h3> -->

next prev parent reply	other threads:[~2022-09-09 12:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-08 20:38 [PATCH] amdgcn: Add support for additional natively supported floating-point operations Kwok Cheung Yeung
2022-09-09  8:10 ` Andrew Stubbs
2022-09-09  9:15   ` Tobias Burnus
2022-09-09 10:16     ` Richard Biener
2022-09-09 12:20       ` Tobias Burnus [this message]
2022-09-09 12:40         ` GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations) Andrew Stubbs
2022-09-09 12:32       ` [PATCH] amdgcn: Add support for additional natively supported floating-point operations Stubbs, Andrew
2022-09-09 17:57 ` Joseph Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d7ca95c-8a61-6d07-dd5b-e7c2d03072ff@codesourcery.com \
    --to=tobias@codesourcery.com \
    --cc=ams@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kcy@codesourcery.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).