public inbox for glibc-cvs@sourceware.org
* [glibc/azanella/hypot-refactor] math: Improve hypot performance with FMA
@ 2021-12-06 17:58 Adhemerval Zanella
  0 siblings, 0 replies; 3+ messages in thread
From: Adhemerval Zanella @ 2021-12-06 17:58 UTC (permalink / raw)
  To: glibc-cvs

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=45c6208295461e30c8e8c0438929759362ed2c64

commit 45c6208295461e30c8e8c0438929759362ed2c64
Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Date:   Tue Nov 30 16:29:25 2021 -0300

    math: Improve hypot performance with FMA
    
    Improve hypot performance significantly by using fma when available. The
    fma version has twice the throughput of the previous version and 70% of
    the latency.  The non-fma version has 30% higher throughput and 10%
    higher latency.
    
    Max ULP error is 0.949 with fma and 0.792 without fma.
    
    Passes GLIBC testsuite.

Diff:
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 75bce2df4e..6fedf0d61f 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
      rounding mode.
    - Handle required underflow exception for subnormal results.
 
-   The expected ULP is ~0.792.
+   The expected ULP is ~0.792 or ~0.948 if FMA is used.  For FMA, the
+   correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+   if x^2 + y^2 is computed with less than 0.707 ULP error.  If |x| >= |2y|,
+   fma (x, x, y^2) has ~0.625 ULP.  If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+   has ~0.625 ULP.
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
@@ -48,6 +52,16 @@ static inline double
 kernel (double ax, double ay)
 {
   double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
+
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
+  else
+    return sqrt (fma (ax, ax, ay * ay));
+
+#else
   double h = sqrt (ax * ax + ay * ay);
   if (h <= 2.0 * ay)
     {
@@ -64,6 +78,7 @@ kernel (double ax, double ay)
 
   h -= (t1 + t2) / (2.0 * h);
   return h;
+#endif
 }
 
 double



* [glibc/azanella/hypot-refactor] math: Improve hypot performance with FMA
@ 2021-12-01 17:09 Adhemerval Zanella
  0 siblings, 0 replies; 3+ messages in thread
From: Adhemerval Zanella @ 2021-12-01 17:09 UTC (permalink / raw)
  To: glibc-cvs

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4d76239e99cb6b71e2c4fe0b491861d2cd9ea284

commit 4d76239e99cb6b71e2c4fe0b491861d2cd9ea284
Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Date:   Tue Nov 30 16:29:25 2021 -0300

    math: Improve hypot performance with FMA
    
    Improve hypot performance significantly by using fma when available. The
    fma version has twice the throughput of the previous version and 70% of
    the latency.  The non-fma version has 30% higher throughput and 10%
    higher latency.
    
    Max ULP error is 0.949 with fma and 0.792 without fma.
    
    Passes GLIBC testsuite.

Diff:
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 6cc48888f0..5f438e4348 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
      rounding mode.
    - Handle required underflow exception for subnormal results.
 
-   The expected ULP is ~0.792.
+   The expected ULP is ~0.792 or ~0.948 if FMA is used.  For FMA, the
+   correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+   if x^2 + y^2 is computed with less than 0.707 ULP error.  If |x| >= |2y|,
+   fma (x, x, y^2) has ~0.625 ULP.  If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+   has ~0.625 ULP.
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
@@ -48,6 +52,16 @@ static inline double
 kernel (double ax, double ay)
 {
   double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
+
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
+  else
+    return sqrt (fma (ax, ax, ay * ay));
+
+#else
   double h = sqrt (ax * ax + ay * ay);
   if (h <= 2.0 * ay)
     {
@@ -64,6 +78,7 @@ kernel (double ax, double ay)
 
   h -= (t1 + t2) / (2.0 * h);
   return h;
+#endif
 }
 
 double
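
The 0.707 ULP threshold quoted in the source comment can be recovered with
a first-order error estimate. The derivation below is a reconstruction
under that first-order assumption, not the argument from the cited paper;
the symbols s, \hat{s} and e are introduced here for illustration.

```latex
% Identity used by the |x| < |2y| branch:
\[
  x^2 + y^2 \;=\; 2xy + (x - y)^2 .
\]
% Let s = x^2 + y^2 and let \hat{s} be its computed value, with
% |\hat{s} - s| \le e \cdot \mathrm{ulp}(s).  To first order,
\[
  \bigl|\sqrt{\hat{s}} - \sqrt{s}\bigr|
  \;\approx\; \frac{|\hat{s} - s|}{2\sqrt{s}}
  \;\le\; \frac{e \cdot \mathrm{ulp}(s)}{2\sqrt{s}} .
\]
% Worst case: s \in [2, 4) (up to a power of 4), where
% \mathrm{ulp}(s) = 2\,\mathrm{ulp}(\sqrt{s}) and \sqrt{s} \ge \sqrt{2}:
\[
  \frac{e \cdot \mathrm{ulp}(s)}{2\sqrt{s}}
  \;\le\; \frac{e}{\sqrt{2}}\,\mathrm{ulp}(\sqrt{s}) .
\]
% Adding the 1/2 ULP rounding error of the square root itself:
\[
  \text{total error} \;\lesssim\; \tfrac{1}{2} + \frac{e}{\sqrt{2}}
  \;<\; 1
  \quad\Longleftrightarrow\quad
  e \;<\; \frac{1}{\sqrt{2}} \approx 0.707 .
\]
% With e \approx 0.625 this estimate gives about 0.94 ULP.
```

The estimate lands close to the ~0.948 figure in the comment; the small gap
is expected from the terms a first-order bound ignores.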




end of thread, other threads:[~2021-12-06 17:58 UTC | newest]

2021-12-06 17:58 [glibc/azanella/hypot-refactor] math: Improve hypot performance with FMA Adhemerval Zanella
  -- strict thread matches above, loose matches on Subject: below --
2021-12-01 17:09 Adhemerval Zanella
2021-12-01 16:45 Adhemerval Zanella
