From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 1791)
	id 981BF3858014; Wed, 1 Dec 2021 17:09:20 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 981BF3858014
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Adhemerval Zanella
To: glibc-cvs@sourceware.org
Subject: [glibc/azanella/hypot-refactor] math: Improve hypot performance with FMA
X-Act-Checkin: glibc
X-Git-Author: Wilco Dijkstra
X-Git-Refname: refs/heads/azanella/hypot-refactor
X-Git-Oldrev: e12b696f5ac876db294c1d25630a3d07444572c3
X-Git-Newrev: 4d76239e99cb6b71e2c4fe0b491861d2cd9ea284
Message-Id: <20211201170920.981BF3858014@sourceware.org>
Date: Wed, 1 Dec 2021 17:09:20 +0000 (GMT)
X-BeenThere: glibc-cvs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-cvs mailing list
X-List-Received-Date: Wed, 01 Dec 2021 17:09:20 -0000

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4d76239e99cb6b71e2c4fe0b491861d2cd9ea284

commit 4d76239e99cb6b71e2c4fe0b491861d2cd9ea284
Author: Wilco Dijkstra
Date:   Tue Nov 30 16:29:25 2021 -0300

    math: Improve hypot performance with FMA

    Improve hypot performance significantly by using fma when available.
    The fma version has twice the throughput of the previous version and
    70% of the latency.  The non-fma version has 30% higher throughput
    and 10% higher latency.

    Max ULP error is 0.949 with fma and 0.792 without fma.

    Passes GLIBC testsuite.

Diff:
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 6cc48888f0..5f438e4348 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
      rounding mode.
    - Handle required underflow exception for subnormal results.

-   The expected ULP is ~0.792.
+   The expected ULP is ~0.792 or ~0.948 if FMA is used.  For FMA, the
+   correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+   if x^2 + y^2 is computed with less than 0.707 ULP error.  If |x| >= |2y|,
+   fma (x, x, y^2) has ~0.625 ULP.  If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+   has ~0.625 ULP.

    [1] https://arxiv.org/pdf/1904.09481.pdf  */

@@ -48,6 +52,16 @@ static inline double
 kernel (double ax, double ay)
 {
   double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
+
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
+  else
+    return sqrt (fma (ax, ax, ay * ay));
+
+#else
   double h = sqrt (ax * ax + ay * ay);
   if (h <= 2.0 * ay)
     {
@@ -64,6 +78,7 @@ kernel (double ax, double ay)

   h -= (t1 + t2) / (2.0 * h);
   return h;
+#endif
 }

 double