From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 5CAEA3858409 for ; Wed, 17 Nov 2021 19:02:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5CAEA3858409 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 1AHIlOQZ028365; Wed, 17 Nov 2021 19:02:54 GMT Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0b-001b2d01.pphosted.com with ESMTP id 3cd7aw0a42-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 17 Nov 2021 19:02:53 +0000 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1AHIoQ0m007112; Wed, 17 Nov 2021 19:02:53 GMT Received: from b03cxnp08026.gho.boulder.ibm.com (b03cxnp08026.gho.boulder.ibm.com [9.17.130.18]) by ppma02dal.us.ibm.com with ESMTP id 3ca50cxdu5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 17 Nov 2021 19:02:53 +0000 Received: from b03ledav006.gho.boulder.ibm.com (b03ledav006.gho.boulder.ibm.com [9.17.130.237]) by b03cxnp08026.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 1AHJ2qOw50594100 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 17 Nov 2021 19:02:52 GMT Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 24F21C6059; Wed, 17 Nov 2021 19:02:52 +0000 (GMT) Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 95F07C6074; Wed, 17 Nov 2021 19:02:51 +0000 (GMT) Received: from li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com (unknown [9.65.92.51]) by b03ledav006.gho.boulder.ibm.com (Postfix) with ESMTPS; Wed, 17 Nov 2021 19:02:51 +0000 (GMT) Date: Wed, 17 Nov 2021 13:02:49 -0600 From: "Paul A. Clarke" To: Wilco Dijkstra Cc: Adhemerval Zanella , "'GNU C Library'" Subject: Re: RFC: Improve hypot performance Message-ID: <20211117190249.GC7755@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: OcxN4TytWZyuYbQQn3OynCi95G2FmJJc X-Proofpoint-GUID: OcxN4TytWZyuYbQQn3OynCi95G2FmJJc X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.0.607.475 definitions=2021-11-17_06,2021-11-17_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=801 malwarescore=0 priorityscore=1501 bulkscore=0 phishscore=0 clxscore=1015 spamscore=0 suspectscore=0 mlxscore=0 lowpriorityscore=0 adultscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2111170084 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Nov 2021 19:02:59 -0000 On Wed, Nov 17, 2021 at 03:58:53PM +0000, Wilco Dijkstra via Libc-alpha wrote: > Hi Adhemerval, > > Here is an early version of a much faster hypot implementation. It uses fma > to significantly outperform the powerpc version both in throughput and > latency. It has a worst-case ULP of ~0.949 and passes the testsuite. The > powerpc version has a worst-case ULP of ~1.21 and several test failures. > > It applies on top of your hypot patch series. I didn't optimize the non-fma > case since modern targets have fma. It'll be interesting to compare it on > Power. You'll need to correctly set FAST_FMINMAX to indicate support for > inlined fmin/fmax instructions (this will be added to math_private.h for > targets that have it), without it the code tries to use a conditional move > since using a branch here is really bad for performance. On Power10, this implementation is still has a large delta compared to the current implementation: current on Power10: "hypot": { "workload-random": { "duration": 5.27306e+08, "iterations": 4.8e+07, "reciprocal-throughput": 8.26834, "latency": 13.7027, "max-throughput": 1.20943e+08, "min-throughput": 7.29781e+07 } } with Adhemerval's patches and this patch: "hypot": { "workload-random": { "duration": 5.34629e+08, "iterations": 3.6e+07, "reciprocal-throughput": 12.5006, "latency": 17.201, "max-throughput": 7.99959e+07, "min-throughput": 5.81362e+07 } } PC > --- > > diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c > index d20bc3e3657e350a1103a8f8477db35ee60399e0..3906711788cff5d66725e5879bb4d6c36dd24dc7 100644 > --- a/sysdeps/ieee754/dbl-64/e_hypot.c > +++ b/sysdeps/ieee754/dbl-64/e_hypot.c > @@ -37,73 +37,40 @@ > #include > #include > > +#define FAST_FMINMAX 1 > +//#undef __FP_FAST_FMA > + > +#define SCALE 0x1p-600 > +#define LARGE_VAL 0x1p+511 > +#define TINY_VAL 0x1p-459 > +#define EPS 0x1p-54 > + > + > static inline double > handle_errno (double r) > { > + r = math_narrow_eval (r); > if (isinf (r)) > __set_errno (ERANGE); > return r; > } > > -/* sqrt (DBL_EPSILON / 2.0) */ > -#define SQRT_EPS_DIV_2 0x1.6a09e667f3bcdp-27 > -/* DBL_MIN / (sqrt (DBL_EPSILON / 2.0)) */ > -#define DBL_MIN_THRESHOLD 0x1.6a09e667f3bcdp-996 > -/* eps (double) * sqrt (DBL_MIN)) */ > -#define SCALE 0x1p-563 > -/* 1 / eps (sqrt (DBL_MIN) */ > -#define INV_SCALE 0x1p+563 > -/* sqrt (DBL_MAX) */ > -#define SQRT_DBL_MAX 0x1.6a09e667f3bccp+511 > -/* sqrt (DBL_MIN) */ > -#define SQRT_DBL_MIN 0x1p-511 > - > -double > -__hypot (double x, double y) > +static inline double > +kernel (double ax, double ay) > { > - if ((isinf (x) || isinf (y)) > - && !issignaling (x) && !issignaling (y)) > - return INFINITY; > - if (isnan (x) || isnan (y)) > - return x + y; > - > - double ax = fabs (x); > - double ay = fabs (y); > - if (ay > ax) > - { > - double tmp = ax; > - ax = ay; > - ay = tmp; > - } > - > - /* Widely varying operands. The DBL_MIN_THRESHOLD check is used to avoid > - a spurious underflow from the multiplication. */ > - if (ax >= DBL_MIN_THRESHOLD && ay <= ax * SQRT_EPS_DIV_2) > - return (ay == 0.0) > - ? ax > - : handle_errno (math_narrow_eval (ax + DBL_TRUE_MIN)); > + double t1, t2; > +#ifdef __FP_FAST_FMA > + t1 = ay + ay; > + t2 = ax - ay; > > - double scale = SCALE; > - if (ax > SQRT_DBL_MAX) > - { > - ax *= scale; > - ay *= scale; > - scale = INV_SCALE; > - } > - else if (ay < SQRT_DBL_MIN) > - { > - ax /= scale; > - ay /= scale; > - } > + if (t1 >= ax) > + return sqrt (fma (t1, ax, t2 * t2)); > else > - scale = 1.0; > - > + return sqrt (fma (ax, ax, ay * ay)); > +#else > double h = sqrt (ax * ax + ay * ay); > > - double t1, t2; > - if (h == 0.0) > - return h; > - else if (h <= 2.0 * ay) > + if (h <= 2.0 * ay) > { > double delta = h - ay; > t1 = ax * (2.0 * delta - ax); > @@ -112,14 +79,57 @@ __hypot (double x, double y) > else > { > double delta = h - ax; > - t1 = 2.0 * delta * (ax - 2 * ay); > + t1 = 2.0 * delta * (ax - 2.0 * ay); > t2 = (4.0 * delta - ay) * ay + delta * delta; > } > h -= (t1 + t2) / (2.0 * h); > - h = math_narrow_eval (h * scale); > - math_check_force_underflow_nonneg (h); > - return handle_errno (h); > + return h; > +#endif > +} > + > + > +double > +__hypot (double x, double y) > +{ > + if (!isfinite (x) || !isfinite (y)) > + { > + if ((isinf (x) || isinf (y)) > + && !issignaling_inline (x) && !issignaling_inline (y)) > + return INFINITY; > + return x + y; > + } > + > + x = fabs (x); > + y = fabs (y); > + > + double ax = FAST_FMINMAX ? fmax (x, y) : (x < y ? y : x); > + double ay = FAST_FMINMAX ? fmin (x, y) : (x < y ? x : y); > + > + if (__glibc_unlikely (ax > LARGE_VAL)) > + { > + if (__glibc_unlikely (ay <= ax * EPS)) > + return handle_errno (ax + ay); > + > + return handle_errno (kernel (ax * SCALE, ay * SCALE) / SCALE); > + } > + > + if (__glibc_unlikely (ay < TINY_VAL)) > + { > + if (__glibc_unlikely (ax >= ay / EPS)) > + return math_narrow_eval (ax + ay); > + > + ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE); > + math_check_force_underflow_nonneg (ax); > + return ax; > + } > + > + /* Common case: ax is not huge and ay is not tiny. */ > + if (__glibc_unlikely (ay <= ax * EPS)) > + return math_narrow_eval (ax + ay); > + > + return math_narrow_eval (kernel (ax, ay)); > } > + > strong_alias (__hypot, __ieee754_hypot) > libm_alias_finite (__ieee754_hypot, __hypot) > #if LIBM_SVID_COMPAT