From: Wilco Dijkstra
To: Adhemerval Zanella
CC: 'GNU C Library'
Subject: RFC: Improve hypot performance
Date: Wed, 17 Nov 2021 15:58:53 +0000
List-Id: Libc-alpha mailing list

Hi Adhemerval,

Here is an early version of a much faster hypot implementation. It uses fma
to significantly outperform the powerpc version in both throughput and
latency. It has a worst-case error of ~0.949 ULP and passes the testsuite.
The powerpc version has a worst-case error of ~1.21 ULP and several test
failures.

It applies on top of your hypot patch series. I didn't optimize the non-fma
case since modern targets have fma. It'll be interesting to compare it on
Power. You'll need to set FAST_FMINMAX correctly to indicate support for
inlined fmin/fmax instructions (this will be added to math_private.h for
targets that have it). Without it, the code tries to use a conditional move,
since using a branch here is really bad for performance.

Cheers,
Wilco

---

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index d20bc3e3657e350a1103a8f8477db35ee60399e0..3906711788cff5d66725e5879bb4d6c36dd24dc7 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -37,73 +37,40 @@
 #include
 #include
 
+#define FAST_FMINMAX 1
+//#undef __FP_FAST_FMA
+
+#define SCALE 0x1p-600
+#define LARGE_VAL 0x1p+511
+#define TINY_VAL 0x1p-459
+#define EPS 0x1p-54
+
+
 static inline double
 handle_errno (double r)
 {
+  r = math_narrow_eval (r);
   if (isinf (r))
     __set_errno (ERANGE);
   return r;
 }
 
-/* sqrt (DBL_EPSILON / 2.0) */
-#define SQRT_EPS_DIV_2 0x1.6a09e667f3bcdp-27
-/* DBL_MIN / (sqrt (DBL_EPSILON / 2.0)) */
-#define DBL_MIN_THRESHOLD 0x1.6a09e667f3bcdp-996
-/* eps (double) * sqrt (DBL_MIN)) */
-#define SCALE 0x1p-563
-/* 1 / eps (sqrt (DBL_MIN) */
-#define INV_SCALE 0x1p+563
-/* sqrt (DBL_MAX) */
-#define SQRT_DBL_MAX 0x1.6a09e667f3bccp+511
-/* sqrt (DBL_MIN) */
-#define SQRT_DBL_MIN 0x1p-511
-
-double
-__hypot (double x, double y)
+static inline double
+kernel (double ax, double ay)
 {
-  if ((isinf (x) || isinf (y))
-      && !issignaling (x) && !issignaling (y))
-    return INFINITY;
-  if (isnan (x) || isnan (y))
-    return x + y;
-
-  double ax = fabs (x);
-  double ay = fabs (y);
-  if (ay > ax)
-    {
-      double tmp = ax;
-      ax = ay;
-      ay = tmp;
-    }
-
-  /* Widely varying operands. The DBL_MIN_THRESHOLD check is used to avoid
-     a spurious underflow from the multiplication. */
-  if (ax >= DBL_MIN_THRESHOLD && ay <= ax * SQRT_EPS_DIV_2)
-    return (ay == 0.0)
-           ? ax
-           : handle_errno (math_narrow_eval (ax + DBL_TRUE_MIN));
+  double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
 
-  double scale = SCALE;
-  if (ax > SQRT_DBL_MAX)
-    {
-      ax *= scale;
-      ay *= scale;
-      scale = INV_SCALE;
-    }
-  else if (ay < SQRT_DBL_MIN)
-    {
-      ax /= scale;
-      ay /= scale;
-    }
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
   else
-    scale = 1.0;
-
+    return sqrt (fma (ax, ax, ay * ay));
+#else
   double h = sqrt (ax * ax + ay * ay);
 
-  double t1, t2;
-  if (h == 0.0)
-    return h;
-  else if (h <= 2.0 * ay)
+  if (h <= 2.0 * ay)
     {
       double delta = h - ay;
       t1 = ax * (2.0 * delta - ax);
@@ -112,14 +79,57 @@ __hypot (double x, double y)
   else
     {
       double delta = h - ax;
-      t1 = 2.0 * delta * (ax - 2 * ay);
+      t1 = 2.0 * delta * (ax - 2.0 * ay);
       t2 = (4.0 * delta - ay) * ay + delta * delta;
     }
   h -= (t1 + t2) / (2.0 * h);
-  h = math_narrow_eval (h * scale);
-  math_check_force_underflow_nonneg (h);
-  return handle_errno (h);
+  return h;
+#endif
+}
+
+
+double
+__hypot (double x, double y)
+{
+  if (!isfinite (x) || !isfinite (y))
+    {
+      if ((isinf (x) || isinf (y))
+          && !issignaling_inline (x) && !issignaling_inline (y))
+        return INFINITY;
+      return x + y;
+    }
+
+  x = fabs (x);
+  y = fabs (y);
+
+  double ax = FAST_FMINMAX ? fmax (x, y) : (x < y ? y : x);
+  double ay = FAST_FMINMAX ? fmin (x, y) : (x < y ? x : y);
+
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+        return handle_errno (ax + ay);
+
+      return handle_errno (kernel (ax * SCALE, ay * SCALE) / SCALE);
+    }
+
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+        return math_narrow_eval (ax + ay);
+
+      ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE);
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny. */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return math_narrow_eval (ax + ay);
+
+  return math_narrow_eval (kernel (ax, ay));
 }
+
 strong_alias (__hypot, __ieee754_hypot)
 libm_alias_finite (__ieee754_hypot, __hypot)
 #if LIBM_SVID_COMPAT