From: Wilco Dijkstra
To: "libc-alpha@sourceware.org"
CC: nd
Subject: [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs
Date: Wed, 21 Mar 2018 17:50:00 -0000

This series of patches removes the slow paths from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).

Max ULP is ~0.55, with no errors found after testing 1.6 billion inputs across
most of the range with mpsin and mpcos. The number of incorrectly rounded
results (i.e. ULP > 0.5) is at most ~2750 per million inputs between 0.125 and
0.5; the average is ~850 per million between 0 and PI.

Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction. It also updates the ULP tables for sin,
cos and sincos on AArch64 and x86_64.

ChangeLog:
2018-03-20  Wilco Dijkstra

	* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
	inputs.
	(__cos): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.

--
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 1f469803be59bb4813370d95c6d091de901e6129..be06085154db24c8fd6cf1bce417028a959aaa27 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1012,7 +1012,9 @@ ildouble: 2
 ldouble: 2
 
 Function: "cos":
+double: 1
 float: 1
+idouble: 1
 ifloat: 1
 ildouble: 1
 ldouble: 1
@@ -1970,7 +1972,9 @@ ildouble: 2
 ldouble: 2
 
 Function: "sin":
+double: 1
 float: 1
+idouble: 1
 ifloat: 1
 ildouble: 1
 ldouble: 1
@@ -2000,7 +2004,9 @@ ildouble: 3
 ldouble: 3
 
 Function: "sincos":
+double: 1
 float: 1
+idouble: 1
 ifloat: 1
 ildouble: 1
 ldouble: 1
diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 8c589cbd4ab7451a5889e9a474bf4bd36c49d498..0c16b728df127ad54039da3eec376e5f1fe4c852 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -448,7 +448,7 @@ SECTION
 #endif
 __sin (double x)
 {
-  double xx, res, t, cor;
+  double xx, t, cor;
   mynumber u;
   int4 k, m;
   double retval = 0;
@@ -471,26 +471,22 @@ __sin (double x)
       xx = x * x;
       /* Taylor series. */
       t = POLYNOMIAL (xx) * (xx * x);
-      res = x + t;
-      cor = (x - res) + t;
-      retval = (res == res + 1.07 * cor) ? res : slow (x);
+      /* Max ULP of x + t is 0.535. */
+      retval = x + t;
     } /* else if (k < 0x3fd00000) */
 /*---------------------------- 0.25<|x|< 0.855469---------------------- */
   else if (k < 0x3feb6000)
     {
-      res = do_sin (x, 0, &cor);
-      retval = (res == res + 1.096 * cor) ? res : slow1 (x);
-      retval = __copysign (retval, x);
+      /* Max ULP is 0.548. */
+      retval = __copysign (do_sin (x, 0, &cor), x);
     } /* else if (k < 0x3feb6000) */
 
 /*----------------------- 0.855469 <|x|<2.426265 ----------------------*/
   else if (k < 0x400368fd)
     {
-
       t = hp0 - fabs (x);
-      res = do_cos (t, hp1, &cor);
-      retval = (res == res + 1.020 * cor) ? res : slow2 (x);
-      retval = __copysign (retval, x);
+      /* Max ULP is 0.51. */
+      retval = __copysign (do_cos (t, hp1, &cor), x);
     } /* else if (k < 0x400368fd) */
 
 #ifndef IN_SINCOS
@@ -541,7 +537,7 @@ SECTION
 #endif
 __cos (double x)
 {
-  double y, xx, res, cor, a, da;
+  double y, xx, cor, a, da;
   mynumber u;
   int4 k, m;
 
@@ -561,8 +557,8 @@ __cos (double x)
 
   else if (k < 0x3feb6000)
     { /* 2^-27 < |x| < 0.855469 */
-      res = do_cos (x, 0, &cor);
-      retval = (res == res + 1.020 * cor) ? res : cslow2 (x);
+      /* Max ULP is 0.51. */
+      retval = do_cos (x, 0, &cor);
     } /* else if (k < 0x3feb6000) */
 
   else if (k < 0x400368fd)
@@ -571,20 +567,12 @@ __cos (double x)
       a = y + hp1;
       da = (y - a) + hp1;
       xx = a * a;
+      /* Max ULP is 0.501 if xx < 0.01588 or 0.518 otherwise.
+         Range reduction uses 106 bits here which is sufficient. */
       if (xx < 0.01588)
-        {
-          res = TAYLOR_SIN (xx, a, da, cor);
-          cor = 1.02 * cor + __copysign (1.0e-31, cor);
-          retval = (res == res + cor) ? res : sloww (a, da, x, true);
-        }
+        retval = TAYLOR_SIN (xx, a, da, cor);
       else
-        {
-          res = do_sin (a, da, &cor);
-          cor = 1.035 * cor + __copysign (1.0e-31, cor);
-          retval = ((res == res + cor) ? __copysign (res, a)
-                    : sloww1 (a, da, x, true));
-        }
-
+        retval = __copysign (do_sin (a, da, &cor), a);
     } /* else if (k < 0x400368fd) */
 
 
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 48e53f7ef2cf814d71d5d0c9f2bb907f594aa7ef..bbb8a4d0754dbe6665682cd8a7f51f7319a14014 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1262,7 +1262,9 @@ ildouble: 1
 ldouble: 1
 
 Function: "cos":
+double: 1
 float128: 1
+idouble: 1
 ifloat128: 1
 ildouble: 1
 ldouble: 1
@@ -2528,7 +2530,9 @@ Function: "pow_vlen8_avx2":
 float: 3
 
 Function: "sin":
+double: 1
 float128: 1
+idouble: 1
 ifloat128: 1
 ildouble: 1
 ldouble: 1
@@ -2578,7 +2582,9 @@ Function: "sin_vlen8_avx2":
 float: 1
 
 Function: "sincos":
+double: 1
 float128: 1
+idouble: 1
 ifloat128: 1
 ildouble: 1
 ldouble: 1
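
For readers unfamiliar with the check being deleted, here is a rough standalone sketch of the small-input branch of __sin before and after the patch. It is an illustration only, under stated assumptions: taylor_poly is a hypothetical stand-in for glibc's POLYNOMIAL macro and uses plain Taylor coefficients rather than the tuned constants in s_sin.c, and the helper names old_fast_path_accepts, new_sin_small and count_old_fallbacks do not exist in glibc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's POLYNOMIAL macro: plain Taylor
   coefficients of (sin (x) - x) / x^3, NOT the constants in s_sin.c.  */
static double
taylor_poly (double xx)
{
  return -1.0 / 6.0
         + xx * (1.0 / 120.0
         + xx * (-1.0 / 5040.0
         + xx * (1.0 / 362880.0
         + xx * (-1.0 / 39916800.0))));
}

/* Shape of the removed code: *RES is the rounded sum x + t and COR is
   the rounding error of that sum.  The old test accepted *RES only if
   adding a scaled COR left it unchanged; otherwise it called slow ().
   Returns 1 if the fast result would have been accepted.  */
static int
old_fast_path_accepts (double x, double *res)
{
  double xx = x * x;
  double t = taylor_poly (xx) * (xx * x);
  *res = x + t;
  double cor = (x - *res) + t;
  return *res == *res + 1.07 * cor;
}

/* Shape of the new code: return the rounded sum directly.  */
static double
new_sin_small (double x)
{
  double xx = x * x;
  double t = taylor_poly (xx) * (xx * x);
  return x + t;
}

/* Count how many of N evenly spaced inputs in [LO, HI] the old test
   would have sent to the slow path.  */
static size_t
count_old_fallbacks (double lo, double hi, size_t n)
{
  assert (n > 1 && lo < hi);
  size_t fallbacks = 0;
  for (size_t i = 0; i < n; i++)
    {
      double x = lo + (hi - lo) * (double) i / (double) (n - 1);
      double res;
      if (!old_fast_path_accepts (x, &res))
        fallbacks++;
    }
  return fallbacks;
}
```

Calling count_old_fallbacks over a range below 0.25 gives a feel for how often the old test rejected the fast result, but since the coefficients here are not glibc's, the counts will not match the figures quoted in this mail. The cor computation assumes each operation is rounded separately, so it is probably best to build with floating-point contraction disabled (for example -ffp-contract=off with GCC).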