Date: Mon, 17 Jan 2022 08:46:38 +0100
From: Paul Zimmermann
To: libc-alpha@sourceware.org
Subject: review of libmvec's accuracy

Hi,

during the last week I did a review of libmvec's accuracy after the recent
changes, both for binary32 and binary64, and with sse4.2, avx2 and avx512.

Thanks to H.J.
and Sunil Pandey for helping me to build libmvec and compile my program with
libmvec.

Apart from an issue with the binary64 atan2 function with sse4.2 which was
rapidly fixed (BZ #28765), I found no error of more than 4 ulps.

For univariate binary32 functions, an exhaustive test was performed.

For binary64 functions, here are the largest errors I found (with
corresponding inputs):

SSE 4.2:
acos 0 -1 0x1.35a0de2b038fep-1 [2] [2.31] 2.30611 2.306107962898911
acosh 0 -1 0x1.001ffe4f6d239p+0 [3] [3.32] 3.31361 3.313607079224876
asin 0 -1 0x1.000c80481e7f9p-1 [3] [3.49] 3.48486 3.484857846054733
asinh 0 -1 -0x1.fff13ade9918p-12 [4] [3.59] 3.58616 3.586157827853675
atan 0 -1 0x1.000a5e848c0dp-3 [3] [2.65] 2.6434 2.643390142606502
atanh 0 -1 -0x1.fff0caf4c48dp-13 [4] [3.59] 3.58751 3.587505773899075
cbrt 0 -1 -0x0.bd54cbc41f0b9p-1022 [3] [3.43] 3.42039 3.4203899319518
cos 0 -1 0x1.852e715836081p+18 [4] [3.85] 3.84216 3.84215369681995
cosh 0 -1 -0x1.633c654fee2bap+9 [2] [1.93] 1.92222 1.922214006544865
erf 0 -1 0x1.0000b7af4dcdp-8 [3] [2.55] 2.54483 2.544825896765913
erfc 0 -1 0x1.78afff6f3044cp+4 [2] [2.22] 2.21867 2.218661454888216
exp 0 -1 -0x1.205968aae119fp-8 [3] [3.21] 3.20379 3.20378601523757
exp10 0 -1 0x1.33f4082f47b74p+8 [2] [2.01] 2.00015 2.000141917186939
exp2 0 -1 -0x1.4c31866f6d3bbp-6 [2] [1.65] 1.64293 1.642923112885358
expm1 0 -1 0x1.86f57e8de4a5p-9 [3] [2.97] 2.96513 2.96512822155441
log 0 -1 0x1.00e000c7fa1c3p+0 [2] [1.59] 1.58569 1.585688691408811
log10 0 -1 0x1.00201204555c8p+0 [2] [2.10] 2.09444 2.094430783262944
log1p 0 -1 0x1.000bcdec306p-11 [3] [2.59] 2.58927 2.589263240103137
log2 0 -1 0x1.002000d8e91c5p+0 [2] [2.09] 2.08932 2.089318403685938
sin 0 -1 0x1.a5a68e24971a3p+20 [4] [3.84] 3.83437 3.834368052489666
sinh 0 -1 -0x1.c5c9440e9422dp-9 [2] [2.40] 2.39492 2.394913144733352
tan 0 -1 0x1.72e90b4651593p+15 [4] [3.97] 3.96274 3.962739804605088
tanh 0 -1 -0x1.000a02c5a8b47p-2 [2] [2.18] 2.17016 2.170154720479918
atan2 0 -1 0x1.abe93a1719613p-948,0x1.aab7bbb5ca811p-948 [2.48] 2.47316 2.473151123402774
hypot 0 -1 0x1.205e5d5fee071p-9,0x1.a71193d4eb838p-9 [2.67] 2.6658 2.665792374792187
pow 0 -1 0x1.e174ee53813b7p+859,-0x1.d929d0607bf52p-12 [1.01] 1.00093 1.000926794407638

AVX2:
acos 0 -1 0x1.ffc00159839aep-1 [2] [2.06] 2.05784 2.057839168511332
acosh 0 -1 0x1.007ff5e6aae25p+0 [3] [3.29] 3.28513 3.285121178034015
asin 0 -1 0x1.000fb59dbb7ffp-1 [3] [2.96] 2.95635 2.956349972684072
asinh 0 -1 0x1.fffdfee9d0656p-12 [4] [3.59] 3.58591 3.585908502909785
atan 0 -1 0x1.0029e0e2db7dp-3 [3] [2.65] 2.64176 2.641757760180734
atanh 0 -1 -0x1.ffe2abaa5690dp-13 [4] [3.59] 3.58538 3.585375859489622
cbrt 0 -1 0x0.bdf2e4b035cc5p-1022 [3] [3.41] 3.40348 3.403477053110753
cos 0 -1 -0x1.f5ec1ef4d1fb8p+3 [4] [3.67] 3.66518 3.665174088332274
cosh 0 -1 -0x1.633c654fee2bap+9 [2] [1.93] 1.92222 1.922214006544865
erf 0 -1 0x1.00005abf94234p-8 [3] [2.55] 2.54487 2.544864849771448
erfc 0 -1 0x1.78affead86a26p+4 [2] [2.21] 2.20423 2.204221099573643
exp 0 -1 -0x1.2059763f8882bp-8 [3] [3.21] 3.20362 3.203617681564174
exp10 0 -1 0x1.33f4082f47b74p+8 [2] [2.01] 2.00015 2.000141917186939
exp2 0 -1 -0x1.4c3c931a5de98p-6 [2] [1.65] 1.64097 1.640964569447761
expm1 0 -1 0x1.856b41d86994cp-9 [3] [2.97] 2.96463 2.964620318450374
log 0 -1 0x1.002001ffaa4ap+0 [2] [1.59] 1.58883 1.588820663143568
log10 0 -1 0x1.001fffbd3f495p+0 [2] [2.10] 2.0948 2.094790260038313
log1p 0 -1 0x1.fff86f9b9acp-12 [3] [2.59] 2.58923 2.589227792166073
log2 0 -1 0x1.002003e5a80e3p+0 [2] [2.09] 2.08921 2.089203751745159
sin 0 -1 0x1.9977bea4253f1p+0 [3] [3.49] 3.48842 3.488412468469455
sinh 0 -1 -0x1.633c654fee2bap+9 [2] [1.93] 1.92222 1.922214006544865
tan 0 -1 0x1.3fab696843fbfp+8 [4] [3.54] 3.53263 3.532620877461187
tanh 0 -1 -0x1.00010c3967f16p-2 [2] [2.14] 2.13884 2.138831342496609
atan2 0 -1 0x1.a83f842ef3f73p-633,0x1.a799d8a6677ep-633 [3.47] 3.46942 3.469416124628504
hypot 0 -1 0x1.6d080c1f5339ep+25,0x1.149ee0ad66632p+13 [1.39] 1.38804 1.388031156080099
pow 0 -1 0x1.bb393b102aa6p+246,-0x1.9c23caed44f1fp-10 [1.00] 0.99995 0.9999496063741427

AVX512:
acos 0 -1 0x1.35b9bac9f42c6p-1 [2] [1.83] 1.82629 1.826281376899049
acosh 0 -1 0x1.0007ffe4f42f8p+0 [2] [1.99] 1.98412 1.984110568276137
asin 0 -1 -0x1.0312655c1d169p-1 [3] [2.70] 2.69006 2.690052531147541
asinh 0 -1 -0x1.fff14d29165f4p-8 [2] [1.53] 1.52036 1.520353639540609
atan 0 -1 -0x1.0010aea41501p-3 [3] [2.65] 2.64024 2.640233443317396
atanh 0 -1 0x1.85cb7cc1e1318p-6 [2] [1.52] 1.51047 1.510460405768978
cbrt 0 -1 0x1.477fc84889eabp-511 [2] [1.84] 1.83354 1.833539371973596
cos 0 -1 -0x1.9a4f79002782p-6 [4] [3.66] 3.65239 3.652382154396661
cosh 0 -1 -0x1.2b3088f4a6e98p+4 [2] [2.03] 2.02425 2.024245773256978
erf 0 -1 0x1.00001d2920fb7p-8 [3] [2.55] 2.54487 2.54486123201942
erfc 0 -1 0x1.78afff9d452cp+4 [2] [2.21] 2.20537 2.205360962115159
exp 0 -1 -0x1.205968a73d4abp-8 [3] [3.21] 3.20361 3.203606347080522
exp10 0 -1 0x1.33f4082f47b74p+8 [2] [2.01] 2.00015 2.000141917186939
exp2 0 -1 -0x1.8000e569a5545p-3 [1] [1.06] 1.05024 1.050231467186485
expm1 0 -1 0x1.c3b7c858f0575p-6 [2] [2.12] 2.11697 2.1169642246763
log 0 -1 0x1.001f01ac83b3p+0 [2] [1.60] 1.59111 1.59110562523555
log10 0 -1 0x1.f03ebdaea826bp-1 [2] [1.96] 1.95902 1.959012325388189
log1p 0 -1 0x1.075745181aabp-6 [2] [1.95] 1.94684 1.94683948265057
log2 0 -1 0x1.ede4ac763282bp-1 [2] [1.87] 1.86313 1.863121719090275
sin 0 -1 -0x1.99631ed67b43fp+0 [3] [3.49] 3.48873 3.488727287224188
sinh 0 -1 -0x1.633c654fee2bap+9 [2] [1.93] 1.92222 1.922214006544865
tan 0 -1 -0x1.780c9aeca907cp+17 [4] [3.99] 3.98992 3.989911851716534
tanh 0 -1 -0x1.001bf41f56582p-1 [1] [1.20] 1.19944 1.199437551242041
atan2 0 -1 0x1.499c920038ab4p+559,0x1.4939bd8e01601p+559 [3.42] 3.41468 3.414672714924121
hypot 0 -1 -0x1.72b48b14296a7p-510,-0x1.3dcd53d99b107p-518 [1.51] 1.50331 1.503309592574038
pow 0 -1 0x1.65f5c9d0c7bc9p-828,0x1.eba10d43b8f54p-12 [1.00] 0.999925 0.9999244722109198

As a side note, I found no mention of the
libmvec accuracy in the reference manual (libc.pdf).

Best regards,
Paul
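
Note for readers who want to see how an error in ulps can be measured: the following is a minimal sketch, not the program used for this review. It rests on several assumptions. It needs Python 3.9 or later (for `math.ulp`), it uses the `decimal` module at 80 digits as the reference value, and it evaluates the scalar `math.exp` of the local C library rather than libmvec, so it is not expected to reproduce the figures in the tables. The helper name `ulp_error` is hypothetical.

```python
import math
from decimal import Decimal, getcontext

# 80 significant digits: far more than binary64 carries, so the
# reference can be treated as exact for this purpose (an assumption).
getcontext().prec = 80


def ulp_error(computed: float, reference: Decimal) -> float:
    """Distance between computed and reference, in ulps of the reference."""
    # Size of one ulp at the binary64 value nearest to the reference.
    one_ulp = Decimal(math.ulp(float(reference)))
    # Decimal(float) converts exactly, so no rounding error enters here.
    return float(abs(Decimal(computed) - reference) / one_ulp)


# Input taken from the "exp" row of the SSE 4.2 table.
x = float.fromhex("-0x1.205968aae119fp-8")
reference = Decimal(x).exp()            # high-precision exp(x)
err = ulp_error(math.exp(x), reference)  # scalar libm, not libmvec
print(f"exp({x.hex()}): {err:.3f} ulp")
```

To measure libmvec itself, `math.exp(x)` would have to be replaced by a value obtained from the vector routine, for example from a small C program compiled so that the loop is vectorized.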