From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 52646 invoked by alias); 18 Oct 2017 17:22:15 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 52635 invoked by uid 89); 18 Oct 2017 17:22:14 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.2 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LOTSOFHASH,RCVD_IN_DNSWL_NONE,SPF_PASS,URIBL_RED autolearn=ham version=3.3.2 spammy= X-HELO: relay1.mentorg.com Date: Wed, 18 Oct 2017 17:22:00 -0000 From: Joseph Myers To: Patrick McGehearty CC: Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86. In-Reply-To: <1508172962-97543-1-git-send-email-patrick.mcgehearty@oracle.com> Message-ID: References: <1508172962-97543-1-git-send-email-patrick.mcgehearty@oracle.com> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-SW-Source: 2017-10/txt/msg00854.txt.bz2 On Mon, 16 Oct 2017, Patrick McGehearty wrote: > diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c > index 6757a14..555a47f 100644 > --- a/sysdeps/ieee754/dbl-64/e_exp.c > +++ b/sysdeps/ieee754/dbl-64/e_exp.c > @@ -1,238 +1,452 @@ > +/* EXP function - Compute double precision exponential > + Copyright (C) 2017 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > /* > - * IBM Accurate Mathematical Library > - * written by International Business Machines Corp. > - * Copyright (C) 2001-2017 Free Software Foundation, Inc. The file still contains __exp1, used by pow. Thus, I do not think removing the existing copyright dates is appropriate unless as a preliminary patch you move __exp1 to another file (e_pow.c being the obvious one; watch out for the sysdeps/x86_64/fpu/multiarch/e_exp-*.c with their own macro definitions of __exp1 that would no longer be appropriate after such a move). Does this patch remove all calls to __slowexp? If so, I'd expect slowexp.c to be removed as part of the patch (including architecture-specific / multiarch variants, and Makefile references to special options etc. for building slowexp.c). > +extern double __ieee754_exp (double); > + > + > +static const double TBL[] = { > + 1.00000000000000000000e+00, 0.00000000000000000000e+00, > + 1.02189714865411662714e+00, 5.10922502897344389359e-17, > + 1.04427378242741375480e+00, 8.55188970553796365958e-17, As per previous comments, this sort of table of precomputed values should use hex float constants. You can use before-and-after comparison of object files to make sure the hex floats do represent the same values as the decimal constants. > +static const double > + half =0.5, > +/* Following three values used to scale x to primary range */ > + invln2_32 =4.61662413084468283841e+01, /* 0x40471547, 0x652b82fe */ > + ln2_32hi =2.16608493865351192653e-02, /* 0x3f962e42, 0xfee00000 */ > + ln2_32lo =5.96317165397058656257e-12, /* 0x3d9a39ef, 0x35793c76 */ > +/* t2-t5 terms used for polynomial computation */ > + t2 =1.6666666666526086527e-1, /* 3fc5555555548f7c */ > + t3 =4.1666666666226079285e-2, /* 3fa5555555545d4e */ > + t4 =8.3333679843421958056e-3, /* 3f811115b7aa905e */ > + t5 =1.3888949086377719040e-3, /* 3f56c1728d739765 */ > + one =1.0, > +/* maximum value for x to not overflow */ > + threshold1 =7.09782712893383973096e+02, /* 0x40862E42, 0xFEFA39EF */ > +/* maximum value for -x to not underflow */ > + threshold2 =7.45133219101941108420e+02, /* 0x40874910, 0xD52D3051 */ > +/* scaling factor used when result near zero*/ > + twom54 =5.55111512312578270212e-17, /* 0x3c900000, 0x00000000 */ > +/* value used to force inexact condition */ > + small =1.0e-100; Likewise hex floats for these values (other than half / one / small). Also, the formatting is far from GNU standard; I'd expect e.g.: /* Scaling factor used when result near zero. */ static const double twom54 = 0x1p-54; (single space either side of =, comment before each variable definition starting with a capital letter and ending with ". "). > + /* > + Use FE_TONEAREST rounding mode for computing yy.y > + Avoid set/reset of rounding mode if already in FE_TONEAREST mode > + */ > + fe_val = __fegetround(); > + if (fe_val == FE_TONEAREST) { > + t = xx.x * xx.x; > + yy.y = xx.x + (t * (half + xx.x * t2) + > + (t * t) * (t3 + xx.x * t4 + t * t5)); > + retval = one + yy.y; > + } else { > + __fesetround(FE_TONEAREST); > + t = xx.x * xx.x; > + yy.y = xx.x + (t * (half + xx.x * t2) + > + (t * t) * (t3 + xx.x * t4 + t * t5)); > + retval = one + yy.y; > + __fesetround(fe_val); The formatting here is off, missing space before '(' in function calls. And you should be using the optimized SET_RESTORE_ROUND (FE_TONEAREST), which in addition to avoiding setting the rounding mode to a value it already has, also e.g. arranges on x86_64 to set only the SSE rounding mode and not the x87 mode because only the SSE mode needs setting for types other than long double. (I'm not clear why you actually need to set the rounding mode here. Does excessive inaccuracy result in this particular case if you don't set it?) > + fe_val = __fegetround(); Likewise again, and subsequently. > + if (fe_val == FE_TONEAREST) { > + z = xx.x - TBL2[j]; > + t = z * z; > + /* the "small" term below guarantees inexact will be raised */ Guaranteeing "inexact" is not part of the goals for most libm functions, so I expect you can remove that term. > + if (-xx.x > threshold2) > + { /* set underflow error condition */ > + double force_underflow = tiny * tiny; > + math_force_eval (force_underflow); > + retval = zero; > + return retval; I'd expect the force_underflow value to be returned in this case (so the return value is appropriate in round-upward mode). -- Joseph S. Myers joseph@codesourcery.com