From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 112874 invoked by alias); 7 Sep 2017 20:42:40 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 111869 invoked by uid 89); 7 Sep 2017 20:42:40 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_SORBS_SPAM,RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=H*u:6.3, H*UA:6.3
X-HELO: aserp1040.oracle.com
Subject: Re: [PATCH] improves exp() and expf() performance on Sparc.
To: Joseph Myers
Cc: libc-alpha@sourceware.org
References: <1504306749-46787-1-git-send-email-patrick.mcgehearty@oracle.com>
From: Patrick McGehearty
Message-ID: <706fe477-8d85-47d9-d62c-164bba5606ec@oracle.com>
Date: Thu, 07 Sep 2017 20:42:00 -0000
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2017-09/txt/msg00311.txt.bz2

On 9/6/2017 4:01 PM, Joseph Myers wrote:
> On Wed, 6 Sep 2017, Patrick McGehearty wrote:
>
>> The sysdeps/ieee754/dbl-64/w_exp_compat.c
>> declares __exp (double x)
>> and then adds:
>> hidden_def (__exp)
>> weak_alias (__exp, exp)
>>
>> I believe the weak_alias in w_exp_compat.c is overriden by the
>> sparc_libm_ifunc in e_exp-generic.c.  At least, I am not seeing any
>> link time errors about double exp declarations and I am seeing the new
>> code being executed (as proved by the speed and accuracy changes).
> Then you should avoid any object code from w_exp_compat.c being linked
> into libm.so at all, by overriding it with a dummy file, rather than just
> letting certain symbols be overridden at link time.
>
>> As for error handling, I believe the extra level of indirection on
>> return from exp provided by the sysdeps/ieee754/dbl-64/w_exp_compat.c
>> routine is an anti-performance design. Every normal return from e_exp
> It's fairly clearly a design optimized for consistency of error handling
> in the presence of several architecture-specific implementations of the
> main function, without needing to e.g. deal with TLS in assembly code for
> accessing errno or make multiple implementations handle matherr the same
> way. When you avoid architecture-specific implementations (especially .S
> ones) as far as possible, integrated error handling is more practical,
> especially if you also use new symbol versions to avoid needing to deal
> with matherr.
>
> For expf performance obviously needs to be compared with Szabolcs's
> implementation (compiled with whatever options and configured
> appropriately regarding conversions to integer etc. to be optimal for
> SPARC). For exp, I'm inclined to say performance should be compared with
> the existing exp *with the slow paths calling __slowexp removed along with
> the associated checks for whether to use those slow paths* since those
> slow paths are completely unnecessary.
>

The sysdeps/ieee754 subtree has a number of direct calls into
__ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
While I have not found direct calls to __exp in that subtree, I see
overriding w_exp_compat.c as carrying some risk of unexpected behavior,
with the only perceived benefit being the elimination of a modest
number of bytes from libm.

For exp, when I test isolated values, I find the factor of improvement
between the ieee754 code and the new code on Sparc to be in the range
of 8x to 14x. That does not include cases which trigger __slowexp().

Comparing the "make bench" benchtests/bench.out for exp():

        ieee754    new
max:      17630    174
min:        399     26
mean:      5320     67

When the differences are this large and the new max is faster than the
old min, I don't see a need for further performance testing.

For expf, the comparison for individual values shows an improvement of
roughly 15x. benchtests does not measure expf(). Making this change
will provide a clear, immediate gain in expf() performance.

Is Szabolcs's code in its final form?  There was some discussion of
accuracy and of possible changes to the algorithm, perhaps using a
larger table. The Sparc code uses a larger table and thus may be more
accurate for some ulp-sensitive values. Or it may be a non-issue, since
both algorithms use double precision for computation.

Wilco Dijkstra compared the new Sparc code to Szabolcs's code on
aarch64 and found Szabolcs's code to be 10% faster there. That
advantage may or may not be reversed on Sparc, but it is close enough
to justify testing. In addition to a performance comparison, we'd want
to do an accuracy comparison to see what differences we might be
accepting.