Subject: Re: [PATCH] improves exp() and expf() performance on Sparc.
To: libc-alpha@sourceware.org
References: <1504306749-46787-1-git-send-email-patrick.mcgehearty@oracle.com> <706fe477-8d85-47d9-d62c-164bba5606ec@oracle.com>
From: Patrick McGehearty
Message-ID: <9ec36391-8fca-bdfa-a7a9-4d715e62c568@oracle.com>
Date: Thu, 07 Sep 2017 23:53:00 -0000

On 9/7/2017 4:05 PM, Joseph Myers wrote:
> On Thu, 7 Sep 2017, Patrick McGehearty wrote:
>
>> The sysdeps/ieee_754 subtree has a number of direct calls into
>> ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
>> While I have not found direct calls to __exp in the ieee_754 subtree,
>> I see overriding w_exp_compat.c as having some risk of
>> unexpected behavior with the only perceived benefit to be eliminating
>> a modest number of bytes from libm.
> Those direct calls don't use the wrapper and so are completely irrelevant
> to the matter of overriding it.
>
> It is quite clear that the wrapper needs to be overridden on any
> architecture providing its own exp (as opposed to __ieee754_exp)
> implementation, just as ia64 overrides it.
>
>> For expf, the comparison for individual values shows an improvement
>> in the range of 15x. benchtests does not measure expf().
> Presumably you need to test with the benchmark addition Szabolcs points to
> in his patch submission.
>
>> Making this change will provide a clear, immediate gain in expf()
>> performance.
> Maintainability is also important, and it points against having lots of
> architecture-specific versions. Thus, people interested in expf
> optimization should first be helping with the review of Szabolcs's patch
> (and the benchtests addition patch it builds on). Once that's done, it
> can provide a basis for judging the merits of architecture-specific expf
> versions (which might well also indicate improvements to Szabolcs's code
> as an alternative to adding an architecture-specific version).
>
> For exp, when you have a better-performing C version the question should
> first be whether it can replace the existing generic C version (possibly
> then being built multiple times on architectures where that's useful)
> rather than whether to add it as architecture-specific code. Adding a C
> version as architecture-specific code (rather than having limited
> architecture-specific hooks in a generic version) should only be once
> there is evidence of different architectures' performance characteristics
> requiring substantially different approaches.
>

The sysdeps/ieee754 subtree has a number of direct calls into
__ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
While I have not found direct calls to __exp in the ieee754 subtree,
I see overriding w_exp_compat.c as carrying some risk of unexpected
behavior, with the only perceived benefit being the elimination of a
modest number of bytes from libm.
As for exp performance, when I test isolated values, I find the factor
of improvement between the ieee754 code and the new code on Sparc to be
in the range of 8x to 14x. That does not include cases which trigger
slowexp().

Comparing the "make bench" benchtests/bench.out for exp():

         ieee754    new
max:       17630    174
min:         399     26
mean:       5320     67

When the differences are this large and the new max is faster than the
old min, I don't see a need for further performance testing.

Moving on to expf, the comparison for individual values shows an
improvement in the range of 15x. benchtests does not measure expf().
Making this change will provide a clear, immediate gain in expf()
performance.

Szabolcs's code appears to provide similar benefits. There was some
discussion of accuracy and of possible changes to the algorithm,
perhaps by using a larger table. The Sparc code uses a larger table
and thus may be more accurate for some ulp-sensitive values. Or it may
be a non-issue, since both algorithms use double precision for
computation.

Wilco Dijkstra compared the new Sparc code to Szabolcs's code on aarch64
and found Szabolcs's code to be 10% faster on aarch64. That result is
close enough to justify testing on Sparc. In addition to a performance
comparison, we'd want to compare accuracy to see if there are notable
differences.