From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
To: libc-alpha@sourceware.org
Subject: Re: [PATCH] improves exp() and expf() performance on Sparc.
Date: Thu, 07 Sep 2017 23:53:00 -0000 [thread overview]
Message-ID: <9ec36391-8fca-bdfa-a7a9-4d715e62c568@oracle.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1709072053000.32296@digraph.polyomino.org.uk>
On 9/7/2017 4:05 PM, Joseph Myers wrote:
> On Thu, 7 Sep 2017, Patrick McGehearty wrote:
>
>> The sysdeps/ieee_754 subtree has a number of direct calls into
>> ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
>> While I have not found direct calls to __exp in the ieee_754 subtree,
>> I see overriding w_exp_compat.c as having some risk of
>> unexpected behavior with the only perceived benefit to be eliminating
>> a modest number of bytes from libm.
> Those direct calls don't use the wrapper and so are completely irrelevant
> to the matter of overriding it.
>
> It is quite clear that the wrapper needs to be overridden on any
> architecture providing its own exp (as opposed to __ieee754_exp)
> implementation, just as ia64 overrides it.
>
>> For expf, the comparison for individual values shows an improvement
>> in the range of 15x. benchtests does not measure expf().
> Presumably you need to test with the benchmark addition Szabolcs points to
> in his patch submission.
>
>> Making this change will provide a clear, immediate gain in expf()
>> performance.
> Maintainability is also important, and it points against having lots of
> architecture-specific versions. Thus, people interested in expf
> optimization should first be helping with the review of Szabolcs's patch
> (and the benchtests addition patch it builds on). Once that's done, it
> can provide a basis for judging the merits of architecture-specific expf
> versions (which might well also indicate improvements to Szabolcs's code
> as an alternative to adding an architecture-specific version).
>
> For exp, when you have a better-performing C version the question should
> first be whether it can replace the existing generic C version (possibly
> then being built multiple times on architectures where that's useful)
> rather than whether to add it as architecture-specific code. Adding a C
> version as architecture-specific code (rather than having limited
> architecture-specific hooks in a generic version) should only be once
> there is evidence of different architectures' performance characteristics
> requiring substantially different approaches.
>
The sysdeps/ieee_754 subtree has a number of direct calls into
ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
While I have not found direct calls to __exp in the ieee_754 subtree,
I see overriding w_exp_compat.c as having some risk of
unexpected behavior with the only perceived benefit to be eliminating
a modest number of bytes from libm.
As for exp performance, when I test isolated values, the factor of
improvement between ieee754 and the new code on Sparc to be in the
range of 8x to 14x. That's not considering cases which trigger
slowexp().
Comparing the "make bench" benchtests/bench.out for exp():
    ieee754   new
max:Â 17630Â Â Â Â 174
min:Â Â Â 399Â Â Â Â Â 26
mean:Â 5320Â Â Â Â Â 67
When the differences are this large and the new max is faster than the
old min, I don't see a need in doing further performance testing.
Moving on to expf, the comparison for individual values shows an
improvement in the range of 15x. benchtests does not measure expf().
Making this change will provide a clear, immediate gain in expf()
performance.
The Szabolcs code appears to provide similar benefits. There were
some discussion of accuracy and of possible changes to the algorithm,
perhaps by using a larger table. The Sparc code uses a larger table and
thus may be more accurate for some ulp sensitive values. Or it may be
a non-issue since both algorithms are using double precision for
computation.
Wilco Dijkstra compared the new Sparc code to Szabolcs code on aarch64
and found Szabolcs code to be 10% faster on aarch64. That result is
close enough to justify testing on Sparc. In addition to a performance
comparison, we'd want to compare accuracy to see if there are notable
differences.
next prev parent reply other threads:[~2017-09-07 23:53 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-01 22:59 Patrick McGehearty
2017-09-01 23:14 ` Joseph Myers
2017-09-06 20:34 ` Patrick McGehearty
2017-09-06 21:01 ` Joseph Myers
2017-09-07 20:42 ` Patrick McGehearty
2017-09-07 21:05 ` Joseph Myers
2017-09-07 23:53 ` Patrick McGehearty [this message]
2017-09-04 11:43 ` Szabolcs Nagy
2017-09-06 20:31 ` Patrick McGehearty
2017-09-11 18:50 Wilco Dijkstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9ec36391-8fca-bdfa-a7a9-4d715e62c568@oracle.com \
--to=patrick.mcgehearty@oracle.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).