public inbox for libc-alpha@sourceware.org
From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
To: Joseph Myers <joseph@codesourcery.com>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH] improves exp() and expf() performance on Sparc.
Date: Thu, 07 Sep 2017 20:42:00 -0000
Message-ID: <706fe477-8d85-47d9-d62c-164bba5606ec@oracle.com>
In-Reply-To: <alpine.DEB.2.20.1709062045570.29155@digraph.polyomino.org.uk>

On 9/6/2017 4:01 PM, Joseph Myers wrote:
> On Wed, 6 Sep 2017, Patrick McGehearty wrote:
>
>> The sysdeps/ieee754/dbl-64/w_exp_compat.c
>> declares __exp (double x)
>> and then adds:
>> hidden_def (__exp)
>> weak_alias (__exp, exp)
>>
>> I believe the weak_alias in w_exp_compat.c is overridden by the
>> sparc_libm_ifunc in e_exp-generic.c.  At least, I am not seeing any
>> link time errors about double exp declarations and I am seeing the new
>> code being executed (as proved by the speed and accuracy changes).
> Then you should avoid any object code from w_exp_compat.c being linked
> into libm.so at all, by overriding it with a dummy file, rather than just
> letting certain symbols be overridden at link time.
>
>> As for error handling, I believe the extra level of indirection on
>> return from exp provided by the sysdeps/ieee754/dbl-64/w_exp_compat.c
>> routine is an anti-performance design. Every normal return from e_exp
> It's fairly clearly a design optimized for consistency of error handling
> in the presence of several architecture-specific implementations of the
> main function, without needing to e.g. deal with TLS in assembly code for
> accessing errno or make multiple implementations handle matherr the same
> way.  When you avoid architecture-specific implementations (especially .S
> ones) as far as possible, integrated error handling is more practical,
> especially if you also use new symbol versions to avoid needing to deal
> with matherr.
>
> For expf performance obviously needs to be compared with Szabolcs's
> implementation (compiled with whatever options and configured
> appropriately regarding conversions to integer etc. to be optimal for
> SPARC).  For exp, I'm inclined to say performance should be compared with
> the existing exp *with the slow paths calling __slowexp removed along with
> the associated checks for whether to use those slow paths* since those
> slow paths are completely unnecessary.
>
The sysdeps/ieee754 subtree has a number of direct calls into
__ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
While I have not found direct calls to __exp in that subtree,
I see overriding w_exp_compat.c as carrying some risk of
unexpected behavior, with the only perceived benefit being the
elimination of a modest number of bytes from libm.

For exp, when I test isolated values, I find the improvement of the
new code over the ieee754 code on Sparc to be in the range of 8x to
14x. That does not include cases which trigger slowexp().

Comparing the "make bench" benchtests/bench.out for exp():
      ieee754    new
max:  17630     174
min:    399      26
mean:  5320      67

When the differences are this large and the new max is faster than the
old min, I don't see a need for further performance testing.
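
The ratios implied by the table above work out to roughly 101x for max,
15x for min, and 79x for mean. A small sketch that reproduces the
arithmetic follows. The struct and function names are made up for
illustration, and the units are whatever bench.out reports.

```c
#include <assert.h>

/* One row of the table: a statistic and its old and new timings.  */
struct bench_row
{
  const char *stat;
  double old_time;
  double new_time;
};

/* Figures copied from the table above.  */
static const struct bench_row exp_rows[] = {
  { "max", 17630.0, 174.0 },
  { "min", 399.0, 26.0 },
  { "mean", 5320.0, 67.0 },
};

/* How many times faster the new code is for one row.  */
static double
speedup (const struct bench_row *row)
{
  assert (row->new_time > 0.0);
  return row->old_time / row->new_time;
}
```

Calling speedup (&exp_rows[i]) for each row gives the three ratios.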

For expf, the comparison for individual values shows an improvement
of roughly 15x. benchtests does not measure expf().
Making this change will provide a clear, immediate gain in expf()
performance.

Is Szabolcs's code in its final form?  There was some discussion
of accuracy and of possible changes to the algorithm, perhaps using
a larger table. The Sparc code uses a larger table and thus may
be more accurate for some ulp-sensitive values. Or it may be a non-issue
since both algorithms use double precision for computation.

Wilco Dijkstra compared the new Sparc code to Szabolcs's code on
aarch64 and found Szabolcs's code to be 10% faster there.
That advantage may or may not be reversed on Sparc, but it is
close enough to justify testing.
In addition to a performance comparison, we'd want to do an
accuracy comparison to see what differences we might be accepting.
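
One way such an accuracy comparison could be set up is to measure each
expf() result against a double-precision reference and express the error
in float ulps. The helper below is a sketch only: the name ulp_error_f is
hypothetical, it assumes a finite, nonzero reference in the normal float
range, and it is not taken from either implementation.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Error of a float result against a double reference, in units of
   the float spacing (ulp) at the reference value.  Subnormal
   references are not handled.  */
static double
ulp_error_f (float got, double want)
{
  int e;

  assert (isfinite (want) && want != 0.0);

  /* want = m * 2^e with 0.5 <= |m| < 1, so floats near want are
     spaced 2^(e - FLT_MANT_DIG) apart.  */
  frexp (want, &e);
  double ulp = ldexp (1.0, e - FLT_MANT_DIG);

  return fabs ((double) got - want) / ulp;
}
```

Running this over the same inputs for both expf() implementations, with
exp() on the double-converted argument as the reference, would show where
the two differ and by how much.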

