public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	'GNU C Library' <libc-alpha@sourceware.org>
Subject: Re: [PATCH] math: Improve fmod(f) performance
Date: Thu, 13 Apr 2023 17:56:03 -0300
Message-ID: <64198947-61d6-9f15-17de-a5c8c8f1e71b@linaro.org>
In-Reply-To: <PAWPR08MB89828DFAF4C1BD00EFA5B25283989@PAWPR08MB8982.eurprd08.prod.outlook.com>



On 13/04/23 17:45, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>> So at least with current 'close-exponents' from bench-fmod, which was
>> generated from exponents between -10 and 10, the gain is more modest
>> (and normal inputs do show a small regression).  This should be ok,
>> but I also think we need to outline that A72 gains might not show on
>> different hardware.
> 
> On a SkyLake I'm seeing this for fmod:
> 
>                   master   patch
> subnormals         51.34    45.92 (+11.8%)
> normal             436.9    420.5 (+3.9%)
> close-exponents    56.44    53.11 (+6.3%)
> 
> And on Zen2:
> 
>                   master   patch
> subnormals         10.83    10.39 (+4.2%)
> normal             336.1    335.8 (+0.1%)
> close-exponents    14.90    14.11 (+5.6%)
> 
> So it shows good improvements across the board. It's odd your results on AMD are
> worse than my Zen 2 results - are there large variations between runs? I did quite a
> few runs to get a fast result and increased iterations of the math benchmarks 10x.

I don't see much variation, but I think these numbers on multiple chips
are more than enough.  Could you include them in the commit message?
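
For anyone reproducing the tables, the percentages appear to follow
master / patch - 1 on timings where lower is better.  A minimal sketch
of that arithmetic follows; the function name gain_percent is
hypothetical and not part of the benchmark harness:

```c
/* Relative gain of the patched build over master, in percent.
   Assumes both arguments are timings where lower is better, which is
   how the tables in this thread read.  gain_percent is a hypothetical
   name used only for illustration.  */
static double
gain_percent (double master, double patch)
{
  return (master / patch - 1.0) * 100.0;
}
```

With the SkyLake subnormals row, gain_percent (51.34, 45.92) comes out
at roughly 11.8, in line with the quoted figure.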

> 
> I can't explain why the gains on AArch64 are so much larger - the reduced instruction
> counts and branches for the common cases seem to make a big difference. On x86
> there are still many MOVABS instructions which are problematic for decode.
> 
>> So maybe also add another bench-fmod set for |x/y| < 2^12 to show
>> the potential gains.
> 
> I'm not sure how that would improve things - ideally we need more realistic
> inputs (i.e. actual traces) but we could change the existing inputs into workloads
> to give it a more difficult problem. Changing close-exponents into a workload
> shows 11.0% lower latency and 11.9% better throughput on my SkyLake. On Zen 2
> I see 1% lower latency and 7.4% better throughput. Neoverse V1 shows 25.1%
> lower latency and 23.9% better throughput.

Fair enough, I think the only small nit is the clz_uint64 usage then.
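
For context, a count-leading-zeros helper of this kind is typically a
thin wrapper over a compiler builtin.  The sketch below is an
assumption about the general shape, not the code in the patch: the
names clz_uint64_sketch and normalize_shift are hypothetical, and the
normalization step is only one plausible use in an fmod
implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a clz_uint64-style helper.  Relies on the
   GCC/Clang builtin __builtin_clzll, whose result is undefined for 0,
   so callers must pass a nonzero value.  */
static inline int
clz_uint64_sketch (uint64_t x)
{
  return __builtin_clzll (x);
}

/* Assumed use case: for a nonzero subnormal double mantissa (below
   2^52), return the left shift that moves its highest set bit to
   bit 52, where a normal number's implicit bit would sit.  The 11
   accounts for the sign and exponent bits above the mantissa.  */
static inline int
normalize_shift (uint64_t mantissa)
{
  return clz_uint64_sketch (mantissa) - 11;
}
```

For the smallest subnormal mantissa, 1, this gives a shift of 52.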

