From: "H.J. Lu" <hjl.tools@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>,
kirill <kirill.okhotnikov@gmail.com>
Subject: Re: [PATCH v2 3/5] math: Improve fmod
Date: Fri, 17 Mar 2023 09:07:45 -0700
Message-ID: <CAMe9rOpVOAdrOMV_wzn=2bRwaN9qK9_=kbdqU6SVLrCb5EFtFw@mail.gmail.com>
In-Reply-To: <PAWPR08MB898200ADEB2952D04378C4D883BD9@PAWPR08MB8982.eurprd08.prod.outlook.com>
On Fri, Mar 17, 2023 at 7:55 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi Adhemerval,
>
> >> It's these cases where x87 is still faster than the generic version:
> >>
> >>> E5-2640 | close-exponents | 39.298 | 22.2742
> >>>
> >>> i7-4510U | close-exponents | 29.463 | 22.8572
> >>
> >> Are these mostly x < y or cases where the exponent difference is just over 11 and
> >> thus we do not use the fast path?
> >
> > In fact the fast path will be used on ~83% of the cases (849 from 1024 entries).
> > Profiling shows that the initial checks might be the culprit, since generic
> > compat wrapper uses compiler builtins that might map to fp instructions. But even
> > trying to mimic did not improve much. It seems that for some CPU the integer
> > operations to create the final floating number is what is costly.
>
> If it is mostly the fast path we could further tune it and reduce instruction counts.
> It takes 6 if statements to enter this fast path, we could reduce that to 3. There are
> several large constants which could be simplified (older x86 cores might have
> issues with multiple 10-byte MOVABS in the instruction stream).
>
> Also I think your results for generic above use the wrapper, so we'd still
> get the >20% speedup which should make things closer.
>
The current __ieee754_fmod doesn't set errno, and neither does the x87
__ieee754_fmod.  A wrapper will avoid setting errno.
--
H.J.
Thread overview: 18+ messages
2023-03-15 20:59 [PATCH v2 0/5] Improve fmod and fmodf Adhemerval Zanella
2023-03-15 20:59 ` [PATCH v2 1/5] benchtests: Add fmod benchmark Adhemerval Zanella
2023-03-15 20:59 ` [PATCH v2 2/5] benchtests: Add fmodf benchmark Adhemerval Zanella
2023-03-15 20:59 ` [PATCH v2 3/5] math: Improve fmod Adhemerval Zanella
2023-03-16 0:58 ` H.J. Lu
2023-03-16 14:28 ` Adhemerval Zanella Netto
2023-03-16 16:13 ` Wilco Dijkstra
2023-03-16 20:39 ` Adhemerval Zanella Netto
2023-03-17 14:55 ` Wilco Dijkstra
2023-03-17 16:07 ` H.J. Lu [this message]
2023-03-17 18:22 ` Wilco Dijkstra
2023-03-15 20:59 ` [PATCH v2 4/5] math: Improve fmodf Adhemerval Zanella
2023-03-16 18:11 ` Wilco Dijkstra
2023-03-16 18:38 ` Adhemerval Zanella Netto
2023-03-16 19:15 ` Wilco Dijkstra
2023-03-16 19:45 ` Adhemerval Zanella Netto
2023-03-15 20:59 ` [PATCH v2 5/5] math: Remove the error handling wrapper from fmod and fmodf Adhemerval Zanella
2023-03-16 17:21 ` Wilco Dijkstra