From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org,
Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
"H . J . Lu" <hjl.tools@gmail.com>
Subject: [PATCH 0/4] Improve fmod and fmodf
Date: Fri, 10 Mar 2023 14:58:56 -0300
Message-ID: <20230310175900.2388957-1-adhemerval.zanella@linaro.org>
This is an updated version of a previous submission by Kirill
Okhotnikov [1] that aimed to improve the fmod implementation. I
extended it with:
1. Proper benchmarks for both single and double precision. The inputs
are divided into 3 subsets: subnormals, normal numbers, and close
exponents. Each subset uses a list of randomly generated values.
2. Use the math_config.h definitions instead of math_private.h (so it
might eventually get back into optimized-routines).
3. Implement the same strategy for the float version.
4. Tune the final division to use multiplication by the inverse
instead of a direct modulo. It showed better performance on both the
x86_64 and aarch64 chips I have tested.
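As a rough illustration of item 4, the sketch below shows one well-known
way to replace an integer modulo with multiplications by a precomputed
inverse (the fixed-point reciprocal trick described by Lemire et al.).
It is not the code from this series: the helper names inverse_u32 and
mod_by_inverse_u32 are hypothetical, it handles only 32-bit operands,
and it assumes a compiler that provides unsigned __int128 (GCC or Clang
on a 64-bit target).

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit fixed-point reciprocal of D, i.e.
   ceil (2^64 / D).  D must be nonzero.  For D == 1 the value wraps
   to 0, which still yields the correct remainder of 0.  */
static inline uint64_t
inverse_u32 (uint32_t d)
{
  return UINT64_MAX / d + 1;
}

/* Hypothetical helper: compute A % D with two multiplications and no
   division, given INV == inverse_u32 (D).  */
static inline uint32_t
mod_by_inverse_u32 (uint32_t a, uint64_t inv, uint32_t d)
{
  /* Low 64 bits of INV * A: the fractional part of A / D scaled by
     2^64.  */
  uint64_t lowbits = inv * a;
  /* Scaling the fraction back by D and keeping the high 64 bits
     gives the remainder.  */
  return (uint32_t) (((unsigned __int128) lowbits * d) >> 64);
}
```

The reciprocal costs one division up front, so the approach pays off
when the same divisor is reused for several reductions.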
The performance shows a good improvement over the current fmod
algorithm (using gcc 11):
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.0932
x86_64 (Ryzen 9) | normal          | 1016.51  | 301.204
x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.8506
aarch64 (N1)     | subnormals      | 11.153   | 6.81778
aarch64 (N1)     | normal          | 528.649  | 158.339
aarch64 (N1)     | close-exponents | 11.4517  | 8.67894
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chip, where the code relies heavily on software
implementations (for modulo, clz, ctz, and multiplication):
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
armhf (N1)       | subnormals      | 15.7284  | 15.1083
armhf (N1)       | normal          | 837.525  | 244.833
armhf (N1)       | close-exponents | 16.2111  | 21.8182
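To make it clearer why integer modulo and shifts matter so much for
these numbers, here is a simplified sketch of the general integer-based
approach to fmodf: split both operands into an integer mantissa and an
exponent, then reduce the mantissa of x modulo the mantissa of y while
consuming the exponent difference in chunks. This is only an
illustration under my own assumptions, not the code from patches 3/4
or 4/4: the names fmodf_sketch, as_u32, and as_f32 are hypothetical, and
it uses plain shifts and % without any of the clz/ctz or inverse
tuning.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret float bits without aliasing issues.  */
static inline uint32_t as_u32 (float f) { uint32_t u; memcpy (&u, &f, sizeof u); return u; }
static inline float as_f32 (uint32_t u) { float f; memcpy (&f, &u, sizeof f); return f; }

float
fmodf_sketch (float x, float y)
{
  uint32_t hx = as_u32 (x);
  uint32_t hy = as_u32 (y) & 0x7fffffffu;   /* |y| */
  uint32_t sign = hx & 0x80000000u;         /* result takes the sign of x */
  hx &= 0x7fffffffu;                        /* |x| */

  /* y == 0, x infinite or NaN, or y NaN: the result is NaN.  */
  if (hy == 0 || hx >= 0x7f800000u || hy > 0x7f800000u)
    return (x * y) / (x * y);
  if (hx < hy)                              /* |x| < |y| (includes y infinite) */
    return x;
  if (hx == hy)                             /* |x| == |y|: signed zero */
    return as_f32 (sign);

  /* Write |x| = mx * 2^(ex - 150) and |y| = my * 2^(ey - 150).
     Subnormals have no implicit bit and use exponent 1.  */
  int ex = hx >> 23, ey = hy >> 23;
  uint32_t mx = hx & 0x007fffffu, my = hy & 0x007fffffu;
  if (ex != 0) mx |= 0x00800000u; else ex = 1;
  if (ey != 0) my |= 0x00800000u; else ey = 1;

  /* r = (mx * 2^(ex - ey)) mod my, at most 32 exponent bits per step.
     r < my < 2^24, so r << 32 always fits in 64 bits.  */
  uint64_t r = mx % my;
  for (int d = ex - ey; d > 0 && r != 0; )
    {
      int s = d < 32 ? d : 32;
      r = (r << s) % my;
      d -= s;
    }
  if (r == 0)
    return as_f32 (sign);

  /* Renormalize; if the exponent reaches 1 first, the result is subnormal.  */
  while (r < 0x00800000u && ey > 1)
    {
      r <<= 1;
      ey--;
    }
  /* Adding r folds its implicit bit (if set) into the exponent field.  */
  return as_f32 (sign | ((((uint32_t) ey - 1) << 23) + (uint32_t) r));
}
```

With a large exponent difference the loop runs several 64-bit modulo
steps, which is where a target without a hardware divide instruction
has to fall back to software routines.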
The fmodf shows a more moderate improvement:
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
aarch64 (N1)     | subnormals      | 10.2182  | 6.81778
aarch64 (N1)     | normal          | 60.0616  | 158.339
aarch64 (N1)     | close-exponents | 11.5256  | 8.67894
armhf (N1)       | subnormals      | 11.6662  | 10.8955
armhf (N1)       | normal          | 69.2759  | 35.4184
armhf (N1)       | close-exponents | 13.6472  | 17.8539
I also checked against H.J.'s proposal to use fprem on x86_64 [2] and
against a recent suggestion on libc-alpha [3], and in both cases
this newer implementation shows better performance.
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
[2] https://patchwork.sourceware.org/project/glibc/patch/20230309183312.205763-1-hjl.tools@gmail.com/
[3] https://sourceware.org/pipermail/libc-alpha/2023-March/146164.html
Adhemerval Zanella (4):
benchtests: Add fmod benchmark
benchtests: Add fmodf benchmark
math: Improve fmod
math: Improve fmodf
benchtests/Makefile | 2 +
benchtests/fmod-inputs | 2182 ++++++++++++++++++++++++++
benchtests/fmodf-inputs | 2182 ++++++++++++++++++++++++++
sysdeps/ieee754/dbl-64/e_fmod.c | 234 +--
sysdeps/ieee754/dbl-64/math_config.h | 110 ++
sysdeps/ieee754/flt-32/e_fmodf.c | 230 +--
sysdeps/ieee754/flt-32/math_config.h | 89 ++
7 files changed, 4840 insertions(+), 189 deletions(-)
create mode 100644 benchtests/fmod-inputs
create mode 100644 benchtests/fmodf-inputs
--
2.34.1
Thread overview (10+ messages):
2023-03-10 17:58 Adhemerval Zanella [this message]
2023-03-10 17:58 ` [PATCH 1/4] benchtests: Add fmod benchmark Adhemerval Zanella
2023-03-10 17:58 ` [PATCH 2/4] benchtests: Add fmodf benchmark Adhemerval Zanella
2023-03-10 17:58 ` [PATCH 3/4] math: Improve fmod Adhemerval Zanella
2023-03-10 17:59 ` [PATCH 4/4] math: Improve fmodf Adhemerval Zanella
2023-03-10 23:17 ` H.J. Lu
2023-03-13 15:19 ` Matt Turner
2023-03-13 16:38 ` Adhemerval Zanella Netto
2023-03-14 16:42 ` Wilco Dijkstra
2023-03-15 17:50 ` Adhemerval Zanella Netto