From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org,
Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
"H . J . Lu" <hjl.tools@gmail.com>
Subject: [PATCH v4 0/5] Improve fmod and fmodf
Date: Mon, 20 Mar 2023 13:01:13 -0300
Message-ID: <20230320160118.352206-1-adhemerval.zanella@linaro.org>
This is an updated version of a previous submission by Kirill Okhotnikov
that aimed to improve the fmod implementation [1]. I extended it with:
1. Proper benchmarks for both single and double precision. The inputs
are divided into 3 subsets: subnormals, normal numbers, and close
exponents. The benchmarks use a list of randomly generated values.
2. Use math_config.h definitions instead of math_private.h (so it might
eventually get back into optimized-routines).
3. Implement the same strategy for the float version.
4. Tune the final division to use multiplication by the inverse
instead of a direct modulo. This performed better on both the
x86_64 and aarch64 chips I tested.
5. Remove SVID error handling wrapper.
The patch shows a good performance improvement over the current
algorithm for fmod (using gcc 11):
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 19.1584  | 9.40992
x86_64 (Ryzen 9) | normal          | 1016.51  | 296.738
x86_64 (Ryzen 9) | close-exponents | 18.4428  | 13.119
aarch64 (N1)     | subnormal       | 11.153   | 4.33313
aarch64 (N1)     | normal          | 528.649  | 158.339
aarch64 (N1)     | close-exponents | 11.4517  | 5.76138
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chip, where it uses a lot of soft-fp implementations
(for modulo, clz, ctz, and multiplication):
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
armhf (N1)       | subnormal       | 15.7284  | 14.5746
armhf (N1)       | normal          | 837.525  | 241.738
armhf (N1)       | close-exponents | 16.2111  | 22.457
The fmodf shows a more moderate improvement:
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 17.2549  | 9.35776
x86_64 (Ryzen 9) | normal          | 85.4096  | 46.2761
x86_64 (Ryzen 9) | close-exponents | 19.1072  | 12.6199
aarch64 (N1)     | subnormal       | 10.2182  | 4.39188
aarch64 (N1)     | normal          | 60.0616  | 18.3888
aarch64 (N1)     | close-exponents | 11.5256  | 5.93518
armhf (N1)       | subnormal       | 11.6662  | 7.75977
armhf (N1)       | normal          | 69.2759  | 31.623
armhf (N1)       | close-exponents | 13.6472  | 15.6689
I also checked against H.J.'s proposal to use fprem on x86_64 [2] and
against a recent suggestion on libc-alpha [3], and in both cases
this newer implementation shows better performance.
Changes from v3:
* New tests cover more floating-point types.
Changes from v2:
* Bug fixes and test suite improvements.
Changes from v1:
* Remove SVID error handling wrapper.
* Extend testing for subnormals with different signs.
* Code cleanup.
Adhemerval Zanella (5):
benchtests: Add fmod benchmark
benchtests: Add fmodf benchmark
math: Improve fmod
math: Improve fmodf
math: Remove the error handling wrapper from fmod and fmodf
benchtests/Makefile | 2 +
benchtests/fmod-inputs | 2182 +++++++++++++++++
benchtests/fmodf-inputs | 2182 +++++++++++++++++
math/Versions | 4 +
math/libm-test-fmod.inc | 18 +
math/w_fmod_compat.c | 13 +-
math/w_fmodf_compat.c | 6 +-
sysdeps/i386/fpu/w_fmod_compat.c | 14 +
sysdeps/i386/fpu/w_fmodf_compat.c | 14 +
sysdeps/ieee754/dbl-64/e_fmod.c | 248 +-
sysdeps/ieee754/dbl-64/math_config.h | 70 +
sysdeps/ieee754/dbl-64/math_err.c | 6 +
sysdeps/ieee754/dbl-64/w_fmod.c | 1 +
sysdeps/ieee754/flt-32/e_fmodf.c | 244 +-
sysdeps/ieee754/flt-32/math_config.h | 48 +
sysdeps/ieee754/flt-32/math_errf.c | 6 +
sysdeps/ieee754/flt-32/w_fmodf.c | 1 +
sysdeps/m68k/m680x0/fpu/w_fmod_compat.c | 14 +
sysdeps/m68k/m680x0/fpu/w_fmodf_compat.c | 14 +
sysdeps/unix/sysv/linux/aarch64/libm.abilist | 2 +
sysdeps/unix/sysv/linux/alpha/libm.abilist | 2 +
sysdeps/unix/sysv/linux/arm/be/libm.abilist | 2 +
sysdeps/unix/sysv/linux/arm/le/libm.abilist | 2 +
sysdeps/unix/sysv/linux/hppa/libm.abilist | 2 +
.../sysv/linux/m68k/coldfire/libm.abilist | 2 +
.../sysv/linux/microblaze/be/libm.abilist | 2 +
.../sysv/linux/microblaze/le/libm.abilist | 2 +
.../unix/sysv/linux/mips/mips32/libm.abilist | 2 +
.../unix/sysv/linux/mips/mips64/libm.abilist | 2 +
sysdeps/unix/sysv/linux/nios2/libm.abilist | 2 +
.../linux/powerpc/powerpc32/fpu/libm.abilist | 2 +
.../powerpc/powerpc32/nofpu/libm.abilist | 2 +
.../linux/powerpc/powerpc64/be/libm.abilist | 2 +
.../linux/powerpc/powerpc64/le/libm.abilist | 2 +
.../unix/sysv/linux/s390/s390-32/libm.abilist | 2 +
.../unix/sysv/linux/s390/s390-64/libm.abilist | 2 +
sysdeps/unix/sysv/linux/sh/be/libm.abilist | 2 +
sysdeps/unix/sysv/linux/sh/le/libm.abilist | 2 +
.../sysv/linux/sparc/sparc32/libm.abilist | 2 +
.../sysv/linux/sparc/sparc64/libm.abilist | 2 +
.../unix/sysv/linux/x86_64/64/libm.abilist | 2 +
.../unix/sysv/linux/x86_64/x32/libm.abilist | 2 +
42 files changed, 4936 insertions(+), 197 deletions(-)
create mode 100644 benchtests/fmod-inputs
create mode 100644 benchtests/fmodf-inputs
create mode 100644 sysdeps/i386/fpu/w_fmod_compat.c
create mode 100644 sysdeps/i386/fpu/w_fmodf_compat.c
create mode 100644 sysdeps/ieee754/dbl-64/w_fmod.c
create mode 100644 sysdeps/ieee754/flt-32/w_fmodf.c
create mode 100644 sysdeps/m68k/m680x0/fpu/w_fmod_compat.c
create mode 100644 sysdeps/m68k/m680x0/fpu/w_fmodf_compat.c
--
2.34.1
Thread overview (14+ messages):
2023-03-20 16:01 Adhemerval Zanella [this message]
2023-03-20 16:01 ` [PATCH v4 1/5] benchtests: Add fmod benchmark Adhemerval Zanella
2023-04-03 13:13 ` Wilco Dijkstra
2023-03-20 16:01 ` [PATCH v4 2/5] benchtests: Add fmodf benchmark Adhemerval Zanella
2023-04-03 13:16 ` Wilco Dijkstra
2023-03-20 16:01 ` [PATCH v4 3/5] math: Improve fmod Adhemerval Zanella
2023-04-03 13:29 ` Wilco Dijkstra
2023-03-20 16:01 ` [PATCH v4 4/5] math: Improve fmodf Adhemerval Zanella
2023-04-03 13:33 ` Wilco Dijkstra
2023-03-20 16:01 ` [PATCH v4 5/5] math: Remove the error handling wrapper from fmod and fmodf Adhemerval Zanella
2023-04-03 13:43 ` Wilco Dijkstra
2023-04-03 18:33 ` Adhemerval Zanella Netto
-- strict thread matches above, loose matches on Subject: below --
2023-03-20 13:47 [PATCH v4 0/5] Improve " Adhemerval Zanella
2023-03-20 16:00 ` Adhemerval Zanella Netto