public inbox for libc-alpha@sourceware.org
* [PATCH 0/4] Improve fmod and fmodf
@ 2023-03-10 17:58 Adhemerval Zanella
From: Adhemerval Zanella @ 2023-03-10 17:58 UTC
  To: libc-alpha, Wilco Dijkstra, H . J . Lu

This is an updated version of a previous submission by Kirill
Okhotnikov aimed at improving the fmod implementation [1].  I extended
it with:

  1. Proper benchmarks for both single and double precision.  The
     inputs are divided into 3 subsets: subnormals, normal numbers, and
     close exponents.  It uses a list of randomly generated values.

  2. Use math_config.h definitions instead of math_private (so it
     might eventually get back into optimized-routines).

  3. Implement the same strategy for the float version.

  4. Also tuned the final division to use multiplication with inverse
     instead of direct modulo.  It showed better performance on both
     x86_64 and aarch64 chips I have tested.

The performance shows a good improvement compared to the current
algorithm for fmod (using gcc 11):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormal       | 19.1584  | 12.0932
  x86_64 (Ryzen 9) | normal          | 1016.51  | 301.204
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.8506
  aarch64 (N1)     | subnormal       | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 158.339
  aarch64 (N1)     | close-exponents | 11.4517  | 8.67894
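
To read the table as relative gains, one can divide the master column
by the patch column.  The helper below is a minimal sketch that assumes
lower numbers are better in both columns; speedup is a hypothetical
name and is not part of benchtests.

```c
/* Ratio of the "master" figure to the "patch" figure.  Assuming lower
   is better, a result above 1 means the patch is faster and a result
   below 1 means it is slower.  */
static double
speedup (double master, double patch)
{
  return master / patch;
}
```

For example, speedup (1016.51, 301.204) for the x86_64 normal inputs
gives a ratio of roughly 3.4, while speedup (18.4428, 16.8506) for the
close-exponents inputs gives roughly 1.1.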

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it relies on many soft-fp implementations
(for modulo, clz, ctz, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 15.7284  | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182


fmodf shows a more moderate improvement:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormal       | 17.2549  | 12.3214
  x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
  x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
  aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
  aarch64 (N1)     | normal          | 60.0616  | 158.339
  aarch64 (N1)     | close-exponents | 11.5256  | 8.67894
  armhf (N1)       | subnormal       | 11.6662  | 10.8955
  armhf (N1)       | normal          | 69.2759  | 35.4184
  armhf (N1)       | close-exponents | 13.6472  | 17.8539


I also checked against H.J.'s proposal to use fprem on x86_64 [2] and
against a recent suggestion on libc-alpha [3], and in both cases
this newer implementation shows better performance.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
[2] https://patchwork.sourceware.org/project/glibc/patch/20230309183312.205763-1-hjl.tools@gmail.com/
[3] https://sourceware.org/pipermail/libc-alpha/2023-March/146164.html

Adhemerval Zanella (4):
  benchtests: Add fmod benchmark
  benchtests: Add fmodf benchmark
  math: Improve fmod
  math: Improve fmodf

 benchtests/Makefile                  |    2 +
 benchtests/fmod-inputs               | 2182 ++++++++++++++++++++++++++
 benchtests/fmodf-inputs              | 2182 ++++++++++++++++++++++++++
 sysdeps/ieee754/dbl-64/e_fmod.c      |  234 +--
 sysdeps/ieee754/dbl-64/math_config.h |  110 ++
 sysdeps/ieee754/flt-32/e_fmodf.c     |  230 +--
 sysdeps/ieee754/flt-32/math_config.h |   89 ++
 7 files changed, 4840 insertions(+), 189 deletions(-)
 create mode 100644 benchtests/fmod-inputs
 create mode 100644 benchtests/fmodf-inputs

-- 
2.34.1



Thread overview: 10+ messages
2023-03-10 17:58 [PATCH 0/4] Improve fmod and fmodf Adhemerval Zanella
2023-03-10 17:58 ` [PATCH 1/4] benchtests: Add fmod benchmark Adhemerval Zanella
2023-03-10 17:58 ` [PATCH 2/4] benchtests: Add fmodf benchmark Adhemerval Zanella
2023-03-10 17:58 ` [PATCH 3/4] math: Improve fmod Adhemerval Zanella
2023-03-10 17:59 ` [PATCH 4/4] math: Improve fmodf Adhemerval Zanella
2023-03-10 23:17   ` H.J. Lu
2023-03-13 15:19   ` Matt Turner
2023-03-13 16:38     ` Adhemerval Zanella Netto
2023-03-14 16:42   ` Wilco Dijkstra
2023-03-15 17:50     ` Adhemerval Zanella Netto
