[PATCH v5 00/12] Improve hypot - Adhemerval Zanella

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org,
	Paul Zimmermann <Paul.Zimmermann@inria.fr>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Subject: [PATCH v5 00/12] Improve hypot
Date: Tue,  7 Dec 2021 16:03:41 -0300	[thread overview]
Message-ID: <20211207190353.3282666-1-adhemerval.zanella@linaro.org> (raw)

This patchset add a different algorithm along with Wilco [1] performance
improvements.  The default implementation is based on the 'An Improved 
Algorithm for hypot(a,b)' by Carlos F. Borges [2] with some fixes and
improvements (although the dbl-64 one uses a slight different approach
when fast FMA is avaliable).  This method is also used by Julia language
runtime [2].

The motivation for this change are:

  1. Use a newer algotihm that favor FP operations over interger ones
     (which tends to show better performance specially on newer
     processor with multiple FP units).  It also allows consolidate
     the multiple implementation (for instance, the powerpc one).

  2. The new algorithm is more precise without minimum performance
     difference.

The current hypot() implementation seems already to be bounded to a
maximum of 1 ulp of error, however the new proposed algorithm shows an
slight precision improvement by showing more correctly rounded results.

With a random 1e9 inputs for different float format I see:

  - An improvement from 3427362 to 18457 results with 1 ulp of
    error for Binary64.
  - An improvement from 233442 to 1274 results with 1 ulp of
    error for Binary96 (x86_64).
  - An improvement from 453045 to 1294 results with 1 ulp of
    error for Binary96 (x86_64).

Also for the maximal known error master shows (in ulps, with
corresponding inputs), determined with [3]:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77
  binary64  0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022
  binary96  0.981
0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l
  binary128 0.985
-0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384

With the new implementation:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same]
  binary64  0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022
  binary96  0.584
-0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l
  binary128 0.749
0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192

If FMA is uses the binary64 shows a slight worse precision:

I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not
required since it calls the dbl-64 one.  I have not adapated ldbl-128ibm
since the format has a lot of caveats and IBM aims to move to ldbl-128.

[1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html
[1] https://arxiv.org/pdf/1904.09481.pdf
[2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c
[3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c

---
Changes from v5:
 * Add missing cast on generic hypotf.
 * Add missing math_narrow_eval on generic hypot.
 * Fixed return value on generic ldbl-96 hypotl.
 * Fixed return value on generic ldbl-128 hypotl.
 * Add POWER10 performance values.
 * Add missing cast on i686 hypotf.
 * Rewrite math-use-builtinds-fmax.h and math-use-builtinds-fmin.h
   based on Joseph's suggestions.
---

Adhemerval Zanella (9):
  math: Simplify hypotf implementation
  math: Use an improved algorithm for hypotl (ldbl-96)
  math: Use an improved algorithm for hypotl (ldbl-128)
  i386: Move hypot implementation to C
  math: Remove powerpc e_hypot
  math: Add math-use-builtinds-fmax.h
  math: Add math-use-builtinds-fmin.h
  aarch64: Add math-use-builtins-f{max,min}.h
  math: Remove the error handling wrapper from hypot and hypotf

Wilco Dijkstra (3):
  math: Use an improved algorithm for hypot (dbl-64)
  math: Improve hypot performance with FMA
  math: Use fmin/fmax on hypot

 math/Versions                                 |   2 +
 math/s_fmax_template.c                        |   5 +
 math/s_fmin_template.c                        |   6 +-
 math/w_hypot.c                                |   8 +
 math/w_hypot_compat.c                         |  13 +-
 math/w_hypotf.c                               |   8 +
 math/w_hypotf_compat.c                        |   6 +-
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h  |   4 +
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h  |   4 +
 sysdeps/aarch64/fpu/s_fmax.c                  |  28 --
 sysdeps/aarch64/fpu/s_fmaxf.c                 |  28 --
 sysdeps/aarch64/fpu/s_fmin.c                  |  28 --
 sysdeps/aarch64/fpu/s_fminf.c                 |  28 --
 sysdeps/generic/math-type-macros-double.h     |   1 +
 sysdeps/generic/math-type-macros-float.h      |   1 +
 sysdeps/generic/math-type-macros-float128.h   |   1 +
 sysdeps/generic/math-type-macros-ldouble.h    |   1 +
 sysdeps/generic/math-use-builtins-fmax.h      |   4 +
 sysdeps/generic/math-use-builtins-fmin.h      |   4 +
 sysdeps/generic/math-use-builtins.h           |   2 +
 sysdeps/i386/fpu/e_hypot.S                    |  75 -----
 sysdeps/i386/fpu/e_hypot.c                    |  58 ++++
 sysdeps/i386/fpu/e_hypotf.S                   |  64 -----
 sysdeps/ieee754/dbl-64/e_hypot.c              | 270 ++++++++----------
 sysdeps/ieee754/dbl-64/w_hypot.c              |   1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             |  79 ++---
 sysdeps/ieee754/flt-32/math_config.h          |   9 +
 sysdeps/ieee754/flt-32/w_hypotf.c             |   1 +
 sysdeps/ieee754/ldbl-128/e_hypotl.c           | 226 +++++++--------
 sysdeps/ieee754/ldbl-96/e_hypotl.c            | 231 +++++++--------
 sysdeps/mach/hurd/i386/libm.abilist           |   2 +
 sysdeps/powerpc/fpu/e_hypot.c                 |  87 ------
 sysdeps/powerpc/fpu/e_hypotf.c                |  78 -----
 .../powerpc32/power4/fpu/multiarch/Makefile   |   5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     |  23 --
 .../power4/fpu/multiarch/e_hypot-ppc32.c      |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  |  33 ---
 .../power4/fpu/multiarch/e_hypotf-power7.c    |  23 --
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c |  33 ---
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libm.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libm.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |   2 +
 .../sysv/linux/microblaze/be/libm.abilist     |   2 +
 .../sysv/linux/microblaze/le/libm.abilist     |   2 +
 .../unix/sysv/linux/mips/mips32/libm.abilist  |   2 +
 .../unix/sysv/linux/mips/mips64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |   2 +
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libm.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libm.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libm.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libm.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libm.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libm.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libm.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libm.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |   2 +
 65 files changed, 547 insertions(+), 1029 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

-- 
2.32.0

next             reply	other threads:[~2021-12-07 19:03 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-07 19:03 Adhemerval Zanella [this message]
2021-12-07 19:03 ` [PATCH v5 01/12] math: Simplify hypotf implementation Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 02/12] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 03/12] math: Improve hypot performance with FMA Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 04/12] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 05/12] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 06/12] i386: Move hypot implementation to C Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 07/12] math: Remove powerpc e_hypot Adhemerval Zanella
2021-12-07 19:42   ` Paul A. Clarke
2021-12-07 19:03 ` [PATCH v5 08/12] math: Add math-use-builtinds-fmax.h Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 09/12] math: Add math-use-builtinds-fmin.h Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 10/12] aarch64: Add math-use-builtins-f{max,min}.h Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 11/12] math: Use fmin/fmax on hypot Adhemerval Zanella
2021-12-07 19:03 ` [PATCH v5 12/12] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella
2021-12-08  5:51 ` [PATCH v5 00/12] Improve hypot Paul Zimmermann
2021-12-10 19:31 ` Adhemerval Zanella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211207190353.3282666-1-adhemerval.zanella@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=Paul.Zimmermann@inria.fr \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).