[PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA

public inbox for newlib@sourceware.org

* [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
@ 2020-08-08 22:34 Keith Packard
  2020-08-08 22:34 ` [PATCH 1/3] libm: ARM without HW double does not have fast FMA Keith Packard
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Keith Packard @ 2020-08-08 22:34 UTC (permalink / raw)
  To: newlib

I added some new test configurations to my CI system for picolibc and
discovered that when the new math code was built for 32-bit ARM
processors with only single-precision floating-point hardware, several
math functions were returning imprecise results. I got the expected
results on processors with no FPU and on processors with both 32- and
64-bit FPUs.

I discovered that the affected functions were using the 'fma' function
on this hardware even though, lacking 64-bit HW support, that function
was being emulated without the required precision.

This all boiled down to math_config.h incorrectly detecting 64-bit FMA
support on ARM processors.

This patch series contains three changes:

 1. Fix the fast FMA detection so that 32-bit ARM processors without 64-bit FMA
    support don't use 'fma' for the new math functions.

 2. Add detection of fast FMAF, which 32-bit ARM processors with only
    32-bit FPUs *do* support.

 3. Add ARM versions of fma and fmaf which are used when those
    instructions are available.
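
As a rough illustration of the first two changes, the detection can be
modelled as a pair of plain C predicates. This is only a sketch: the
function names and parameters are hypothetical, and the bit values
follow the ACLE definition of __ARM_FP (0x4 for single precision, 0x8
for double precision). The real logic lives in preprocessor
conditionals in math_config.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the HAVE_FAST_FMA test: AArch64 always qualifies;
   32-bit ARM qualifies only when FMA is advertised *and* the FPU handles
   double precision (__ARM_FP bit 0x8). */
bool
model_fast_fma (bool is_aarch64, bool arm_feature_fma, unsigned arm_fp)
{
  return is_aarch64 || (arm_feature_fma && (arm_fp & 0x8u) != 0);
}

/* Hypothetical model of the HAVE_FAST_FMAF test: anything with fast FMA,
   plus 32-bit ARM with FMA and single precision (__ARM_FP bit 0x4). */
bool
model_fast_fmaf (bool is_aarch64, bool arm_feature_fma, unsigned arm_fp)
{
  return model_fast_fma (is_aarch64, arm_feature_fma, arm_fp)
         || (arm_feature_fma && (arm_fp & 0x4u) != 0);
}
```

For a single-precision-only FPU (assumed here to report __ARM_FP as
0x6), model_fast_fma is false while model_fast_fmaf is true, which is
the case the old test got wrong.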


* [PATCH 1/3] libm: ARM without HW double does not have fast FMA
  2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
@ 2020-08-08 22:34 ` Keith Packard
  2020-08-08 22:34 ` [PATCH 2/3] libm: Detect fast fmaf support Keith Packard
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-08-08 22:34 UTC (permalink / raw)
  To: newlib

32-bit ARM processors with HW float (but not HW double) may define
__ARM_FEATURE_FMA, but that only means they have fast FMA for 32-bit
floats.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 newlib/libm/common/math_config.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/newlib/libm/common/math_config.h b/newlib/libm/common/math_config.h
index 1089b0ec6..df8f8d6e4 100644
--- a/newlib/libm/common/math_config.h
+++ b/newlib/libm/common/math_config.h
@@ -72,7 +72,7 @@
 
 /* Compiler can inline fma as a single instruction.  */
 #ifndef HAVE_FAST_FMA
-# if __aarch64__ || __ARM_FEATURE_FMA
+# if __aarch64__ || (__ARM_FEATURE_FMA && (__ARM_FP & 8))
 #   define HAVE_FAST_FMA 1
 # else
 #   define HAVE_FAST_FMA 0
-- 
2.28.0



* [PATCH 2/3] libm: Detect fast fmaf support
  2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
  2020-08-08 22:34 ` [PATCH 1/3] libm: ARM without HW double does not have fast FMA Keith Packard
@ 2020-08-08 22:34 ` Keith Packard
  2020-08-08 22:34 ` [PATCH 3/3] libm/machine/arm: Add optimized fmaf and fma when available Keith Packard
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-08-08 22:34 UTC (permalink / raw)
  To: newlib

Anything with fast FMA is assumed to have fast FMAF, along with
32-bit ARM processors that advertise 32-bit FP support and
__ARM_FEATURE_FMA.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 newlib/libm/common/math_config.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/newlib/libm/common/math_config.h b/newlib/libm/common/math_config.h
index df8f8d6e4..e7a8bb7fe 100644
--- a/newlib/libm/common/math_config.h
+++ b/newlib/libm/common/math_config.h
@@ -79,6 +79,14 @@
 # endif
 #endif
 
+#ifndef HAVE_FAST_FMAF
+# if HAVE_FAST_FMA || (__ARM_FEATURE_FMA && (__ARM_FP & 4))
+#  define HAVE_FAST_FMAF 1
+# else
+#  define HAVE_FAST_FMAF 0
+# endif
+#endif
+
 #if HAVE_FAST_ROUND
 /* When set, the roundtoint and converttoint functions are provided with
    the semantics documented below.  */
-- 
2.28.0



* [PATCH 3/3] libm/machine/arm: Add optimized fmaf and fma when available
  2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
  2020-08-08 22:34 ` [PATCH 1/3] libm: ARM without HW double does not have fast FMA Keith Packard
  2020-08-08 22:34 ` [PATCH 2/3] libm: Detect fast fmaf support Keith Packard
@ 2020-08-08 22:34 ` Keith Packard
  2020-08-10  9:30 ` [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Corinna Vinschen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-08-08 22:34 UTC (permalink / raw)
  To: newlib

When HAVE_FAST_FMAF is set, use the vfma.f32 instruction; when
HAVE_FAST_FMA is set, use the vfma.f64 instruction.

Usually the compiler built-ins will already have inlined these
instructions, but provide these symbols for cases where that doesn't
work instead of falling back to the (inaccurate) common code versions.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 newlib/libm/common/s_fma.c          |  4 +++
 newlib/libm/common/sf_fma.c         |  4 +++
 newlib/libm/machine/arm/Makefile.am |  2 ++
 newlib/libm/machine/arm/Makefile.in | 16 ++++++++++
 newlib/libm/machine/arm/s_fma.c     | 48 +++++++++++++++++++++++++++++
 newlib/libm/machine/arm/sf_fma.c    | 48 +++++++++++++++++++++++++++++
 6 files changed, 122 insertions(+)
 create mode 100644 newlib/libm/machine/arm/s_fma.c
 create mode 100644 newlib/libm/machine/arm/sf_fma.c

diff --git a/newlib/libm/common/s_fma.c b/newlib/libm/common/s_fma.c
index ab9e525b0..15c7d23f5 100644
--- a/newlib/libm/common/s_fma.c
+++ b/newlib/libm/common/s_fma.c
@@ -38,6 +38,8 @@ ANSI C, POSIX.
 
 #include "fdlibm.h"
 
+#if !HAVE_FAST_FMA
+
 #ifndef _DOUBLE_IS_32BITS
 
 #ifdef __STDC__
@@ -54,3 +56,5 @@ ANSI C, POSIX.
 }
 
 #endif /* _DOUBLE_IS_32BITS */
+
+#endif /* !HAVE_FAST_FMA */
diff --git a/newlib/libm/common/sf_fma.c b/newlib/libm/common/sf_fma.c
index 4360f400b..ce7f13bb2 100644
--- a/newlib/libm/common/sf_fma.c
+++ b/newlib/libm/common/sf_fma.c
@@ -6,6 +6,8 @@
 
 #include "fdlibm.h"
 
+#if !HAVE_FAST_FMAF
+
 #ifdef __STDC__
 	float fmaf(float x, float y, float z)
 #else
@@ -25,6 +27,8 @@
   return (float) (((double) x * (double) y) + (double) z);
 }
 
+#endif
+
 #ifdef _DOUBLE_IS_32BITS
 
 #ifdef __STDC__
diff --git a/newlib/libm/machine/arm/Makefile.am b/newlib/libm/machine/arm/Makefile.am
index 6574c56c9..09a266c43 100644
--- a/newlib/libm/machine/arm/Makefile.am
+++ b/newlib/libm/machine/arm/Makefile.am
@@ -10,12 +10,14 @@ LIB_SOURCES = \
 	ef_sqrt.c \
 	s_ceil.c \
 	s_floor.c \
+	s_fma.c \
 	s_nearbyint.c \
 	s_rint.c \
 	s_round.c \
 	s_trunc.c \
 	sf_ceil.c \
 	sf_floor.c \
+	sf_fma.c \
 	sf_nearbyint.c \
 	sf_rint.c \
 	sf_round.c \
diff --git a/newlib/libm/machine/arm/Makefile.in b/newlib/libm/machine/arm/Makefile.in
index 63de93443..16f5773d1 100644
--- a/newlib/libm/machine/arm/Makefile.in
+++ b/newlib/libm/machine/arm/Makefile.in
@@ -73,9 +73,11 @@ lib_a_AR = $(AR) $(ARFLAGS)
 lib_a_LIBADD =
 am__objects_1 = lib_a-e_sqrt.$(OBJEXT) lib_a-ef_sqrt.$(OBJEXT) \
 	lib_a-s_ceil.$(OBJEXT) lib_a-s_floor.$(OBJEXT) \
+	lib_a-s_fma.$(OBJEXT) \
 	lib_a-s_nearbyint.$(OBJEXT) lib_a-s_rint.$(OBJEXT) \
 	lib_a-s_round.$(OBJEXT) lib_a-s_trunc.$(OBJEXT) \
 	lib_a-sf_ceil.$(OBJEXT) lib_a-sf_floor.$(OBJEXT) \
+	lib_a-sf_fma.$(OBJEXT) \
 	lib_a-sf_nearbyint.$(OBJEXT) lib_a-sf_rint.$(OBJEXT) \
 	lib_a-sf_round.$(OBJEXT) lib_a-sf_trunc.$(OBJEXT) \
 	lib_a-feclearexcept.$(OBJEXT) lib_a-fe_dfl_env.$(OBJEXT) \
@@ -216,12 +218,14 @@ LIB_SOURCES = \
 	ef_sqrt.c \
 	s_ceil.c \
 	s_floor.c \
+	s_fma.c \
 	s_nearbyint.c \
 	s_rint.c \
 	s_round.c \
 	s_trunc.c \
 	sf_ceil.c \
 	sf_floor.c \
+	sf_fma.c \
 	sf_nearbyint.c \
 	sf_rint.c \
 	sf_round.c \
@@ -342,6 +346,12 @@ lib_a-s_floor.o: s_floor.c
 lib_a-s_floor.obj: s_floor.c
 	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-s_floor.obj `if test -f 's_floor.c'; then $(CYGPATH_W) 's_floor.c'; else $(CYGPATH_W) '$(srcdir)/s_floor.c'; fi`
 
+lib_a-s_fma.o: s_fma.c
+	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-s_fma.o `test -f 's_fma.c' || echo '$(srcdir)/'`s_fma.c
+
+lib_a-s_fma.obj: s_fma.c
+	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-s_fma.obj `if test -f 's_fma.c'; then $(CYGPATH_W) 's_fma.c'; else $(CYGPATH_W) '$(srcdir)/s_fma.c'; fi`
+
 lib_a-s_nearbyint.o: s_nearbyint.c
 	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-s_nearbyint.o `test -f 's_nearbyint.c' || echo '$(srcdir)/'`s_nearbyint.c
 
@@ -378,6 +388,12 @@ lib_a-sf_floor.o: sf_floor.c
 lib_a-sf_floor.obj: sf_floor.c
 	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-sf_floor.obj `if test -f 'sf_floor.c'; then $(CYGPATH_W) 'sf_floor.c'; else $(CYGPATH_W) '$(srcdir)/sf_floor.c'; fi`
 
+lib_a-sf_fma.o: sf_fma.c
+	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-sf_fma.o `test -f 'sf_fma.c' || echo '$(srcdir)/'`sf_fma.c
+
+lib_a-sf_fma.obj: sf_fma.c
+	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-sf_fma.obj `if test -f 'sf_fma.c'; then $(CYGPATH_W) 'sf_fma.c'; else $(CYGPATH_W) '$(srcdir)/sf_fma.c'; fi`
+
 lib_a-sf_nearbyint.o: sf_nearbyint.c
 	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-sf_nearbyint.o `test -f 'sf_nearbyint.c' || echo '$(srcdir)/'`sf_nearbyint.c
 
diff --git a/newlib/libm/machine/arm/s_fma.c b/newlib/libm/machine/arm/s_fma.c
new file mode 100644
index 000000000..f945419b5
--- /dev/null
+++ b/newlib/libm/machine/arm/s_fma.c
@@ -0,0 +1,48 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright © 2020 Keith Packard
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials provided
+ *    with the distribution.
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <math.h>
+#include "math_config.h"
+
+#if HAVE_FAST_FMA
+
+double
+fma (double x, double y, double z)
+{
+  asm ("vfma.f64 %P0, %P1, %P2" : "+w" (z) : "w" (x), "w" (y));
+  return z;
+}
+
+#endif
diff --git a/newlib/libm/machine/arm/sf_fma.c b/newlib/libm/machine/arm/sf_fma.c
new file mode 100644
index 000000000..4befd9017
--- /dev/null
+++ b/newlib/libm/machine/arm/sf_fma.c
@@ -0,0 +1,48 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright © 2020 Keith Packard
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials provided
+ *    with the distribution.
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <math.h>
+#include "math_config.h"
+
+#if HAVE_FAST_FMAF
+
+float
+fmaf (float x, float y, float z)
+{
+  asm ("vfma.f32 %0, %1, %2" : "+t" (z) : "t" (x), "t" (y));
+  return z;
+}
+
+#endif
-- 
2.28.0



* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
                   ` (2 preceding siblings ...)
  2020-08-08 22:34 ` [PATCH 3/3] libm/machine/arm: Add optimized fmaf and fma when available Keith Packard
@ 2020-08-10  9:30 ` Corinna Vinschen
  2020-08-10 14:43   ` Szabolcs Nagy
  2020-08-10 19:06 ` Corinna Vinschen
  2020-09-01 16:32 ` Sebastian Huber
  5 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2020-08-10  9:30 UTC (permalink / raw)
  To: Keith Packard; +Cc: newlib, Szabolcs Nagy

Hi Szabolcs,

ok to push?


Thanks,
Corinna

On Aug  8 15:34, Keith Packard via Newlib wrote:
> I added some new test configurations to my CI system for picolibc and
> discovered that when the new math code was built on 32-bit ARM
> processors with only single-precision floating hardware, several math
> functions were returning imprecise results. I got the expected results
> on processors with no FPU and on processors with both 32- and 64- bit
> FPUs.
> 
> I discovered that the affected functions were using the 'fma' function
> on this hardware, even though (lacking 64-bit HW support), that
> function was being emulated without the required precision.
> 
> This all boiled down to math_config.h incorrectly detecting 64-bit FMA
> support on ARM processors.
> 
> This patch series contains three changes:
> 
>  1. fix the fast FMA process so that 32-bit ARM processors without 64-bit FMA
>     support don't use 'fma' for the new math functions
> 
>  2. Add detection of fast FMAF, which 32-bit ARM processors with only
>     32-bit FPUs *do* support.
> 
>  3. Add ARM versions of fma and fmaf which are used when those
>     instructions are available.
> 

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat



* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-10  9:30 ` [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Corinna Vinschen
@ 2020-08-10 14:43   ` Szabolcs Nagy
  2020-08-10 15:19     ` Keith Packard
  0 siblings, 1 reply; 23+ messages in thread
From: Szabolcs Nagy @ 2020-08-10 14:43 UTC (permalink / raw)
  To: Keith Packard, newlib

The 08/10/2020 11:30, Corinna Vinschen wrote:
> Hi Szabolcs,
> 
> ok to push?
> 

this looks ok.

i would have used the arm specific macros
( __ARM_FEATURE_FMA, __ARM_FP) directly
in arm specific code.

but using HAVE_FAST_FMA{F} works too.
(note that these macros currently only
do something useful on aarch64 and arm.)
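
To make the alternative concrete, here is a sketch of an ARM-specific
fmaf keyed directly on the ACLE macros rather than on HAVE_FAST_FMAF.
The name sketch_fmaf and the fallback branch are assumptions for
illustration only (the fallback exists so the snippet also builds on
non-ARM hosts). The "+t" constraint marks z as both read and written,
since vfma accumulates into its destination register.

```c
#include <assert.h>

/* Use the VFP instruction only on 32-bit ARM with a hardware
   single-precision FMA; the guard tests the ACLE macros directly. */
#if defined(__arm__) && defined(__ARM_FEATURE_FMA) \
    && defined(__ARM_FP) && (__ARM_FP & 0x4) && !defined(__SOFTFP__)
# define SKETCH_ARM_FMAF 1
#else
# define SKETCH_ARM_FMAF 0
#endif

float
sketch_fmaf (float x, float y, float z)
{
#if SKETCH_ARM_FMAF
  /* z is an input (the addend) and the output, hence "+t". */
  __asm__ ("vfma.f32 %0, %1, %2" : "+t" (z) : "t" (x), "t" (y));
  return z;
#else
  /* Assumed stand-in for hosts without the instruction: widen to double. */
  return (float) ((double) x * (double) y + (double) z);
#endif
}
```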


> 
> Thanks,
> Corinna
> 
> On Aug  8 15:34, Keith Packard via Newlib wrote:
> > I added some new test configurations to my CI system for picolibc and
> > discovered that when the new math code was built on 32-bit ARM
> > processors with only single-precision floating hardware, several math
> > functions were returning imprecise results. I got the expected results
> > on processors with no FPU and on processors with both 32- and 64- bit
> > FPUs.
> > 
> > I discovered that the affected functions were using the 'fma' function
> > on this hardware, even though (lacking 64-bit HW support), that
> > function was being emulated without the required precision.
> > 
> > This all boiled down to math_config.h incorrectly detecting 64-bit FMA
> > support on ARM processors.
> > 
> > This patch series contains three changes:
> > 
> >  1. fix the fast FMA process so that 32-bit ARM processors without 64-bit FMA
> >     support don't use 'fma' for the new math functions
> > 
> >  2. Add detection of fast FMAF, which 32-bit ARM processors with only
> >     32-bit FPUs *do* support.
> > 
> >  3. Add ARM versions of fma and fmaf which are used when those
> >     instructions are available.
> > 
> 
> -- 
> Corinna Vinschen
> Cygwin Maintainer
> Red Hat
> 

-- 


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-10 14:43   ` Szabolcs Nagy
@ 2020-08-10 15:19     ` Keith Packard
  0 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-08-10 15:19 UTC (permalink / raw)
  To: Szabolcs Nagy, newlib


Szabolcs Nagy <szabolcs.nagy@arm.com> writes:

> but using HAVE_FAST_FMA{F} works too.
> (note that these macros currently only
> do something useful on aarch64 and arm.)

I've got a patch for RISC-V FMA support in the works.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
                   ` (3 preceding siblings ...)
  2020-08-10  9:30 ` [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Corinna Vinschen
@ 2020-08-10 19:06 ` Corinna Vinschen
  2020-09-01 16:32 ` Sebastian Huber
  5 siblings, 0 replies; 23+ messages in thread
From: Corinna Vinschen @ 2020-08-10 19:06 UTC (permalink / raw)
  To: Keith Packard; +Cc: newlib

On Aug  8 15:34, Keith Packard via Newlib wrote:
> I added some new test configurations to my CI system for picolibc and
> discovered that when the new math code was built on 32-bit ARM
> processors with only single-precision floating hardware, several math
> functions were returning imprecise results. I got the expected results
> on processors with no FPU and on processors with both 32- and 64- bit
> FPUs.
> 
> I discovered that the affected functions were using the 'fma' function
> on this hardware, even though (lacking 64-bit HW support), that
> function was being emulated without the required precision.
> 
> This all boiled down to math_config.h incorrectly detecting 64-bit FMA
> support on ARM processors.
> 
> This patch series contains three changes:
> 
>  1. fix the fast FMA process so that 32-bit ARM processors without 64-bit FMA
>     support don't use 'fma' for the new math functions
> 
>  2. Add detection of fast FMAF, which 32-bit ARM processors with only
>     32-bit FPUs *do* support.
> 
>  3. Add ARM versions of fma and fmaf which are used when those
>     instructions are available.
> 

Pushed.  I just regen'ed newlib/libm/machine/arm/Makefile.in.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat



* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
                   ` (4 preceding siblings ...)
  2020-08-10 19:06 ` Corinna Vinschen
@ 2020-09-01 16:32 ` Sebastian Huber
  2020-09-01 17:21   ` Sebastian Huber
  5 siblings, 1 reply; 23+ messages in thread
From: Sebastian Huber @ 2020-09-01 16:32 UTC (permalink / raw)
  To: Keith Packard, newlib

Hello,

with the latest Newlib, I get a linker error in the RTEMS test suite:


undefined reference to `fma'
undefined reference to `fmaf'

The following machine flags are used:

'-march=armv7-a', '-mthumb', '-mfpu=neon', '-mfloat-abi=hard', 
'-mtune=cortex-a9'

It seems to be missing in the corresponding multilib:

nm /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a | grep fma
lib_a-fmal.o:
          U fma
00000000 T fmal
lib_a-fmaxl.o:
          U fmax
00000000 T fmaxl
lib_a-s_fma.o:
lib_a-s_fmax.o:
00000000 T fmax
lib_a-sf_fma.o:
lib_a-sf_fmax.o:
00000000 T fmaxf




* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 16:32 ` Sebastian Huber
@ 2020-09-01 17:21   ` Sebastian Huber
  2020-09-01 18:04     ` Sebastian Huber
  2020-09-01 19:50     ` Keith Packard
  0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Huber @ 2020-09-01 17:21 UTC (permalink / raw)
  To: Keith Packard, newlib

On 01/09/2020 18:32, Sebastian Huber wrote:

> Hello,
>
> with the latest Newlib, I get a linker error in the RTEMS test suite:
>
>
> undefined reference to `fma'
> undefined reference to `fmaf'
>
> The following machine flags are used:
>
> '-march=armv7-a', '-mthumb', '-mfpu=neon', '-mfloat-abi=hard', 
> '-mtune=cortex-a9'
>
> It seems to be missing in the corresponding multilib:
>
> nm /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a | grep 
> fma
> lib_a-fmal.o:
>          U fma
> 00000000 T fmal
> lib_a-fmaxl.o:
>          U fmax
> 00000000 T fmaxl
> lib_a-s_fma.o:
> lib_a-s_fmax.o:
> 00000000 T fmax
> lib_a-sf_fma.o:
> lib_a-sf_fmax.o:
> 00000000 T fmaxf
>
It seems to be present in only some multilibs:

for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo $i ; nm --defined-only $i | grep 'T.*\<fma\>'; done
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
00000000 T fma
/build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
/build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a

for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo $i ; nm --defined-only $i | grep 'T.*\<fmaf\>'; done
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
00000000 T fmaf
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
00000000 T fmaf
/build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
/build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a



* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 17:21   ` Sebastian Huber
@ 2020-09-01 18:04     ` Sebastian Huber
  2020-09-01 19:28       ` Keith Packard
  2020-09-01 19:50     ` Keith Packard
  1 sibling, 1 reply; 23+ messages in thread
From: Sebastian Huber @ 2020-09-01 18:04 UTC (permalink / raw)
  To: Keith Packard, newlib

On 01/09/2020 19:21, Sebastian Huber wrote:

> On 01/09/2020 18:32, Sebastian Huber wrote:
>
>> Hello,
>>
>> with the latest Newlib, I get a linker error in the RTEMS test suite:
>>
>>
>> undefined reference to `fma'
>> undefined reference to `fmaf'
>>
>> The following machine flags are used:
>>
>> '-march=armv7-a', '-mthumb', '-mfpu=neon', '-mfloat-abi=hard', 
>> '-mtune=cortex-a9'
>>
>> It seems to be missing in the corresponding multilib:
>>
>> nm /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a | 
>> grep fma
>> lib_a-fmal.o:
>>          U fma
>> 00000000 T fmal
>> lib_a-fmaxl.o:
>>          U fmax
>> 00000000 T fmaxl
>> lib_a-s_fma.o:
>> lib_a-s_fmax.o:
>> 00000000 T fmax
>> lib_a-sf_fma.o:
>> lib_a-sf_fmax.o:
>> 00000000 T fmaxf
>>
> It seems to be present in only some multilibs:
>
> for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo 
> $i ; nm --defined-only $i | grep 'T.*\<fma\>'; done
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
> 00000000 T fma
> /build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
> /build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a
>
> for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo 
> $i ; nm --defined-only $i | grep 'T.*\<fmaf\>'; done
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
> 00000000 T fmaf
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
> 00000000 T fmaf
> /build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
> /build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a

I think the problem is somewhere in the build system:

find -name s_fma.c
./newlib/libm/machine/arm/s_fma.c
./newlib/libm/machine/aarch64/s_fma.c
./newlib/libm/machine/riscv/s_fma.c
./newlib/libm/machine/spu/s_fma.c
./newlib/libm/common/s_fma.c

I guess the machine-specific file overrides the common file. If the 
machine-specific file is empty due to pre-processor magic, then the 
default implementation is still not present.



* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 18:04     ` Sebastian Huber
@ 2020-09-01 19:28       ` Keith Packard
  2020-09-01 21:16         ` Joseph Myers
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Packard @ 2020-09-01 19:28 UTC (permalink / raw)
  To: Sebastian Huber, newlib


Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> I think the problem is somewhere in the build system:
>
> find -name s_fma.c
> ./newlib/libm/machine/arm/s_fma.c
> ./newlib/libm/machine/aarch64/s_fma.c
> ./newlib/libm/machine/riscv/s_fma.c
> ./newlib/libm/machine/spu/s_fma.c
> ./newlib/libm/common/s_fma.c
>
> I guess the machine-specific file overrides the common file. If the 
> machine-specific file is empty due to pre-processor magic, then the 
> default implementation is still not present.

newlib shouldn't be calling fma if the underlying hardware support isn't
present -- fma is used in some math functions to improve performance
where the code can take full advantage of the additional precision of
the intermediate value.

Are you using fma directly? If your hardware supports it, the C compiler
should be directly emitting the relevant instruction sequence so you
shouldn't be seeing an undefined function appear.

If not, then one of the two versions of fma should be getting compiled
as they have opposite tests -- newlib/libm/machine/arm/s_fma.c checks
for '#if HAVE_FAST_FMA' while newlib/libm/common/s_fma.c checks for
'#if !HAVE_FAST_FMA'.
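
The intent of the opposite tests can be sketched in a single
translation unit. SKETCH_HAVE_FAST_FMA and sketch_fma_source are
hypothetical stand-ins; in the real tree the two branches live in
separate files, so this only illustrates that the preprocessor selects
exactly one definition.

```c
#include <assert.h>
#include <string.h>

#ifndef SKETCH_HAVE_FAST_FMA
# define SKETCH_HAVE_FAST_FMA 0   /* assumed default for this sketch */
#endif

#if SKETCH_HAVE_FAST_FMA
/* Stands in for newlib/libm/machine/arm/s_fma.c. */
const char *
sketch_fma_source (void)
{
  return "machine";
}
#endif

#if !SKETCH_HAVE_FAST_FMA
/* Stands in for newlib/libm/common/s_fma.c. */
const char *
sketch_fma_source (void)
{
  return "common";
}
#endif
```

Whichever value the macro takes, exactly one definition survives
preprocessing.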

I recently did some work in this area, so it's possible I broke
something in your environment that I didn't catch in mine; I don't test
newlib builds, only downstream picolibc builds.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 17:21   ` Sebastian Huber
  2020-09-01 18:04     ` Sebastian Huber
@ 2020-09-01 19:50     ` Keith Packard
  1 sibling, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-09-01 19:50 UTC (permalink / raw)
  To: Sebastian Huber, newlib


Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> It seems to be present in only some multilibs:

I did some more digging -- the 'common' one is getting built, but the
resulting math library doesn't include it, which (as you suggest)
indicates a problem in the build system. It turns out that the autotools
build requires all filenames across the whole math library to be
unique; having 's_fma.c' in both common and machine/arm causes the one
from common to be overwritten by the one in machine/arm due to the
manual construction of libm.a from the constituent sub-libraries.

As all of my testing was using meson instead of autotools, I guess I
shouldn't be surprised that I broke the autotools build.

I've sent a patch that renames libm/machine/arm/*fma.c and that appears
to fix the problem.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 19:28       ` Keith Packard
@ 2020-09-01 21:16         ` Joseph Myers
  2020-09-01 23:06           ` Keith Packard
  0 siblings, 1 reply; 23+ messages in thread
From: Joseph Myers @ 2020-09-01 21:16 UTC (permalink / raw)
  To: Keith Packard; +Cc: Sebastian Huber, newlib

On Tue, 1 Sep 2020, Keith Packard via Newlib wrote:

> If not, then one of the two versions of fma should be getting compiled
> as they have opposite tests -- newlib/libm/machine/arm/s_fma.c checks
> for '#if HAVE_FAST_FMA' while newlib/libm/common/s_fma.c checks for
> '#if !HAVE_FAST_FMA'.

But note that newlib/libm/common/s_fma.c doesn't actually do anything 
useful; it's not a fused operation.  Implementing correct fma in software 
is highly nontrivial, especially when you want to handle exceptions and 
rounding modes correctly (including machine-specific differences in 
whether tininess is detected before or after rounding).
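
As an illustration of what "not a fused operation" means in practice, here
is a minimal sketch. It is not the newlib source; the function name
unfused_muladd is made up for this example, and IEEE 754 binary64
arithmetic with round-to-nearest is assumed. The product is rounded once
before the addition, so the low bits that a true fma would keep are lost.

```c
#include <math.h>

/* Hypothetical unfused multiply-add: the product is rounded to double
 * before the addition, so two roundings take place in total.  The
 * volatile temporary keeps the compiler from contracting the expression
 * into a single fused instruction. */
double unfused_muladd(double x, double y, double z)
{
    volatile double product = x * y;   /* first rounding */
    return product + z;                /* second rounding */
}
```

With x = 1 + 2^-27, y = 1 - 2^-27 and z = -1, the exact product is
1 - 2^-54, which rounds to 1.0 in double, so unfused_muladd() returns 0.
A correctly fused fma(x, y, z) returns -2^-54 for the same inputs.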

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 21:16         ` Joseph Myers
@ 2020-09-01 23:06           ` Keith Packard
  2020-09-02  4:41             ` Sebastian Huber
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Packard @ 2020-09-01 23:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Sebastian Huber, newlib

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

Joseph Myers <joseph@codesourcery.com> writes:

> But note that newlib/libm/common/s_fma.c doesn't actually do anything 
> useful; it's not a fused operation.  Implementing correct fma in software 
> is highly nontrivial, especially when you want to handle exceptions and 
> rounding modes correctly (including machine-specific differences in 
> whether tininess is detected before or after rounding).

Should we just stop providing the generic fma/fmaf implementations? That
seems like a good idea to me as it will prevent applications from
getting the wrong answer.

The fmaf one does offer increased precision by doing the operation in
double instead of float, which is 'different' from doing it in float,
but it still gets a different answer from 'a * b + c'. It also gets the
wrong exception status.
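
A rough sketch of the two approaches being compared, assuming IEEE 754
binary32/binary64 with round-to-nearest. The names muladd_float and
fmaf_via_double are invented for this example and are not the newlib
functions. The product of two floats is exact in double, but the sum is
then rounded to double and rounded again to float, so this is still not a
single-rounding operation.

```c
/* Float-only multiply-add: the product is rounded to float first. */
float muladd_float(float a, float b, float c)
{
    volatile float product = a * b;    /* rounded to 24 bits */
    return product + c;
}

/* Hypothetical "do it in double" fallback for fmaf. */
float fmaf_via_double(float a, float b, float c)
{
    /* 24-bit x 24-bit significands fit in 53 bits: the product is exact */
    volatile double product = (double)a * (double)b;
    volatile double sum = product + (double)c;   /* rounds to double */
    return (float)sum;                           /* rounds again, to float */
}
```

For a = 1 + 2^-13, b = 1 - 2^-13, c = -1 the exact product is 1 - 2^-26.
In float that rounds to 1, so muladd_float() returns 0, while
fmaf_via_double() returns -2^-26.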

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 23:06           ` Keith Packard
@ 2020-09-02  4:41             ` Sebastian Huber
  2020-09-02  5:25               ` Keith Packard
  2020-09-02 17:12               ` Joseph Myers
  0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Huber @ 2020-09-02  4:41 UTC (permalink / raw)
  To: Keith Packard, Joseph Myers; +Cc: newlib

On 02/09/2020 01:06, Keith Packard wrote:
> Joseph Myers <joseph@codesourcery.com> writes:
> 
>> But note that newlib/libm/common/s_fma.c doesn't actually do anything
>> useful; it's not a fused operation.  Implementing correct fma in software
>> is highly nontrivial, especially when you want to handle exceptions and
>> rounding modes correctly (including machine-specific differences in
>> whether tininess is detected before or after rounding).
> 
> Should we just stop providing the generic fma/fmaf implementations? That
> seems like a good idea to me as it will prevent applications from
> getting the wrong answer.
> 
> The fmaf one does offer increased precision by doing the operation in
> double instead of float, which is 'different' from doing it in float,
> but it still gets a different answer from 'a * b + c'. It also gets the
> wrong exception status.

Our failing test is pretty basic: it just checks that the fma() and fmaf() 
library functions are present, as required by C99. glibc also offers a 
simple default implementation, for example:

https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02  4:41             ` Sebastian Huber
@ 2020-09-02  5:25               ` Keith Packard
  2020-09-02  5:35                 ` Keith Packard
  2020-09-02 17:12               ` Joseph Myers
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Packard @ 2020-09-02  5:25 UTC (permalink / raw)
  To: Sebastian Huber, Joseph Myers; +Cc: newlib

[-- Attachment #1: Type: text/plain, Size: 906 bytes --]

Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> Our failing test is pretty basic: it just checks that the fma() and fmaf() 
> library functions are present, as required by C99. glibc also offers a 
> simple default implementation, for example:
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD

That implementation violates the spec though because it does two
binary operations involving two roundings, so you get a different answer
than you would with a true fma.

Is it better to implement the function incorrectly or better to not
implement it at all? If your hardware doesn't support the operation,
then doing this in software would be a lot slower than adapting your
algorithm to deal with a sequence of binary operations, even though you
will likely need more of them to reach the same accuracy.
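
One well-known example of adapting an algorithm to plain binary operations
is Dekker's product, which recovers the rounding error of a*b without any
fma. The sketch below is not from this thread; the names split and
product_error are illustrative, and it assumes IEEE 754 binary64 with
round-to-nearest and no overflow or underflow. With a fused operation the
same quantity is simply fma(a, b, -(a * b)).

```c
/* Veltkamp split: a == *hi + *lo, each half short enough that products
 * of halves are exact in double. */
static void split(double a, double *hi, double *lo)
{
    volatile double c = 134217729.0 * a;   /* 2^27 + 1 */
    volatile double d = c - a;
    *hi = c - d;
    *lo = a - *hi;
}

/* Dekker's product: returns the exact value of a*b - round(a*b).
 * The volatile temporary t stops the compiler from fusing any
 * multiply with the following subtraction. */
double product_error(double a, double b)
{
    double ah, al, bh, bl, err;
    volatile double p = a * b;             /* rounded product */
    volatile double t;

    split(a, &ah, &al);
    split(b, &bh, &bl);

    t = ah * bh;  err = p - t;
    t = al * bh;  err = err - t;
    t = ah * bl;  err = err - t;
    t = al * bl;  err = t - err;
    return err;
}
```

This takes roughly seventeen floating-point operations where a hardware
fma needs one, which is the trade-off described above.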

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02  5:25               ` Keith Packard
@ 2020-09-02  5:35                 ` Keith Packard
  0 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-09-02  5:35 UTC (permalink / raw)
  To: Sebastian Huber, Joseph Myers; +Cc: newlib

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

"Keith Packard" <keithp@keithp.com> writes:

> That implementation violates the spec though because it does two
> binary operations involving two roundings, so you get a different answer
> than you would with a true fma.

Hrm. C99 and C17 both have macros to detect whether fma is 'fast' or
not: FP_FAST_FMA, FP_FAST_FMAF and FP_FAST_FMAL. This page:

        https://en.cppreference.com/w/cpp/numeric/math/fma

has a nice parenthetical comment:

 "If ... defined, the function std::fma evaluates faster (in addition to
  being more precise) than the expression x*y+z."

If C99 or C17 included 'in addition to being more precise', it would
be much more obvious to me that we should include the fall-back fma
implementation.

So, we should at least change the CPP defines that we have in
math_config.h to match the C99 and C17 specs.

Is it reasonable to assume that applications which care about accuracy
will also be checking these defines and using them as the C++ standard
appears to?
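
For reference, an application that checks these macros might look like the
following sketch. FP_FAST_FMA and FP_FAST_FMAF are the standard <math.h>
macros; the helper names muladd and muladdf are hypothetical.

```c
#include <math.h>

/* Use fma() only where <math.h> advertises it as fast; otherwise fall
 * back to two separately rounded operations. */
double muladd(double x, double y, double z)
{
#ifdef FP_FAST_FMA
    return fma(x, y, z);     /* single rounding */
#else
    return x * y + z;        /* two roundings, no slow software fma */
#endif
}

float muladdf(float x, float y, float z)
{
#ifdef FP_FAST_FMAF
    return fmaf(x, y, z);
#else
    return x * y + z;
#endif
}
```

Code written this way gets the fused result only on targets that define
the macro, so its numerical results can differ between targets.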

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02  4:41             ` Sebastian Huber
  2020-09-02  5:25               ` Keith Packard
@ 2020-09-02 17:12               ` Joseph Myers
  2020-09-02 17:59                 ` Sebastian Huber
  1 sibling, 1 reply; 23+ messages in thread
From: Joseph Myers @ 2020-09-02 17:12 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: Keith Packard, newlib

On Wed, 2 Sep 2020, Sebastian Huber wrote:

> https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD

No glibc configurations use that; they all use either a hardware 
instruction, an implementation based on sticky rounding as described by 
Boldo and Melquiond, or, in the absence of hardware exceptions and 
rounding modes, a soft-fp implementation.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02 17:12               ` Joseph Myers
@ 2020-09-02 17:59                 ` Sebastian Huber
  2020-09-02 20:39                   ` Keith Packard
  0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Huber @ 2020-09-02 17:59 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Keith Packard, newlib

On 02/09/2020 19:12, Joseph Myers wrote:

> On Wed, 2 Sep 2020, Sebastian Huber wrote:
>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD
> No glibc configurations use that; they all use either a hardware
> instruction, an implementation based on sticky rounding as described by
> Boldo and Melquiond, or, in the absence of hardware exceptions and
> rounding modes, a soft-fp implementation.

Sorry for pointing to this dead code in glibc.

Maybe we can use the FreeBSD implementation:

https://github.com/freebsd/freebsd/blob/master/lib/msun/src/s_fma.c

It is probably also used by Bionic:

https://android.googlesource.com/platform/bionic/+/refs/heads/master/libm/upstream-freebsd/lib/msun/src/s_fma.c


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02 17:59                 ` Sebastian Huber
@ 2020-09-02 20:39                   ` Keith Packard
  0 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-09-02 20:39 UTC (permalink / raw)
  To: Sebastian Huber, Joseph Myers; +Cc: newlib

[-- Attachment #1: Type: text/plain, Size: 272 bytes --]

Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> Maybe we can use the FreeBSD implementation:

And along with that, define the FP_FAST_FMA macros so that applications
can avoid this correct-but-slow version unless absolutely necessary.

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-07 20:16   ` Brian Inglis
@ 2020-09-07 22:23     ` Keith Packard
  0 siblings, 0 replies; 23+ messages in thread
From: Keith Packard @ 2020-09-07 22:23 UTC (permalink / raw)
  To: Brian Inglis, newlib

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

Brian Inglis <Brian.Inglis@SystematicSw.ab.ca> writes:

> Can't the "super-smart" compiler use that information to work around your
> careful approach by conditionally skipping the FMA and conditionally return just
> z, or even unconditionally return z, as C makes no guarantees?
> And couldn't the "super-smart" instruction scheduler do similar at the hardware
> level?

I don't think that would be in conformance with the C specification
which says that arithmetic follows IEC 60559 that defines the various
exceptions and results. Now, if you enable -ffast-math, all bets are off...

-- 
-keith

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-07 17:16 ` Keith Packard
@ 2020-09-07 20:16   ` Brian Inglis
  2020-09-07 22:23     ` Keith Packard
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Inglis @ 2020-09-07 20:16 UTC (permalink / raw)
  To: newlib

On 2020-09-07 11:16, Keith Packard via Newlib wrote:
> Eric Bresie via Newlib <newlib@sourceware.org> writes:
> 
>> Not directly related (and I’m not really an expert on these things,
>> nor able to change them in any way), but I was looking at the code
>> mentioned and saw a line like:
>>
>>     if (x == 0.0 || y == 0.0)
>>         return (x * y + z);
>>
>> If either x or y is zero would it be better to just return z and avoid
>> an extra multiplication operation here?
> 
> You want to compute the correct result and get the right exceptions in
> all of the delightful IEEE754 corner cases (e.g. 0 × ∞). It's easier to
> just execute the two operations than to try and synthesize the right
> result (which is implementation-dependent in the case of 0 × ∞ +
> qNaN). The key here is that if x or y is zero, then you won't lose any
> intermediate precision by performing the operation this way.

Can't the "super-smart" compiler use that information to work around your
careful approach by conditionally skipping the FMA and returning just z,
or even unconditionally returning z, as C makes no guarantees?
And couldn't the "super-smart" instruction scheduler do something similar
at the hardware level?

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-09-07 22:23 UTC | newest]

Thread overview: 23+ messages
2020-08-08 22:34 [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Keith Packard
2020-08-08 22:34 ` [PATCH 1/3] libm: ARM without HW double does not have fast FMA Keith Packard
2020-08-08 22:34 ` [PATCH 2/3] libm: Detect fast fmaf support Keith Packard
2020-08-08 22:34 ` [PATCH 3/3] libm/machine/arm: Add optimized fmaf and fma when available Keith Packard
2020-08-10  9:30 ` [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Corinna Vinschen
2020-08-10 14:43   ` Szabolcs Nagy
2020-08-10 15:19     ` Keith Packard
2020-08-10 19:06 ` Corinna Vinschen
2020-09-01 16:32 ` Sebastian Huber
2020-09-01 17:21   ` Sebastian Huber
2020-09-01 18:04     ` Sebastian Huber
2020-09-01 19:28       ` Keith Packard
2020-09-01 21:16         ` Joseph Myers
2020-09-01 23:06           ` Keith Packard
2020-09-02  4:41             ` Sebastian Huber
2020-09-02  5:25               ` Keith Packard
2020-09-02  5:35                 ` Keith Packard
2020-09-02 17:12               ` Joseph Myers
2020-09-02 17:59                 ` Sebastian Huber
2020-09-02 20:39                   ` Keith Packard
2020-09-01 19:50     ` Keith Packard
2020-09-07 14:09 Eric Bresie
2020-09-07 17:16 ` Keith Packard
2020-09-07 20:16   ` Brian Inglis
2020-09-07 22:23     ` Keith Packard
