Re: Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA

public inbox for newlib@sourceware.org

* Re: Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
@ 2020-09-07 14:09 Eric Bresie
  2020-09-07 17:16 ` Keith Packard
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Bresie @ 2020-09-07 14:09 UTC (permalink / raw)
  To: newlib

Not directly related (and I’m not really an expert on these things, nor able to change anything), but I was looking at the code mentioned and saw a line like:

    if (x == 0.0 || y == 0.0)
        return (x * y + z);

If either x or y is zero, would it be better to just return z and avoid an extra multiplication here?
Eric Bresie
Ebresie@gmail.com

> On September 2, 2020 at 12:59:43 PM CDT, Sebastian Huber <sebastian.huber@embedded-brains.de> wrote:
> On 02/09/2020 19:12, Joseph Myers wrote:
>
> > On Wed, 2 Sep 2020, Sebastian Huber wrote:
> >
> > > https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD
> > No glibc configurations use that; they all use either a hardware
> > instruction, an implementation based on sticky rounding as described by
> > Boldo and Melquiond, or, in the absence of hardware exceptions and
> > rounding modes, a soft-fp implementation.
>
> Sorry for pointing to this dead code in glibc.
>
> Maybe we can use the FreeBSD implementation:
>
> https://github.com/freebsd/freebsd/blob/master/lib/msun/src/s_fma.c
>
> It is probably also used by Bionic:
>
> https://android.googlesource.com/platform/bionic/+/refs/heads/master/libm/upstream-freebsd/lib/msun/src/s_fma.c
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-07 14:09 Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Eric Bresie
@ 2020-09-07 17:16 ` Keith Packard
  2020-09-07 20:16   ` Brian Inglis
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Packard @ 2020-09-07 17:16 UTC (permalink / raw)
  To: Eric Bresie, newlib

Eric Bresie via Newlib <newlib@sourceware.org> writes:

> Not directly related (and I’m not really an expert on these things, nor able to change anything), but I was looking at the code mentioned and saw a line like:
>
>     if (x == 0.0 || y == 0.0)
>         return (x * y + z);
>
> If either x or y is zero, would it be better to just return z and avoid
> an extra multiplication here?

You want to compute the correct result and get the right exceptions in
all of the delightful IEEE754 corner cases (e.g. 0 × ∞). It's easier to
just execute the two operations than to try and synthesize the right
result (which is implementation-dependent in the case of 0 × ∞ +
qNaN). The key here is that if x or y is zero, then you won't lose any
intermediate precision by performing the operation this way.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-07 17:16 ` Keith Packard
@ 2020-09-07 20:16   ` Brian Inglis
  2020-09-07 22:23     ` Keith Packard
  0 siblings, 1 reply; 22+ messages in thread
From: Brian Inglis @ 2020-09-07 20:16 UTC (permalink / raw)
  To: newlib

On 2020-09-07 11:16, Keith Packard via Newlib wrote:
> Eric Bresie via Newlib <newlib@sourceware.org> writes:
> 
>> Not directly related (and I’m not really an expert on these things, nor able to change anything), but I was looking at the code mentioned and saw a line like:
>>
>>     if (x == 0.0 || y == 0.0)
>>         return (x * y + z);
>>
>> If either x or y is zero, would it be better to just return z and avoid
>> an extra multiplication here?
> 
> You want to compute the correct result and get the right exceptions in
> all of the delightful IEEE754 corner cases (e.g. 0 × ∞). It's easier to
> just execute the two operations than to try and synthesize the right
> result (which is implementation-dependent in the case of 0 × ∞ +
> qNaN). The key here is that if x or y is zero, then you won't lose any
> intermediate precision by performing the operation this way.

Can't the "super-smart" compiler use that information to work around your
careful approach by conditionally skipping the FMA and returning just z, or
even unconditionally returning z, as C makes no guarantees?
And couldn't the "super-smart" instruction scheduler do something similar at
the hardware level?

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-07 20:16   ` Brian Inglis
@ 2020-09-07 22:23     ` Keith Packard
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Packard @ 2020-09-07 22:23 UTC (permalink / raw)
  To: Brian Inglis, newlib

Brian Inglis <Brian.Inglis@SystematicSw.ab.ca> writes:

> Can't the "super-smart" compiler use that information to work around your
> careful approach by conditionally skipping the FMA and returning just z, or
> even unconditionally returning z, as C makes no guarantees?
> And couldn't the "super-smart" instruction scheduler do something similar at
> the hardware level?

I don't think that would conform to the C specification, which (in
Annex F, for implementations that define __STDC_IEC_559__) says that
arithmetic follows IEC 60559, which in turn defines the various exceptions
and results. Now, if you enable -ffast-math, all bets are off...

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02 17:59                 ` Sebastian Huber
@ 2020-09-02 20:39                   ` Keith Packard
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Packard @ 2020-09-02 20:39 UTC (permalink / raw)
  To: Sebastian Huber, Joseph Myers; +Cc: newlib

Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> Maybe we can use the FreeBSD implementation:

And along with that, define the FP_FAST_FMA macros so that applications
can avoid this correct-but-slow version unless absolutely necessary.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02 17:12               ` Joseph Myers
@ 2020-09-02 17:59                 ` Sebastian Huber
  2020-09-02 20:39                   ` Keith Packard
  0 siblings, 1 reply; 22+ messages in thread
From: Sebastian Huber @ 2020-09-02 17:59 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Keith Packard, newlib

On 02/09/2020 19:12, Joseph Myers wrote:

> On Wed, 2 Sep 2020, Sebastian Huber wrote:
>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD
> No glibc configurations use that; they all use either a hardware
> instruction, an implementation based on sticky rounding as described by
> Boldo and Melquiond, or, in the absence of hardware exceptions and
> rounding modes, a soft-fp implementation.

Sorry for pointing to this dead code in glibc.

Maybe we can use the FreeBSD implementation:

https://github.com/freebsd/freebsd/blob/master/lib/msun/src/s_fma.c

It is probably also used by Bionic:

https://android.googlesource.com/platform/bionic/+/refs/heads/master/libm/upstream-freebsd/lib/msun/src/s_fma.c


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02  4:41             ` Sebastian Huber
  2020-09-02  5:25               ` Keith Packard
@ 2020-09-02 17:12               ` Joseph Myers
  2020-09-02 17:59                 ` Sebastian Huber
  1 sibling, 1 reply; 22+ messages in thread
From: Joseph Myers @ 2020-09-02 17:12 UTC (permalink / raw)
  To: Sebastian Huber; +Cc: Keith Packard, newlib

On Wed, 2 Sep 2020, Sebastian Huber wrote:

> https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD

No glibc configurations use that; they all use either a hardware 
instruction, an implementation based on sticky rounding as described by 
Boldo and Melquiond, or, in the absence of hardware exceptions and 
rounding modes, a soft-fp implementation.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02  5:25               ` Keith Packard
@ 2020-09-02  5:35                 ` Keith Packard
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Packard @ 2020-09-02  5:35 UTC (permalink / raw)
  To: Sebastian Huber, Joseph Myers; +Cc: newlib

"Keith Packard" <keithp@keithp.com> writes:

> That implementation violates the spec though because it does two
> binary operations involving two roundings, so you get a different answer
> than you would with a true fma.

Hrm. C99 and C17 both have macros to detect whether fma is 'fast' or
not: FP_FAST_FMA, FP_FAST_FMAF and FP_FAST_FMAL. This page:

        https://en.cppreference.com/w/cpp/numeric/math/fma

has a nice parenthetical comment:

 "If ... defined, the function std::fma evaluates faster (in addition to
  being more precise) than the expression x*y+z."

If C99 or C17 included 'in addition to being more precise', it would
be much more obvious to me that we should include the fall-back fma
implementation.

So, we should at least change the CPP defines that we have in
math_config.h to match the C99 and C17 specs.

Is it reasonable to assume that applications which care about accuracy
will also be checking these defines and using them as the C++ standard
appears to?
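
A minimal sketch of how an application might consult these macros, assuming it prefers the plain expression whenever fma() is not advertised as fast. The helper names `mul_add` and `mul_add_uses_fma` are hypothetical and do not come from newlib:

```c
#include <math.h>

/* Hypothetical helper: use fma() only where <math.h> says it is fast. */
double mul_add(double x, double y, double z)
{
#ifdef FP_FAST_FMA
    return fma(x, y, z);   /* single rounding */
#else
    return x * y + z;      /* two roundings, unless the compiler contracts it */
#endif
}

/* Reports which branch was compiled in: 1 for fma(), 0 for x * y + z. */
int mul_add_uses_fma(void)
{
#ifdef FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}
```

Note that the two branches can return different values for the same inputs, so code written this way has to tolerate either rounding behaviour.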

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-02  4:41             ` Sebastian Huber
@ 2020-09-02  5:25               ` Keith Packard
  2020-09-02  5:35                 ` Keith Packard
  2020-09-02 17:12               ` Joseph Myers
  1 sibling, 1 reply; 22+ messages in thread
From: Keith Packard @ 2020-09-02  5:25 UTC (permalink / raw)
  To: Sebastian Huber, Joseph Myers; +Cc: newlib

Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> Our failing test is pretty basic: it just checks whether the fma() and
> fmaf() library functions are present, as required by C99. glibc also
> offers a simple default implementation, for example:
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD

That implementation violates the spec though because it does two
binary operations involving two roundings, so you get a different answer
than you would with a true fma.

Is it better to implement the function incorrectly or better to not
implement it at all? If your hardware doesn't support the operation,
then doing this in software would be a lot slower than adapting your
algorithm to deal with a sequence of binary operations, even though you
will likely need more of them to reach the same accuracy.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 23:06           ` Keith Packard
@ 2020-09-02  4:41             ` Sebastian Huber
  2020-09-02  5:25               ` Keith Packard
  2020-09-02 17:12               ` Joseph Myers
  0 siblings, 2 replies; 22+ messages in thread
From: Sebastian Huber @ 2020-09-02  4:41 UTC (permalink / raw)
  To: Keith Packard, Joseph Myers; +Cc: newlib

On 02/09/2020 01:06, Keith Packard wrote:
> Joseph Myers <joseph@codesourcery.com> writes:
> 
>> But note that newlib/libm/common/s_fma.c doesn't actually do anything
>> useful; it's not a fused operation.  Implementing correct fma in software
>> is highly nontrivial, especially when you want to handle exceptions and
>> rounding modes correctly (including machine-specific differences in
>> whether tininess is detected before or after rounding).
> 
> Should we just stop providing the generic fma/fmaf implementations? That
> seems like a good idea to me as it will prevent applications from
> getting the wrong answer.
> 
> The fmaf one does offer increased precision by doing the operation in
> double instead of float, which is 'different' from doing it in float,
> but it still gets a different answer from 'a * b + c'. It also gets the
> wrong exception status.

Our failing test is pretty basic: it just checks whether the fma() and
fmaf() library functions are present, as required by C99. glibc also
offers a simple default implementation, for example:

https://sourceware.org/git/?p=glibc.git;a=blob;f=math/s_fma.c;h=4d73af4f65d511594b2395d032a135721c578484;hb=HEAD

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 21:16         ` Joseph Myers
@ 2020-09-01 23:06           ` Keith Packard
  2020-09-02  4:41             ` Sebastian Huber
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Packard @ 2020-09-01 23:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Sebastian Huber, newlib

Joseph Myers <joseph@codesourcery.com> writes:

> But note that newlib/libm/common/s_fma.c doesn't actually do anything 
> useful; it's not a fused operation.  Implementing correct fma in software 
> is highly nontrivial, especially when you want to handle exceptions and 
> rounding modes correctly (including machine-specific differences in 
> whether tininess is detected before or after rounding).

Should we just stop providing the generic fma/fmaf implementations? That
seems like a good idea to me as it will prevent applications from
getting the wrong answer.

The fmaf one does offer increased precision by doing the operation in
double instead of float, which is 'different' from doing it in float,
but it still gets a different answer from 'a * b + c'. It also gets the
wrong exception status.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 19:28       ` Keith Packard
@ 2020-09-01 21:16         ` Joseph Myers
  2020-09-01 23:06           ` Keith Packard
  0 siblings, 1 reply; 22+ messages in thread
From: Joseph Myers @ 2020-09-01 21:16 UTC (permalink / raw)
  To: Keith Packard; +Cc: Sebastian Huber, newlib

On Tue, 1 Sep 2020, Keith Packard via Newlib wrote:

> If not, then one of the two versions of fma should be getting compiled
> as they have opposite tests -- newlib/libm/machine/arm/s_fma.c checks
> for '#if HAVE_FAST_FMA' while newlib/libm/common/s_fma.c checks for
> '#if !HAVE_FAST_FMA'.

But note that newlib/libm/common/s_fma.c doesn't actually do anything 
useful; it's not a fused operation.  Implementing correct fma in software 
is highly nontrivial, especially when you want to handle exceptions and 
rounding modes correctly (including machine-specific differences in 
whether tininess is detected before or after rounding).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 17:21   ` Sebastian Huber
  2020-09-01 18:04     ` Sebastian Huber
@ 2020-09-01 19:50     ` Keith Packard
  1 sibling, 0 replies; 22+ messages in thread
From: Keith Packard @ 2020-09-01 19:50 UTC (permalink / raw)
  To: Sebastian Huber, newlib

Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> It seems to be present in only some multilibs:

I did some more digging -- the 'common' one is getting built, but the
resulting math library doesn't have it included, which (as you suggest)
indicates a problem in the build system. It turns out that the autotools
build requires that all filenames across the whole math library must be
unique; having 's_fma.c' in both common and machine/arm causes the one
from common to be overwritten by the one in machine/arm due to the
manual construction of libm.a from the constituent sub-libraries.

As all of my testing was using meson instead of autotools, I guess I
shouldn't be surprised that I broke the autotools build.

I've sent a patch that renames libm/machine/arm/*fma.c and that appears
to fix the problem.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 18:04     ` Sebastian Huber
@ 2020-09-01 19:28       ` Keith Packard
  2020-09-01 21:16         ` Joseph Myers
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Packard @ 2020-09-01 19:28 UTC (permalink / raw)
  To: Sebastian Huber, newlib

Sebastian Huber <sebastian.huber@embedded-brains.de> writes:

> I think the problem is somewhere in the build system:
>
> find -name s_fma.c
> ./newlib/libm/machine/arm/s_fma.c
> ./newlib/libm/machine/aarch64/s_fma.c
> ./newlib/libm/machine/riscv/s_fma.c
> ./newlib/libm/machine/spu/s_fma.c
> ./newlib/libm/common/s_fma.c
>
> I guess the machine-specific file overrides the common file. If the 
> machine-specific file is empty due to pre-processor magic, then the 
> default implementation is still not present.

newlib shouldn't be calling fma if the underlying hardware support isn't
present -- fma is used in some math functions to improve performance
where the code can take full advantage of the additional precision of
the intermediate value.

Are you using fma directly? If your hardware supports it, the C compiler
should be directly emitting the relevant instruction sequence so you
shouldn't be seeing an undefined function appear.

If not, then one of the two versions of fma should be getting compiled
as they have opposite tests -- newlib/libm/machine/arm/s_fma.c checks
for '#if HAVE_FAST_FMA' while newlib/libm/common/s_fma.c checks for
'#if !HAVE_FAST_FMA'.
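
A minimal sketch of the complementary guards, collapsed into a single translation unit for illustration. The name `demo_fma` is hypothetical, and defaulting HAVE_FAST_FMA to 0 here is an assumption made so the sketch compiles on its own; in newlib the two halves live in separate files:

```c
#include <math.h>

#ifndef HAVE_FAST_FMA
#define HAVE_FAST_FMA 0 /* assumption for this sketch only */
#endif

#if HAVE_FAST_FMA
/* Stands in for the machine-specific file: relies on a hardware FMA. */
double demo_fma(double x, double y, double z)
{
    return fma(x, y, z);
}
#endif

#if !HAVE_FAST_FMA
/* Stands in for the common file: plain multiply followed by add. */
double demo_fma(double x, double y, double z)
{
    return x * y + z;
}
#endif
```

Exactly one definition survives preprocessing. If the build drops the file whose guard is true and keeps only the one whose guard is false, the symbol ends up undefined.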

I recently did some work in this area, so it's possible I broke
something in your environment that I didn't catch in mine; I don't test
newlib builds, only downstream picolibc builds.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 17:21   ` Sebastian Huber
@ 2020-09-01 18:04     ` Sebastian Huber
  2020-09-01 19:28       ` Keith Packard
  2020-09-01 19:50     ` Keith Packard
  1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Huber @ 2020-09-01 18:04 UTC (permalink / raw)
  To: Keith Packard, newlib

On 01/09/2020 19:21, Sebastian Huber wrote:

> On 01/09/2020 18:32, Sebastian Huber wrote:
>
>> Hello,
>>
>> with the latest Newlib, I get a linker error in the RTEMS test suite:
>>
>>
>> undefined reference to `fma'
>> undefined reference to `fmaf'
>>
>> The following machine flags are used:
>>
>> '-march=armv7-a', '-mthumb', '-mfpu=neon', '-mfloat-abi=hard', 
>> '-mtune=cortex-a9'
>>
>> It seems to be missing in the corresponding multilib:
>>
>> nm /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a | grep fma
>> lib_a-fmal.o:
>>          U fma
>> 00000000 T fmal
>> lib_a-fmaxl.o:
>>          U fmax
>> 00000000 T fmaxl
>> lib_a-s_fma.o:
>> lib_a-s_fmax.o:
>> 00000000 T fmax
>> lib_a-sf_fma.o:
>> lib_a-sf_fmax.o:
>> 00000000 T fmaxf
>>
> It seems to be present in only some multilibs:
>
> for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo "$i"; nm --defined-only "$i" | grep 'T.*\<fma\>'; done
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
> 00000000 T fma
> /build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
> /build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a
>
> for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo "$i"; nm --defined-only "$i" | grep 'T.*\<fmaf\>'; done
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
> 00000000 T fmaf
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
> 00000000 T fmaf
> /build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
> /build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
> /build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a

I think the problem is somewhere in the build system:

find -name s_fma.c
./newlib/libm/machine/arm/s_fma.c
./newlib/libm/machine/aarch64/s_fma.c
./newlib/libm/machine/riscv/s_fma.c
./newlib/libm/machine/spu/s_fma.c
./newlib/libm/common/s_fma.c

I guess the machine-specific file overrides the common file. If the 
machine-specific file is empty due to pre-processor magic, then the 
default implementation is still not present.


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-09-01 16:32 ` Sebastian Huber
@ 2020-09-01 17:21   ` Sebastian Huber
  2020-09-01 18:04     ` Sebastian Huber
  2020-09-01 19:50     ` Keith Packard
  0 siblings, 2 replies; 22+ messages in thread
From: Sebastian Huber @ 2020-09-01 17:21 UTC (permalink / raw)
  To: Keith Packard, newlib

On 01/09/2020 18:32, Sebastian Huber wrote:

> Hello,
>
> with the latest Newlib, I get a linker error in the RTEMS test suite:
>
>
> undefined reference to `fma'
> undefined reference to `fmaf'
>
> The following machine flags are used:
>
> '-march=armv7-a', '-mthumb', '-mfpu=neon', '-mfloat-abi=hard', 
> '-mtune=cortex-a9'
>
> It seems to be missing in the corresponding multilib:
>
> nm /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a | grep fma
> lib_a-fmal.o:
>          U fma
> 00000000 T fmal
> lib_a-fmaxl.o:
>          U fmax
> 00000000 T fmaxl
> lib_a-s_fma.o:
> lib_a-s_fmax.o:
> 00000000 T fmax
> lib_a-sf_fma.o:
> lib_a-sf_fmax.o:
> 00000000 T fmaxf
>
It seems to be present in only some multilibs:

for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo "$i"; nm --defined-only "$i" | grep 'T.*\<fma\>'; done
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
00000000 T fma
/build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
/build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a

for i in $(find /build/rtems/6/arm-rtems6/lib -name libm.a); do echo "$i"; nm --defined-only "$i" | grep 'T.*\<fmaf\>'; done
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/eb/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4/hard/libm.a
00000000 T fmaf
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m3/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m7/hard/libm.a
00000000 T fmaf
/build/rtems/6/arm-rtems6/lib/thumb/armv6-m/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r+fp/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-r/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/cortex-m4+nofp/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a
/build/rtems/6/arm-rtems6/lib/thumb/armv7-a/libm.a
/build/rtems/6/arm-rtems6/lib/armv5te+fp/hard/libm.a


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-08 22:34 Keith Packard
  2020-08-10  9:30 ` Corinna Vinschen
  2020-08-10 19:06 ` Corinna Vinschen
@ 2020-09-01 16:32 ` Sebastian Huber
  2020-09-01 17:21   ` Sebastian Huber
  2 siblings, 1 reply; 22+ messages in thread
From: Sebastian Huber @ 2020-09-01 16:32 UTC (permalink / raw)
  To: Keith Packard, newlib

Hello,

with the latest Newlib, I get a linker error in the RTEMS test suite:


undefined reference to `fma'
undefined reference to `fmaf'

The following machine flags are used:

'-march=armv7-a', '-mthumb', '-mfpu=neon', '-mfloat-abi=hard', 
'-mtune=cortex-a9'

It seems to be missing in the corresponding multilib:

nm /build/rtems/6/arm-rtems6/lib/thumb/armv7-a+simd/hard/libm.a | grep fma
lib_a-fmal.o:
          U fma
00000000 T fmal
lib_a-fmaxl.o:
          U fmax
00000000 T fmaxl
lib_a-s_fma.o:
lib_a-s_fmax.o:
00000000 T fmax
lib_a-sf_fma.o:
lib_a-sf_fmax.o:
00000000 T fmaxf



* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-08 22:34 Keith Packard
  2020-08-10  9:30 ` Corinna Vinschen
@ 2020-08-10 19:06 ` Corinna Vinschen
  2020-09-01 16:32 ` Sebastian Huber
  2 siblings, 0 replies; 22+ messages in thread
From: Corinna Vinschen @ 2020-08-10 19:06 UTC (permalink / raw)
  To: Keith Packard; +Cc: newlib

On Aug  8 15:34, Keith Packard via Newlib wrote:
> I added some new test configurations to my CI system for picolibc and
> discovered that when the new math code was built on 32-bit ARM
> processors with only single-precision floating hardware, several math
> functions were returning imprecise results. I got the expected results
> on processors with no FPU and on processors with both 32- and 64- bit
> FPUs.
> 
> I discovered that the affected functions were using the 'fma' function
> on this hardware, even though (lacking 64-bit HW support), that
> function was being emulated without the required precision.
> 
> This all boiled down to math_config.h incorrectly detecting 64-bit FMA
> support on ARM processors.
> 
> This patch series contains three changes:
> 
>  1. fix the fast FMA process so that 32-bit ARM processors without 64-bit FMA
>     support don't use 'fma' for the new math functions
> 
>  2. Add detection of fast FMAF, which 32-bit ARM processors with only
>     32-bit FPUs *do* support.
> 
>  3. Add ARM versions of fma and fmaf which are used when those
>     instructions are available.
> 

Pushed.  I just regen'ed newlib/libm/machine/arm/Makefile.in.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-10 14:43   ` Szabolcs Nagy
@ 2020-08-10 15:19     ` Keith Packard
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Packard @ 2020-08-10 15:19 UTC (permalink / raw)
  To: Szabolcs Nagy, newlib

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:

> but using HAVE_FAST_FMA{F} works too.
> (note that these macros currently only
> do something useful on aarch64 and arm.)

I've got a patch for RISC-V FMA support in the works.

-- 
-keith


* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-10  9:30 ` Corinna Vinschen
@ 2020-08-10 14:43   ` Szabolcs Nagy
  2020-08-10 15:19     ` Keith Packard
  0 siblings, 1 reply; 22+ messages in thread
From: Szabolcs Nagy @ 2020-08-10 14:43 UTC (permalink / raw)
  To: Keith Packard, newlib

The 08/10/2020 11:30, Corinna Vinschen wrote:
> Hi Szabolcs,
> 
> ok to push?
> 

this looks ok.

i would have used the arm-specific macros
(__ARM_FEATURE_FMA, __ARM_FP) directly
in arm-specific code.

but using HAVE_FAST_FMA{F} works too.
(note that these macros currently only
do something useful on aarch64 and arm.)
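
a minimal sketch of detection built directly on the arm-specific macros, assuming the ACLE meaning of __ARM_FP (bit 2, value 0x4, for single-precision hardware; bit 3, value 0x8, for double-precision hardware). the DEMO_* macro names and helper functions are hypothetical and are not what the patch series defines:

```c
/* Hypothetical names; newlib's own macros are HAVE_FAST_FMA and HAVE_FAST_FMAF. */
#if defined(__ARM_FEATURE_FMA) && defined(__ARM_FP)
#  define DEMO_HAVE_FAST_FMAF ((__ARM_FP & 0x4) != 0) /* bit 2: single precision */
#  define DEMO_HAVE_FAST_FMA  ((__ARM_FP & 0x8) != 0) /* bit 3: double precision */
#else
#  define DEMO_HAVE_FAST_FMAF 0 /* no ARM FMA instructions advertised */
#  define DEMO_HAVE_FAST_FMA  0
#endif

/* Runtime-visible copies of the compile-time decisions. */
int demo_have_fast_fma(void)  { return DEMO_HAVE_FAST_FMA; }
int demo_have_fast_fmaf(void) { return DEMO_HAVE_FAST_FMAF; }
```

on a target with only a single-precision fpu this would report fast fmaf but not fast fma, which is the distinction the patch series is about.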


> 
> Thanks,
> Corinna
> 
> On Aug  8 15:34, Keith Packard via Newlib wrote:
> > I added some new test configurations to my CI system for picolibc and
> > discovered that when the new math code was built on 32-bit ARM
> > processors with only single-precision floating hardware, several math
> > functions were returning imprecise results. I got the expected results
> > on processors with no FPU and on processors with both 32- and 64- bit
> > FPUs.
> > 
> > I discovered that the affected functions were using the 'fma' function
> > on this hardware, even though (lacking 64-bit HW support), that
> > function was being emulated without the required precision.
> > 
> > This all boiled down to math_config.h incorrectly detecting 64-bit FMA
> > support on ARM processors.
> > 
> > This patch series contains three changes:
> > 
> >  1. fix the fast FMA process so that 32-bit ARM processors without 64-bit FMA
> >     support don't use 'fma' for the new math functions
> > 
> >  2. Add detection of fast FMAF, which 32-bit ARM processors with only
> >     32-bit FPUs *do* support.
> > 
> >  3. Add ARM versions of fma and fmaf which are used when those
> >     instructions are available.
> > 
> 
> -- 
> Corinna Vinschen
> Cygwin Maintainer
> Red Hat
> 

-- 

* Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
  2020-08-08 22:34 Keith Packard
@ 2020-08-10  9:30 ` Corinna Vinschen
  2020-08-10 14:43   ` Szabolcs Nagy
  2020-08-10 19:06 ` Corinna Vinschen
  2020-09-01 16:32 ` Sebastian Huber
  2 siblings, 1 reply; 22+ messages in thread
From: Corinna Vinschen @ 2020-08-10  9:30 UTC (permalink / raw)
  To: Keith Packard; +Cc: newlib, Szabolcs Nagy

Hi Szabolcs,

ok to push?


Thanks,
Corinna

On Aug  8 15:34, Keith Packard via Newlib wrote:
> I added some new test configurations to my CI system for picolibc and
> discovered that when the new math code was built on 32-bit ARM
> processors with only single-precision floating hardware, several math
> functions were returning imprecise results. I got the expected results
> on processors with no FPU and on processors with both 32- and 64- bit
> FPUs.
> 
> I discovered that the affected functions were using the 'fma' function
> on this hardware, even though (lacking 64-bit HW support), that
> function was being emulated without the required precision.
> 
> This all boiled down to math_config.h incorrectly detecting 64-bit FMA
> support on ARM processors.
> 
> This patch series contains three changes:
> 
>  1. fix the fast FMA process so that 32-bit ARM processors without 64-bit FMA
>     support don't use 'fma' for the new math functions
> 
>  2. Add detection of fast FMAF, which 32-bit ARM processors with only
>     32-bit FPUs *do* support.
> 
>  3. Add ARM versions of fma and fmaf which are used when those
>     instructions are available.
> 

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat



* [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA
@ 2020-08-08 22:34 Keith Packard
  2020-08-10  9:30 ` Corinna Vinschen
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Keith Packard @ 2020-08-08 22:34 UTC (permalink / raw)
  To: newlib

I added some new test configurations to my CI system for picolibc and
discovered that when the new math code was built on 32-bit ARM
processors with only single-precision floating hardware, several math
functions were returning imprecise results. I got the expected results
on processors with no FPU and on processors with both 32- and 64-bit
FPUs.

I discovered that the affected functions were using the 'fma' function
on this hardware even though, lacking 64-bit HW support, that
function was being emulated without the required precision.

This all boiled down to math_config.h incorrectly detecting 64-bit FMA
support on ARM processors.

This patch series contains three changes:

 1. Fix the fast FMA detection so that 32-bit ARM processors without 64-bit FMA
    support don't use 'fma' for the new math functions.

 2. Add detection of fast FMAF, which 32-bit ARM processors with only
    32-bit FPUs *do* support.

 3. Add ARM versions of fma and fmaf which are used when those
    instructions are available.
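
As a rough sketch of the idea behind change 1 (not the actual patch), math code can select the fused operation only when the target really has a 64-bit FMA instruction, and otherwise use a plain multiply and add instead of calling an emulated 'fma'. The SKETCH_FAST_FMA64 macro and sketch_mul_add function are hypothetical names; __builtin_fma is the GCC/Clang builtin, and the __ARM_FP bit value 0x8 (double-precision hardware) is taken from ACLE.

```c
/* Sketch only: hypothetical names, not the newlib implementation. */

/* Treat 64-bit fma as "fast" only when FMA instructions exist and the
   FPU supports double precision (__ARM_FP bit 0x8). */
#if defined(__ARM_FEATURE_FMA) && defined(__ARM_FP) && (__ARM_FP & 0x8)
# define SKETCH_FAST_FMA64 1
#else
# define SKETCH_FAST_FMA64 0
#endif

/* Computes x * y + z, fused when the hardware can do so cheaply. */
double sketch_mul_add(double x, double y, double z)
{
#if SKETCH_FAST_FMA64
  /* Single rounding; expected to compile to one FMA instruction. */
  return __builtin_fma(x, y, z);
#else
  /* No fast 64-bit FMA: do not call an emulated fma here. */
  return x * y + z;
#endif
}
```

Callers that depend on the single rounding of a true fused multiply-add would still need a different algorithm on the fallback path; the sketch only shows the selection step.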


Thread overview: 22+ messages
2020-09-07 14:09 Re: [PATCH 0/3] ARM with only 32-bit floats do not have fast 64-bit FMA Eric Bresie
2020-09-07 17:16 ` Keith Packard
2020-09-07 20:16   ` Brian Inglis
2020-09-07 22:23     ` Keith Packard
  -- strict thread matches above, loose matches on Subject: below --
2020-08-08 22:34 Keith Packard
2020-08-10  9:30 ` Corinna Vinschen
2020-08-10 14:43   ` Szabolcs Nagy
2020-08-10 15:19     ` Keith Packard
2020-08-10 19:06 ` Corinna Vinschen
2020-09-01 16:32 ` Sebastian Huber
2020-09-01 17:21   ` Sebastian Huber
2020-09-01 18:04     ` Sebastian Huber
2020-09-01 19:28       ` Keith Packard
2020-09-01 21:16         ` Joseph Myers
2020-09-01 23:06           ` Keith Packard
2020-09-02  4:41             ` Sebastian Huber
2020-09-02  5:25               ` Keith Packard
2020-09-02  5:35                 ` Keith Packard
2020-09-02 17:12               ` Joseph Myers
2020-09-02 17:59                 ` Sebastian Huber
2020-09-02 20:39                   ` Keith Packard
2020-09-01 19:50     ` Keith Packard
