* Implement fma in soft-fp
@ 2013-06-22 17:17 Joseph S. Myers
2013-06-25 15:30 ` Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Joseph S. Myers @ 2013-06-22 17:17 UTC (permalink / raw)
To: libc-alpha
Cc: libc-ports, Marcus Shawcroft, Richard Henderson, David Holsgrove,
Chris Metcalf, David S. Miller
This patch implements fused multiply-add in soft-fp. The immediate
motivation is to avoid testsuite failures for configurations without
support for FE_TOWARDZERO and FE_INEXACT (required for the existing
fma implementations to work correctly), without needing to put ulps
for fma in libm-test-ulps files (fma should never have any ulps).
This patch has been tested for MIPS64 (soft float and hard float, all
three ABIs). It depends on my previous patch
<http://sourceware.org/ml/libc-alpha/2013-06/msg00851.html> to fix
shdowing of variables with the same names in __FP_FRAC_ADD_3 and
_FP_MUL_MEAT_2_wide_3mul, which showed up as test-ldouble test
failures for fmal (0x0.ffffffffffffffffffffffffffff8p0L,
0x0.ffffffffffffffffffffffffffff8p0L,
-0x0.ffffffffffffffffffffffffffffp0L) and similar tests.
General notes:
* It wasn't clear where to put the complete function implementations
of fmaf / fma / fmal in terms of soft-fp so they could be shared
between architectures using them. This patch puts them in the
soft-fp directory, with files named as if they were libgcc functions
(fmasf4.c, fmadf4.c, fmatf4.c), and with the same license as the
rest of soft-fp, although defining the functions with the same names
and aliases as usual for libm; architecture sysdeps directories then
need to #include those files (possibly inside #if conditionals, as
done here). Given the existence of cases where you may wish to use
some but not others of those files (e.g. MIPS hard-float at least
for now only using fmatf4.c and not the others), using these files
directly via sysdeps directories would involve extremely specific
sysdeps directories containing just one or two files.
Much the same issue applies to sqrtl, where AArch64 and MIPS have
essentially identical code and Alpha has very similar code to those
architectures (and one difference - lack of __sqrtl_finite alias -
is a bug: bug 15666).
* Bug 13304 isn't completely fixed by this; apart from the need for
Tile and MicroBlaze to start using the new implementation, mentioned
below, I also need to make ARM soft-float use it, and at least
ColdFire and SH can also be configured for soft-float and should
also use it in that case. In addition, there is no proper fmal
implementation for IBM long double in glibc at all.
* The new code generally follows the surrounding code's style, which
doesn't always follow GNU style particularly well.
* If someone updates the copy of soft-fp in the Linux kernel - long
overdue, but probably a fair amount of work given the extent of
changes on both sides (you'd need to start by working out what the
common ancestor is, so as to tell what changes there are local to
the Linux kernel) - then the powerpc fpu emulation could be changed
to use this code for correct fmadd etc. operation (at present it
appears to do a non-fused operation, though with the extra three low
bits left around between the multiplication and the addition). It's
possible this might apply to other architectures using this code in
the kernel as well.
Notes for architecture maintainers:
* I expect that for architectures using soft-fp to implement ldbl-128,
the soft-fp fmal will be a lot faster than the existing ldbl-128
implementation that uses large numbers of floating-point operations
(each in turn implemented with soft-fp) internally. But I haven't
benchmarked this. (Depending on the speeds of various operations, I
don't rule out the possibility that it's actually faster for flt-32
or dbl-64 than the existing implementations on some architectures,
even when hardware floating point is in use.)
* Using this code for quad with 32-bit _FP_W_TYPE will require various
operations to be added to op-8.h that _FP_FMA uses internally for
double-width mantissas. None of these should be hard to add, but
the configurations I generally build and test don't include any for
which this is relevant. (In particular, this is relevant for
sparc32.)
* I refactored most of the soft-fp implementations of multiplication
to separate the part producing a double-width exact result (as
required for use within fma) from the part shifting that right.
However, _FP_MUL_MEAT_2_120_240_double does not directly produce a
double-width result in the integer form expected by soft-fp. The
same approach could be used to produce such a result if desired, but
without being able to use the _DW_ macro directly in the ordinary
one; that would be something to consider if using this code for
sparc64 (the only architecture to use
_FP_MUL_MEAT_2_120_240_double).
* Using this code on an architecture that detects tininess after
rounding may require soft-fp support for after-rounding tininess
detection to be added to avoid failures of fma tests for that
particular case. (In particular, this is relevant for alpha. It's
not currently relevant for MIPS because of the general lack of
exceptions support in the cases where this code gets used.)
* I'd encourage Tile and MicroBlaze maintainers to move to using this
code so they no longer have any ulps for fma.
2013-06-22 Joseph Myers <joseph@codesourcery.com>
[BZ #13304]
* soft-fp/op-common.h (_FP_FMA): New macro.
* soft-fp/op-1.h (_FP_FRAC_HIGHBIT_DW_1): New macro.
(_FP_MUL_MEAT_DW_1_imm): Likewise. Split out of ...
(_FP_MUL_MEAT_1_imm): ... here.
(_FP_MUL_MEAT_DW_1_wide): New macro. Split out of ...
(_FP_MUL_MEAT_1_wide): ... here.
(_FP_MUL_MEAT_DW_1_hard): Likewise. Split out of ...
(_FP_MUL_MEAT_1_hard): ... here.
* soft-fp/op-2.h (_FP_FRAC_HIGHBIT_DW_2): New macro.
(_FP_MUL_MEAT_DW_2_wide): Likewise. Split out of ...
(_FP_MUL_MEAT_2_wide): ... here.
(_FP_MUL_MEAT_DW_2_wide_3mul): New macro. Split out of ...
(_FP_MUL_MEAT_2_wide_3mul): ... here.
(_FP_MUL_MEAT_DW_2_gmp): New macro. Split out of ...
(_FP_MUL_MEAT_2_gmp): ... here.
* soft-fp/op-4.h (_FP_FRAC_HIGHBIT_DW_4): New macro.
(_FP_MUL_MEAT_DW_4_wide): Likewise. Split out of ...
(_FP_MUL_MEAT_4_wide): ... here.
(_FP_MUL_MEAT_DW_4_gmp): New macro. Split out of ...
(_FP_MUL_MEAT_4_gmp): ... here.
* soft-fp/single.h (_FP_FRACTBITS_DW_S): New macro.
(_FP_WFRACBITS_DW_S): Likewise.
(_FP_WFRACXBITS_DW_S): Likewise.
(_FP_HIGHBIT_DW_S): Likewise.
(FP_FMA_S): Likewise.
(_FP_FRAC_HIGH_DW_S): Likewise.
* soft-fp/double.h (_FP_FRACTBITS_DW_D): New macro.
(_FP_WFRACBITS_DW_D): Likewise.
(_FP_WFRACXBITS_DW_D): Likewise.
(_FP_HIGHBIT_DW_D): Likewise.
(FP_FMA_D): Likewise.
(_FP_FRAC_HIGH_DW_D): Likewise.
* soft-fp/extended.h (_FP_FRACTBITS_DW_E): New macro.
(_FP_WFRACBITS_DW_E): Likewise.
(_FP_WFRACXBITS_DW_E): Likewise.
(_FP_HIGHBIT_DW_E): Likewise.
(FP_FMA_E): Likewise.
(_FP_FRAC_HIGH_DW_E): Likewise.
* soft-fp/quad.h (_FP_FRACTBITS_DW_Q): New macro.
(_FP_WFRACBITS_DW_Q): Likewise.
(_FP_WFRACXBITS_DW_Q): Likewise.
(_FP_HIGHBIT_DW_Q): Likewise.
(FP_FMA_Q): Likewise.
(_FP_FRAC_HIGH_DW_Q): Likewise.
* soft-fp/fmasf4.c: New file.
* soft-fp/fmadf4.c: Likewise.
* soft-fp/fmatf4.c: Likewise.
ports/ChangeLog.mips:
2013-06-22 Joseph Myers <joseph@codesourcery.com>
[BZ #13304]
* sysdeps/mips/ieee754/s_fma.c: New file.
* sysdeps/mips/ieee754/s_fmaf.c: Likewise.
* sysdeps/mips/ieee754/s_fmal.c: Likewise.
* sysdeps/mips/mips32/Implies: Add mips/soft-fp.
* sysdeps/mips/mips64/n32/s_fma.c: Remove file.
* sysdeps/mips/mips64/n64/s_fma.c: Likewise.
* sysdeps/mips/mips64/soft-fp/sfp-machine.h (_FP_MUL_MEAT_DW_S):
New macro.
(_FP_MUL_MEAT_DW_D): Likewise.
(_FP_MUL_MEAT_DW_Q): Likewise.
* sysdeps/mips/soft-fp/sfp-machine.h (_FP_MUL_MEAT_DW_S): New
macro.
(_FP_MUL_MEAT_DW_D): Likewise.
(_FP_MUL_MEAT_DW_Q): Likewise.
diff --git a/ports/sysdeps/mips/ieee754/s_fma.c b/ports/sysdeps/mips/ieee754/s_fma.c
new file mode 100644
index 0000000..5741414
--- /dev/null
+++ b/ports/sysdeps/mips/ieee754/s_fma.c
@@ -0,0 +1,5 @@
+#ifdef __mips_hard_float
+# include <sysdeps/ieee754/dbl-64/s_fma.c>
+#else
+# include <soft-fp/fmadf4.c>
+#endif
diff --git a/ports/sysdeps/mips/ieee754/s_fmaf.c b/ports/sysdeps/mips/ieee754/s_fmaf.c
new file mode 100644
index 0000000..30bcdae
--- /dev/null
+++ b/ports/sysdeps/mips/ieee754/s_fmaf.c
@@ -0,0 +1,5 @@
+#ifdef __mips_hard_float
+# include <sysdeps/ieee754/dbl-64/s_fmaf.c>
+#else
+# include <soft-fp/fmasf4.c>
+#endif
diff --git a/ports/sysdeps/mips/ieee754/s_fmal.c b/ports/sysdeps/mips/ieee754/s_fmal.c
new file mode 100644
index 0000000..6b83e91
--- /dev/null
+++ b/ports/sysdeps/mips/ieee754/s_fmal.c
@@ -0,0 +1,7 @@
+#include <sgidefs.h>
+
+#if _MIPS_SIM == _ABIO32
+# error "long double fma being compiled for o32 ABI"
+#endif
+
+#include <soft-fp/fmatf4.c>
diff --git a/ports/sysdeps/mips/mips32/Implies b/ports/sysdeps/mips/mips32/Implies
index 6473f25..42df98f 100644
--- a/ports/sysdeps/mips/mips32/Implies
+++ b/ports/sysdeps/mips/mips32/Implies
@@ -1,3 +1,4 @@
mips/ieee754
+mips/soft-fp
mips
wordsize-32
diff --git a/ports/sysdeps/mips/mips64/n32/s_fma.c b/ports/sysdeps/mips/mips64/n32/s_fma.c
deleted file mode 100644
index 74a1e01..0000000
--- a/ports/sysdeps/mips/mips64/n32/s_fma.c
+++ /dev/null
@@ -1,6 +0,0 @@
-/* MIPS long double is implemented in software by fp-bit (as of GCC
- 4.7) without support for exceptions or rounding modes, so the fma
- implementation in terms of long double is slow and will not produce
- correctly rounding results. */
-
-#include <sysdeps/ieee754/dbl-64/s_fma.c>
diff --git a/ports/sysdeps/mips/mips64/n64/s_fma.c b/ports/sysdeps/mips/mips64/n64/s_fma.c
deleted file mode 100644
index 74a1e01..0000000
--- a/ports/sysdeps/mips/mips64/n64/s_fma.c
+++ /dev/null
@@ -1,6 +0,0 @@
-/* MIPS long double is implemented in software by fp-bit (as of GCC
- 4.7) without support for exceptions or rounding modes, so the fma
- implementation in terms of long double is slow and will not produce
- correctly rounding results. */
-
-#include <sysdeps/ieee754/dbl-64/s_fma.c>
diff --git a/ports/sysdeps/mips/mips64/soft-fp/sfp-machine.h b/ports/sysdeps/mips/mips64/soft-fp/sfp-machine.h
index 1bdde5a..9cfd6fb 100644
--- a/ports/sysdeps/mips/mips64/soft-fp/sfp-machine.h
+++ b/ports/sysdeps/mips/mips64/soft-fp/sfp-machine.h
@@ -13,6 +13,13 @@
#define _FP_MUL_MEAT_Q(R,X,Y) \
_FP_MUL_MEAT_2_wide_3mul(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_S(R,X,Y) \
+ _FP_MUL_MEAT_DW_1_imm(_FP_WFRACBITS_S,R,X,Y)
+#define _FP_MUL_MEAT_DW_D(R,X,Y) \
+ _FP_MUL_MEAT_DW_1_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_Q(R,X,Y) \
+ _FP_MUL_MEAT_DW_2_wide_3mul(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
#define _FP_DIV_MEAT_S(R,X,Y) _FP_DIV_MEAT_1_imm(S,R,X,Y,_FP_DIV_HELP_imm)
#define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_1_udiv_norm(D,R,X,Y)
#define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
diff --git a/ports/sysdeps/mips/soft-fp/sfp-machine.h b/ports/sysdeps/mips/soft-fp/sfp-machine.h
index 8ccfaa6..a60bef7 100644
--- a/ports/sysdeps/mips/soft-fp/sfp-machine.h
+++ b/ports/sysdeps/mips/soft-fp/sfp-machine.h
@@ -10,6 +10,13 @@
#define _FP_MUL_MEAT_Q(R,X,Y) \
_FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_S(R,X,Y) \
+ _FP_MUL_MEAT_DW_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_D(R,X,Y) \
+ _FP_MUL_MEAT_DW_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_Q(R,X,Y) \
+ _FP_MUL_MEAT_DW_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
#define _FP_DIV_MEAT_S(R,X,Y) _FP_DIV_MEAT_1_udiv_norm(S,R,X,Y)
#define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y)
#define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
diff --git a/soft-fp/double.h b/soft-fp/double.h
index 759c2eb..8653f69 100644
--- a/soft-fp/double.h
+++ b/soft-fp/double.h
@@ -36,8 +36,10 @@
#if _FP_W_TYPE_SIZE < 64
#define _FP_FRACTBITS_D (2 * _FP_W_TYPE_SIZE)
+#define _FP_FRACTBITS_DW_D (4 * _FP_W_TYPE_SIZE)
#else
#define _FP_FRACTBITS_D _FP_W_TYPE_SIZE
+#define _FP_FRACTBITS_DW_D (2 * _FP_W_TYPE_SIZE)
#endif
#define _FP_FRACBITS_D 53
@@ -59,6 +61,11 @@
#define _FP_OVERFLOW_D \
((_FP_W_TYPE)1 << _FP_WFRACBITS_D % _FP_W_TYPE_SIZE)
+#define _FP_WFRACBITS_DW_D (2 * _FP_WFRACBITS_D)
+#define _FP_WFRACXBITS_DW_D (_FP_FRACTBITS_DW_D - _FP_WFRACBITS_DW_D)
+#define _FP_HIGHBIT_DW_D \
+ ((_FP_W_TYPE)1 << (_FP_WFRACBITS_DW_D - 1) % _FP_W_TYPE_SIZE)
+
typedef float DFtype __attribute__((mode(DF)));
#if _FP_W_TYPE_SIZE < 64
@@ -149,6 +156,7 @@ union _FP_UNION_D
#define FP_DIV_D(R,X,Y) _FP_DIV(D,2,R,X,Y)
#define FP_SQRT_D(R,X) _FP_SQRT(D,2,R,X)
#define _FP_SQRT_MEAT_D(R,S,T,X,Q) _FP_SQRT_MEAT_2(R,S,T,X,Q)
+#define FP_FMA_D(R,X,Y,Z) _FP_FMA(D,2,4,R,X,Y,Z)
#define FP_CMP_D(r,X,Y,un) _FP_CMP(D,2,r,X,Y,un)
#define FP_CMP_EQ_D(r,X,Y) _FP_CMP_EQ(D,2,r,X,Y)
@@ -160,6 +168,8 @@ union _FP_UNION_D
#define _FP_FRAC_HIGH_D(X) _FP_FRAC_HIGH_2(X)
#define _FP_FRAC_HIGH_RAW_D(X) _FP_FRAC_HIGH_2(X)
+#define _FP_FRAC_HIGH_DW_D(X) _FP_FRAC_HIGH_4(X)
+
#else
union _FP_UNION_D
@@ -246,6 +256,7 @@ union _FP_UNION_D
#define FP_DIV_D(R,X,Y) _FP_DIV(D,1,R,X,Y)
#define FP_SQRT_D(R,X) _FP_SQRT(D,1,R,X)
#define _FP_SQRT_MEAT_D(R,S,T,X,Q) _FP_SQRT_MEAT_1(R,S,T,X,Q)
+#define FP_FMA_D(R,X,Y,Z) _FP_FMA(D,1,2,R,X,Y,Z)
/* The implementation of _FP_MUL_D and _FP_DIV_D should be chosen by
the target machine. */
@@ -260,4 +271,6 @@ union _FP_UNION_D
#define _FP_FRAC_HIGH_D(X) _FP_FRAC_HIGH_1(X)
#define _FP_FRAC_HIGH_RAW_D(X) _FP_FRAC_HIGH_1(X)
+#define _FP_FRAC_HIGH_DW_D(X) _FP_FRAC_HIGH_2(X)
+
#endif /* W_TYPE_SIZE < 64 */
diff --git a/soft-fp/extended.h b/soft-fp/extended.h
index 7492755..c8b1583 100644
--- a/soft-fp/extended.h
+++ b/soft-fp/extended.h
@@ -33,8 +33,10 @@
#if _FP_W_TYPE_SIZE < 64
#define _FP_FRACTBITS_E (4*_FP_W_TYPE_SIZE)
+#define _FP_FRACTBITS_DW_E (8*_FP_W_TYPE_SIZE)
#else
#define _FP_FRACTBITS_E (2*_FP_W_TYPE_SIZE)
+#define _FP_FRACTBITS_DW_E (4*_FP_W_TYPE_SIZE)
#endif
#define _FP_FRACBITS_E 64
@@ -56,6 +58,11 @@
#define _FP_OVERFLOW_E \
((_FP_W_TYPE)1 << (_FP_WFRACBITS_E % _FP_W_TYPE_SIZE))
+#define _FP_WFRACBITS_DW_E (2 * _FP_WFRACBITS_E)
+#define _FP_WFRACXBITS_DW_E (_FP_FRACTBITS_DW_E - _FP_WFRACBITS_DW_E)
+#define _FP_HIGHBIT_DW_E \
+ ((_FP_W_TYPE)1 << (_FP_WFRACBITS_DW_E - 1) % _FP_W_TYPE_SIZE)
+
typedef float XFtype __attribute__((mode(XF)));
#if _FP_W_TYPE_SIZE < 64
@@ -192,6 +199,7 @@ union _FP_UNION_E
#define FP_MUL_E(R,X,Y) _FP_MUL(E,4,R,X,Y)
#define FP_DIV_E(R,X,Y) _FP_DIV(E,4,R,X,Y)
#define FP_SQRT_E(R,X) _FP_SQRT(E,4,R,X)
+#define FP_FMA_E(R,X,Y,Z) _FP_FMA(E,4,8,R,X,Y,Z)
/*
* Square root algorithms:
@@ -258,6 +266,8 @@ union _FP_UNION_E
#define _FP_FRAC_HIGH_E(X) (X##_f[2])
#define _FP_FRAC_HIGH_RAW_E(X) (X##_f[1])
+#define _FP_FRAC_HIGH_DW_E(X) (X##_f[4])
+
#else /* not _FP_W_TYPE_SIZE < 64 */
union _FP_UNION_E
{
@@ -383,6 +393,7 @@ union _FP_UNION_E
#define FP_MUL_E(R,X,Y) _FP_MUL(E,2,R,X,Y)
#define FP_DIV_E(R,X,Y) _FP_DIV(E,2,R,X,Y)
#define FP_SQRT_E(R,X) _FP_SQRT(E,2,R,X)
+#define FP_FMA_E(R,X,Y,Z) _FP_FMA(E,2,4,R,X,Y,Z)
/*
* Square root algorithms:
@@ -427,4 +438,6 @@ union _FP_UNION_E
#define _FP_FRAC_HIGH_E(X) (X##_f1)
#define _FP_FRAC_HIGH_RAW_E(X) (X##_f0)
+#define _FP_FRAC_HIGH_DW_E(X) (X##_f[2])
+
#endif /* not _FP_W_TYPE_SIZE < 64 */
diff --git a/soft-fp/fmadf4.c b/soft-fp/fmadf4.c
new file mode 100644
index 0000000..ebdc2b1
--- /dev/null
+++ b/soft-fp/fmadf4.c
@@ -0,0 +1,56 @@
+/* Implement fma using soft-fp.
+ Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ In addition to the permissions in the GNU Lesser General Public
+ License, the Free Software Foundation gives you unlimited
+ permission to link the compiled version of this file into
+ combinations with other programs, and to distribute those
+ combinations without any restriction coming from the use of this
+ file. (The Lesser General Public License restrictions do apply in
+ other respects; for example, they cover modification of the file,
+ and distribution when not linked into a combine executable.)
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <math.h>
+#include "soft-fp.h"
+#include "double.h"
+
+double
+__fma (double a, double b, double c)
+{
+ FP_DECL_EX;
+ FP_DECL_D(A); FP_DECL_D(B); FP_DECL_D(C); FP_DECL_D(R);
+ double r;
+
+ FP_INIT_ROUNDMODE;
+ FP_UNPACK_D(A, a);
+ FP_UNPACK_D(B, b);
+ FP_UNPACK_D(C, c);
+ FP_FMA_D(R, A, B, C);
+ FP_PACK_D(r, R);
+ FP_HANDLE_EXCEPTIONS;
+
+ return r;
+}
+#ifndef __fma
+weak_alias (__fma, fma)
+#endif
+
+#ifdef NO_LONG_DOUBLE
+strong_alias (__fma, __fmal)
+weak_alias (__fmal, fmal)
+#endif
diff --git a/soft-fp/fmasf4.c b/soft-fp/fmasf4.c
new file mode 100644
index 0000000..e8d60fb
--- /dev/null
+++ b/soft-fp/fmasf4.c
@@ -0,0 +1,51 @@
+/* Implement fmaf using soft-fp.
+ Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ In addition to the permissions in the GNU Lesser General Public
+ License, the Free Software Foundation gives you unlimited
+ permission to link the compiled version of this file into
+ combinations with other programs, and to distribute those
+ combinations without any restriction coming from the use of this
+ file. (The Lesser General Public License restrictions do apply in
+ other respects; for example, they cover modification of the file,
+ and distribution when not linked into a combine executable.)
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <math.h>
+#include "soft-fp.h"
+#include "single.h"
+
+float
+__fmaf (float a, float b, float c)
+{
+ FP_DECL_EX;
+ FP_DECL_S(A); FP_DECL_S(B); FP_DECL_S(C); FP_DECL_S(R);
+ float r;
+
+ FP_INIT_ROUNDMODE;
+ FP_UNPACK_S(A, a);
+ FP_UNPACK_S(B, b);
+ FP_UNPACK_S(C, c);
+ FP_FMA_S(R, A, B, C);
+ FP_PACK_S(r, R);
+ FP_HANDLE_EXCEPTIONS;
+
+ return r;
+}
+#ifndef __fmaf
+weak_alias (__fmaf, fmaf)
+#endif
diff --git a/soft-fp/fmatf4.c b/soft-fp/fmatf4.c
new file mode 100644
index 0000000..cf48988
--- /dev/null
+++ b/soft-fp/fmatf4.c
@@ -0,0 +1,49 @@
+/* Implement fmal using soft-fp.
+ Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ In addition to the permissions in the GNU Lesser General Public
+ License, the Free Software Foundation gives you unlimited
+ permission to link the compiled version of this file into
+ combinations with other programs, and to distribute those
+ combinations without any restriction coming from the use of this
+ file. (The Lesser General Public License restrictions do apply in
+ other respects; for example, they cover modification of the file,
+ and distribution when not linked into a combine executable.)
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <math.h>
+#include "soft-fp.h"
+#include "quad.h"
+
+long double
+__fmal (long double a, long double b, long double c)
+{
+ FP_DECL_EX;
+ FP_DECL_Q(A); FP_DECL_Q(B); FP_DECL_Q(C); FP_DECL_Q(R);
+ long double r;
+
+ FP_INIT_ROUNDMODE;
+ FP_UNPACK_Q(A, a);
+ FP_UNPACK_Q(B, b);
+ FP_UNPACK_Q(C, c);
+ FP_FMA_Q(R, A, B, C);
+ FP_PACK_Q(r, R);
+ FP_HANDLE_EXCEPTIONS;
+
+ return r;
+}
+weak_alias (__fmal, fmal)
diff --git a/soft-fp/op-1.h b/soft-fp/op-1.h
index 8e05e2f..d3ccbcb 100644
--- a/soft-fp/op-1.h
+++ b/soft-fp/op-1.h
@@ -72,6 +72,7 @@ do { \
#define _FP_FRAC_ZEROP_1(X) (X##_f == 0)
#define _FP_FRAC_OVERP_1(fs,X) (X##_f & _FP_OVERFLOW_##fs)
#define _FP_FRAC_CLEAR_OVERP_1(fs,X) (X##_f &= ~_FP_OVERFLOW_##fs)
+#define _FP_FRAC_HIGHBIT_DW_1(fs,X) (X##_f & _FP_HIGHBIT_DW_##fs)
#define _FP_FRAC_EQ_1(X, Y) (X##_f == Y##_f)
#define _FP_FRAC_GE_1(X, Y) (X##_f >= Y##_f)
#define _FP_FRAC_GT_1(X, Y) (X##_f > Y##_f)
@@ -137,9 +138,14 @@ do { \
/* Basic. Assuming the host word size is >= 2*FRACBITS, we can do the
multiplication immediately. */
-#define _FP_MUL_MEAT_1_imm(wfracbits, R, X, Y) \
+#define _FP_MUL_MEAT_DW_1_imm(wfracbits, R, X, Y) \
do { \
R##_f = X##_f * Y##_f; \
+ } while (0)
+
+#define _FP_MUL_MEAT_1_imm(wfracbits, R, X, Y) \
+ do { \
+ _FP_MUL_MEAT_DW_1_imm(wfracbits, R, X, Y); \
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
at either 2B or 2B-1. */ \
@@ -148,10 +154,15 @@ do { \
/* Given a 1W * 1W => 2W primitive, do the extended multiplication. */
+#define _FP_MUL_MEAT_DW_1_wide(wfracbits, R, X, Y, doit) \
+ do { \
+ doit(R##_f1, R##_f0, X##_f, Y##_f); \
+ } while (0)
+
#define _FP_MUL_MEAT_1_wide(wfracbits, R, X, Y, doit) \
do { \
- _FP_W_TYPE _Z_f0, _Z_f1; \
- doit(_Z_f1, _Z_f0, X##_f, Y##_f); \
+ _FP_FRAC_DECL_2(_Z); \
+ _FP_MUL_MEAT_DW_1_wide(wfracbits, _Z, X, Y, doit); \
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
at either 2B or 2B-1. */ \
@@ -161,9 +172,10 @@ do { \
/* Finally, a simple widening multiply algorithm. What fun! */
-#define _FP_MUL_MEAT_1_hard(wfracbits, R, X, Y) \
+#define _FP_MUL_MEAT_DW_1_hard(wfracbits, R, X, Y) \
do { \
- _FP_W_TYPE _xh, _xl, _yh, _yl, _z_f0, _z_f1, _a_f0, _a_f1; \
+ _FP_W_TYPE _xh, _xl, _yh, _yl; \
+ _FP_FRAC_DECL_2(_a); \
\
/* split the words in half */ \
_xh = X##_f >> (_FP_W_TYPE_SIZE/2); \
@@ -172,17 +184,23 @@ do { \
_yl = Y##_f & (((_FP_W_TYPE)1 << (_FP_W_TYPE_SIZE/2)) - 1); \
\
/* multiply the pieces */ \
- _z_f0 = _xl * _yl; \
+ R##f0 = _xl * _yl; \
_a_f0 = _xh * _yl; \
_a_f1 = _xl * _yh; \
- _z_f1 = _xh * _yh; \
+ R##f1 = _xh * _yh; \
\
/* reassemble into two full words */ \
if ((_a_f0 += _a_f1) < _a_f1) \
- _z_f1 += (_FP_W_TYPE)1 << (_FP_W_TYPE_SIZE/2); \
+ R##_f1 += (_FP_W_TYPE)1 << (_FP_W_TYPE_SIZE/2); \
_a_f1 = _a_f0 >> (_FP_W_TYPE_SIZE/2); \
_a_f0 = _a_f0 << (_FP_W_TYPE_SIZE/2); \
- _FP_FRAC_ADD_2(_z, _z, _a); \
+ _FP_FRAC_ADD_2(R, R, _a); \
+ } while (0)
+
+#define _FP_MUL_MEAT_1_hard(wfracbits, R, X, Y) \
+ do { \
+ _FP_FRAC_DECL_2(_z); \
+ _FP_MUL_MEAT_DW_1_hard(wfracbits, _z, X, Y); \
\
/* normalize */ \
_FP_FRAC_SRS_2(_z, wfracbits - 1, 2*wfracbits); \
diff --git a/soft-fp/op-2.h b/soft-fp/op-2.h
index 48e01d2..2008822 100644
--- a/soft-fp/op-2.h
+++ b/soft-fp/op-2.h
@@ -134,6 +134,8 @@
#define _FP_FRAC_ZEROP_2(X) ((X##_f1 | X##_f0) == 0)
#define _FP_FRAC_OVERP_2(fs,X) (_FP_FRAC_HIGH_##fs(X) & _FP_OVERFLOW_##fs)
#define _FP_FRAC_CLEAR_OVERP_2(fs,X) (_FP_FRAC_HIGH_##fs(X) &= ~_FP_OVERFLOW_##fs)
+#define _FP_FRAC_HIGHBIT_DW_2(fs,X) \
+ (_FP_FRAC_HIGH_DW_##fs(X) & _FP_HIGHBIT_DW_##fs)
#define _FP_FRAC_EQ_2(X, Y) (X##_f1 == Y##_f1 && X##_f0 == Y##_f0)
#define _FP_FRAC_GT_2(X, Y) \
(X##_f1 > Y##_f1 || (X##_f1 == Y##_f1 && X##_f0 > Y##_f0))
@@ -257,23 +259,30 @@
/* Given a 1W * 1W => 2W primitive, do the extended multiplication. */
-#define _FP_MUL_MEAT_2_wide(wfracbits, R, X, Y, doit) \
+#define _FP_MUL_MEAT_DW_2_wide(wfracbits, R, X, Y, doit) \
do { \
- _FP_FRAC_DECL_4(_z); _FP_FRAC_DECL_2(_b); _FP_FRAC_DECL_2(_c); \
+ _FP_FRAC_DECL_2(_b); _FP_FRAC_DECL_2(_c); \
\
- doit(_FP_FRAC_WORD_4(_z,1), _FP_FRAC_WORD_4(_z,0), X##_f0, Y##_f0); \
+ doit(_FP_FRAC_WORD_4(R,1), _FP_FRAC_WORD_4(R,0), X##_f0, Y##_f0); \
doit(_b_f1, _b_f0, X##_f0, Y##_f1); \
doit(_c_f1, _c_f0, X##_f1, Y##_f0); \
- doit(_FP_FRAC_WORD_4(_z,3), _FP_FRAC_WORD_4(_z,2), X##_f1, Y##_f1); \
+ doit(_FP_FRAC_WORD_4(R,3), _FP_FRAC_WORD_4(R,2), X##_f1, Y##_f1); \
\
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1), 0, _b_f1, _b_f0, \
- _FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1), 0, _c_f1, _c_f0, \
- _FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1), 0, _b_f1, _b_f0, \
+ _FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1), 0, _c_f1, _c_f0, \
+ _FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1)); \
+ } while (0)
+
+#define _FP_MUL_MEAT_2_wide(wfracbits, R, X, Y, doit) \
+ do { \
+ _FP_FRAC_DECL_4(_z); \
+ \
+ _FP_MUL_MEAT_DW_2_wide(wfracbits, _z, X, Y, doit); \
\
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
@@ -287,9 +296,9 @@
Do only 3 multiplications instead of four. This one is for machines
where multiplication is much more expensive than subtraction. */
-#define _FP_MUL_MEAT_2_wide_3mul(wfracbits, R, X, Y, doit) \
+#define _FP_MUL_MEAT_DW_2_wide_3mul(wfracbits, R, X, Y, doit) \
do { \
- _FP_FRAC_DECL_4(_z); _FP_FRAC_DECL_2(_b); _FP_FRAC_DECL_2(_c); \
+ _FP_FRAC_DECL_2(_b); _FP_FRAC_DECL_2(_c); \
_FP_W_TYPE _d; \
int _c1, _c2; \
\
@@ -297,27 +306,34 @@
_c1 = _b_f0 < X##_f0; \
_b_f1 = Y##_f0 + Y##_f1; \
_c2 = _b_f1 < Y##_f0; \
- doit(_d, _FP_FRAC_WORD_4(_z,0), X##_f0, Y##_f0); \
- doit(_FP_FRAC_WORD_4(_z,2), _FP_FRAC_WORD_4(_z,1), _b_f0, _b_f1); \
+ doit(_d, _FP_FRAC_WORD_4(R,0), X##_f0, Y##_f0); \
+ doit(_FP_FRAC_WORD_4(R,2), _FP_FRAC_WORD_4(R,1), _b_f0, _b_f1); \
doit(_c_f1, _c_f0, X##_f1, Y##_f1); \
\
_b_f0 &= -_c2; \
_b_f1 &= -_c1; \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1), (_c1 & _c2), 0, _d, \
- 0, _FP_FRAC_WORD_4(_z,2), _FP_FRAC_WORD_4(_z,1)); \
- __FP_FRAC_ADDI_2(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1), (_c1 & _c2), 0, _d, \
+ 0, _FP_FRAC_WORD_4(R,2), _FP_FRAC_WORD_4(R,1)); \
+ __FP_FRAC_ADDI_2(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
_b_f0); \
- __FP_FRAC_ADDI_2(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
+ __FP_FRAC_ADDI_2(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
_b_f1); \
- __FP_FRAC_DEC_3(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1), \
- 0, _d, _FP_FRAC_WORD_4(_z,0)); \
- __FP_FRAC_DEC_3(_FP_FRAC_WORD_4(_z,3),_FP_FRAC_WORD_4(_z,2), \
- _FP_FRAC_WORD_4(_z,1), 0, _c_f1, _c_f0); \
- __FP_FRAC_ADD_2(_FP_FRAC_WORD_4(_z,3), _FP_FRAC_WORD_4(_z,2), \
+ __FP_FRAC_DEC_3(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1), \
+ 0, _d, _FP_FRAC_WORD_4(R,0)); \
+ __FP_FRAC_DEC_3(_FP_FRAC_WORD_4(R,3),_FP_FRAC_WORD_4(R,2), \
+ _FP_FRAC_WORD_4(R,1), 0, _c_f1, _c_f0); \
+ __FP_FRAC_ADD_2(_FP_FRAC_WORD_4(R,3), _FP_FRAC_WORD_4(R,2), \
_c_f1, _c_f0, \
- _FP_FRAC_WORD_4(_z,3), _FP_FRAC_WORD_4(_z,2)); \
+ _FP_FRAC_WORD_4(R,3), _FP_FRAC_WORD_4(R,2)); \
+ } while (0)
+
+#define _FP_MUL_MEAT_2_wide_3mul(wfracbits, R, X, Y, doit) \
+ do { \
+ _FP_FRAC_DECL_4(_z); \
+ \
+ _FP_MUL_MEAT_DW_2_wide_3mul(wfracbits, _z, X, Y, doit); \
\
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
@@ -327,14 +343,20 @@
R##_f1 = _FP_FRAC_WORD_4(_z,1); \
} while (0)
-#define _FP_MUL_MEAT_2_gmp(wfracbits, R, X, Y) \
+#define _FP_MUL_MEAT_DW_2_gmp(wfracbits, R, X, Y) \
do { \
- _FP_FRAC_DECL_4(_z); \
_FP_W_TYPE _x[2], _y[2]; \
_x[0] = X##_f0; _x[1] = X##_f1; \
_y[0] = Y##_f0; _y[1] = Y##_f1; \
\
- mpn_mul_n(_z_f, _x, _y, 2); \
+ mpn_mul_n(R##_f, _x, _y, 2); \
+ } while (0)
+
+#define _FP_MUL_MEAT_2_gmp(wfracbits, R, X, Y) \
+ do { \
+ _FP_FRAC_DECL_4(_z); \
+ \
+ _FP_MUL_MEAT_DW_2_gmp(wfracbits, _z, X, Y); \
\
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
diff --git a/soft-fp/op-4.h b/soft-fp/op-4.h
index 007b01f..0e26d48 100644
--- a/soft-fp/op-4.h
+++ b/soft-fp/op-4.h
@@ -142,6 +142,8 @@
#define _FP_FRAC_ZEROP_4(X) ((X##_f[0] | X##_f[1] | X##_f[2] | X##_f[3]) == 0)
#define _FP_FRAC_NEGP_4(X) ((_FP_WS_TYPE)X##_f[3] < 0)
#define _FP_FRAC_OVERP_4(fs,X) (_FP_FRAC_HIGH_##fs(X) & _FP_OVERFLOW_##fs)
+#define _FP_FRAC_HIGHBIT_DW_4(fs,X) \
+ (_FP_FRAC_HIGH_DW_##fs(X) & _FP_HIGHBIT_DW_##fs)
#define _FP_FRAC_CLEAR_OVERP_4(fs,X) (_FP_FRAC_HIGH_##fs(X) &= ~_FP_OVERFLOW_##fs)
#define _FP_FRAC_EQ_4(X,Y) \
@@ -246,81 +248,88 @@
/* Given a 1W * 1W => 2W primitive, do the extended multiplication. */
-#define _FP_MUL_MEAT_4_wide(wfracbits, R, X, Y, doit) \
+#define _FP_MUL_MEAT_DW_4_wide(wfracbits, R, X, Y, doit) \
do { \
- _FP_FRAC_DECL_8(_z); _FP_FRAC_DECL_2(_b); _FP_FRAC_DECL_2(_c); \
+ _FP_FRAC_DECL_2(_b); _FP_FRAC_DECL_2(_c); \
_FP_FRAC_DECL_2(_d); _FP_FRAC_DECL_2(_e); _FP_FRAC_DECL_2(_f); \
\
- doit(_FP_FRAC_WORD_8(_z,1), _FP_FRAC_WORD_8(_z,0), X##_f[0], Y##_f[0]); \
+ doit(_FP_FRAC_WORD_8(R,1), _FP_FRAC_WORD_8(R,0), X##_f[0], Y##_f[0]); \
doit(_b_f1, _b_f0, X##_f[0], Y##_f[1]); \
doit(_c_f1, _c_f0, X##_f[1], Y##_f[0]); \
doit(_d_f1, _d_f0, X##_f[1], Y##_f[1]); \
doit(_e_f1, _e_f0, X##_f[0], Y##_f[2]); \
doit(_f_f1, _f_f0, X##_f[2], Y##_f[0]); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,3),_FP_FRAC_WORD_8(_z,2), \
- _FP_FRAC_WORD_8(_z,1), 0,_b_f1,_b_f0, \
- 0,0,_FP_FRAC_WORD_8(_z,1)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,3),_FP_FRAC_WORD_8(_z,2), \
- _FP_FRAC_WORD_8(_z,1), 0,_c_f1,_c_f0, \
- _FP_FRAC_WORD_8(_z,3),_FP_FRAC_WORD_8(_z,2), \
- _FP_FRAC_WORD_8(_z,1)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,4),_FP_FRAC_WORD_8(_z,3), \
- _FP_FRAC_WORD_8(_z,2), 0,_d_f1,_d_f0, \
- 0,_FP_FRAC_WORD_8(_z,3),_FP_FRAC_WORD_8(_z,2)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,4),_FP_FRAC_WORD_8(_z,3), \
- _FP_FRAC_WORD_8(_z,2), 0,_e_f1,_e_f0, \
- _FP_FRAC_WORD_8(_z,4),_FP_FRAC_WORD_8(_z,3), \
- _FP_FRAC_WORD_8(_z,2)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,4),_FP_FRAC_WORD_8(_z,3), \
- _FP_FRAC_WORD_8(_z,2), 0,_f_f1,_f_f0, \
- _FP_FRAC_WORD_8(_z,4),_FP_FRAC_WORD_8(_z,3), \
- _FP_FRAC_WORD_8(_z,2)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,3),_FP_FRAC_WORD_8(R,2), \
+ _FP_FRAC_WORD_8(R,1), 0,_b_f1,_b_f0, \
+ 0,0,_FP_FRAC_WORD_8(R,1)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,3),_FP_FRAC_WORD_8(R,2), \
+ _FP_FRAC_WORD_8(R,1), 0,_c_f1,_c_f0, \
+ _FP_FRAC_WORD_8(R,3),_FP_FRAC_WORD_8(R,2), \
+ _FP_FRAC_WORD_8(R,1)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,4),_FP_FRAC_WORD_8(R,3), \
+ _FP_FRAC_WORD_8(R,2), 0,_d_f1,_d_f0, \
+ 0,_FP_FRAC_WORD_8(R,3),_FP_FRAC_WORD_8(R,2)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,4),_FP_FRAC_WORD_8(R,3), \
+ _FP_FRAC_WORD_8(R,2), 0,_e_f1,_e_f0, \
+ _FP_FRAC_WORD_8(R,4),_FP_FRAC_WORD_8(R,3), \
+ _FP_FRAC_WORD_8(R,2)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,4),_FP_FRAC_WORD_8(R,3), \
+ _FP_FRAC_WORD_8(R,2), 0,_f_f1,_f_f0, \
+ _FP_FRAC_WORD_8(R,4),_FP_FRAC_WORD_8(R,3), \
+ _FP_FRAC_WORD_8(R,2)); \
doit(_b_f1, _b_f0, X##_f[0], Y##_f[3]); \
doit(_c_f1, _c_f0, X##_f[3], Y##_f[0]); \
doit(_d_f1, _d_f0, X##_f[1], Y##_f[2]); \
doit(_e_f1, _e_f0, X##_f[2], Y##_f[1]); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3), 0,_b_f1,_b_f0, \
- 0,_FP_FRAC_WORD_8(_z,4),_FP_FRAC_WORD_8(_z,3)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3), 0,_c_f1,_c_f0, \
- _FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3), 0,_d_f1,_d_f0, \
- _FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3), 0,_e_f1,_e_f0, \
- _FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4), \
- _FP_FRAC_WORD_8(_z,3)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3), 0,_b_f1,_b_f0, \
+ 0,_FP_FRAC_WORD_8(R,4),_FP_FRAC_WORD_8(R,3)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3), 0,_c_f1,_c_f0, \
+ _FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3), 0,_d_f1,_d_f0, \
+ _FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3), 0,_e_f1,_e_f0, \
+ _FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4), \
+ _FP_FRAC_WORD_8(R,3)); \
doit(_b_f1, _b_f0, X##_f[2], Y##_f[2]); \
doit(_c_f1, _c_f0, X##_f[1], Y##_f[3]); \
doit(_d_f1, _d_f0, X##_f[3], Y##_f[1]); \
doit(_e_f1, _e_f0, X##_f[2], Y##_f[3]); \
doit(_f_f1, _f_f0, X##_f[3], Y##_f[2]); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,6),_FP_FRAC_WORD_8(_z,5), \
- _FP_FRAC_WORD_8(_z,4), 0,_b_f1,_b_f0, \
- 0,_FP_FRAC_WORD_8(_z,5),_FP_FRAC_WORD_8(_z,4)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,6),_FP_FRAC_WORD_8(_z,5), \
- _FP_FRAC_WORD_8(_z,4), 0,_c_f1,_c_f0, \
- _FP_FRAC_WORD_8(_z,6),_FP_FRAC_WORD_8(_z,5), \
- _FP_FRAC_WORD_8(_z,4)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,6),_FP_FRAC_WORD_8(_z,5), \
- _FP_FRAC_WORD_8(_z,4), 0,_d_f1,_d_f0, \
- _FP_FRAC_WORD_8(_z,6),_FP_FRAC_WORD_8(_z,5), \
- _FP_FRAC_WORD_8(_z,4)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,7),_FP_FRAC_WORD_8(_z,6), \
- _FP_FRAC_WORD_8(_z,5), 0,_e_f1,_e_f0, \
- 0,_FP_FRAC_WORD_8(_z,6),_FP_FRAC_WORD_8(_z,5)); \
- __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(_z,7),_FP_FRAC_WORD_8(_z,6), \
- _FP_FRAC_WORD_8(_z,5), 0,_f_f1,_f_f0, \
- _FP_FRAC_WORD_8(_z,7),_FP_FRAC_WORD_8(_z,6), \
- _FP_FRAC_WORD_8(_z,5)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,6),_FP_FRAC_WORD_8(R,5), \
+ _FP_FRAC_WORD_8(R,4), 0,_b_f1,_b_f0, \
+ 0,_FP_FRAC_WORD_8(R,5),_FP_FRAC_WORD_8(R,4)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,6),_FP_FRAC_WORD_8(R,5), \
+ _FP_FRAC_WORD_8(R,4), 0,_c_f1,_c_f0, \
+ _FP_FRAC_WORD_8(R,6),_FP_FRAC_WORD_8(R,5), \
+ _FP_FRAC_WORD_8(R,4)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,6),_FP_FRAC_WORD_8(R,5), \
+ _FP_FRAC_WORD_8(R,4), 0,_d_f1,_d_f0, \
+ _FP_FRAC_WORD_8(R,6),_FP_FRAC_WORD_8(R,5), \
+ _FP_FRAC_WORD_8(R,4)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,7),_FP_FRAC_WORD_8(R,6), \
+ _FP_FRAC_WORD_8(R,5), 0,_e_f1,_e_f0, \
+ 0,_FP_FRAC_WORD_8(R,6),_FP_FRAC_WORD_8(R,5)); \
+ __FP_FRAC_ADD_3(_FP_FRAC_WORD_8(R,7),_FP_FRAC_WORD_8(R,6), \
+ _FP_FRAC_WORD_8(R,5), 0,_f_f1,_f_f0, \
+ _FP_FRAC_WORD_8(R,7),_FP_FRAC_WORD_8(R,6), \
+ _FP_FRAC_WORD_8(R,5)); \
doit(_b_f1, _b_f0, X##_f[3], Y##_f[3]); \
- __FP_FRAC_ADD_2(_FP_FRAC_WORD_8(_z,7),_FP_FRAC_WORD_8(_z,6), \
+ __FP_FRAC_ADD_2(_FP_FRAC_WORD_8(R,7),_FP_FRAC_WORD_8(R,6), \
_b_f1,_b_f0, \
- _FP_FRAC_WORD_8(_z,7),_FP_FRAC_WORD_8(_z,6)); \
+ _FP_FRAC_WORD_8(R,7),_FP_FRAC_WORD_8(R,6)); \
+ } while (0)
+
+#define _FP_MUL_MEAT_4_wide(wfracbits, R, X, Y, doit) \
+ do { \
+ _FP_FRAC_DECL_8(_z); \
+ \
+ _FP_MUL_MEAT_DW_4_wide(wfracbits, _z, X, Y, doit); \
\
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
@@ -330,11 +339,16 @@
_FP_FRAC_WORD_8(_z,1), _FP_FRAC_WORD_8(_z,0)); \
} while (0)
+#define _FP_MUL_MEAT_DW_4_gmp(wfracbits, R, X, Y) \
+ do { \
+ mpn_mul_n(R##_f, _x_f, _y_f, 4); \
+ } while (0)
+
#define _FP_MUL_MEAT_4_gmp(wfracbits, R, X, Y) \
do { \
_FP_FRAC_DECL_8(_z); \
\
- mpn_mul_n(_z_f, _x_f, _y_f, 4); \
+ _FP_MUL_MEAT_DW_4_gmp(wfracbits, _z, X, Y); \
\
/* Normalize since we know where the msb of the multiplicands \
were (bit B), we know that the msb of the of the product is \
diff --git a/soft-fp/op-common.h b/soft-fp/op-common.h
index c4acb99..bed1e21 100644
--- a/soft-fp/op-common.h
+++ b/soft-fp/op-common.h
@@ -847,6 +847,217 @@ do { \
} while (0)
+/* Fused multiply-add. The input values should be cooked. */
+
+#define _FP_FMA(fs, wc, dwc, R, X, Y, Z) \
+do { \
+ FP_DECL_##fs(T); \
+ T##_s = X##_s ^ Y##_s; \
+ T##_e = X##_e + Y##_e + 1; \
+ switch (_FP_CLS_COMBINE(X##_c, Y##_c)) \
+ { \
+ case _FP_CLS_COMBINE(FP_CLS_NORMAL,FP_CLS_NORMAL): \
+ switch (Z##_c) \
+ { \
+ case FP_CLS_INF: \
+ case FP_CLS_NAN: \
+ R##_s = Z##_s; \
+ _FP_FRAC_COPY_##wc(R, Z); \
+ R##_c = Z##_c; \
+ break; \
+ \
+ case FP_CLS_ZERO: \
+ R##_c = FP_CLS_NORMAL; \
+ R##_s = T##_s; \
+ R##_e = T##_e; \
+ \
+ _FP_MUL_MEAT_##fs(R, X, Y); \
+ \
+ if (_FP_FRAC_OVERP_##wc(fs, R)) \
+ _FP_FRAC_SRS_##wc(R, 1, _FP_WFRACBITS_##fs); \
+ else \
+ R##_e--; \
+ break; \
+ \
+ case FP_CLS_NORMAL:; \
+ _FP_FRAC_DECL_##dwc(TD); \
+ _FP_FRAC_DECL_##dwc(ZD); \
+ _FP_FRAC_DECL_##dwc(RD); \
+ _FP_MUL_MEAT_DW_##fs(TD, X, Y); \
+ R##_e = T##_e; \
+ int tsh = _FP_FRAC_HIGHBIT_DW_##dwc(fs, TD) == 0; \
+ T##_e -= tsh; \
+ int ediff = T##_e - Z##_e; \
+ if (ediff >= 0) \
+ { \
+ int shift = _FP_WFRACBITS_##fs - tsh - ediff; \
+ if (shift <= -_FP_WFRACBITS_##fs) \
+ _FP_FRAC_SET_##dwc(ZD, _FP_MINFRAC_##dwc); \
+ else \
+ { \
+ _FP_FRAC_COPY_##dwc##_##wc(ZD, Z); \
+ if (shift < 0) \
+ _FP_FRAC_SRS_##dwc(ZD, -shift, \
+ _FP_WFRACBITS_DW_##fs); \
+ else if (shift > 0) \
+ _FP_FRAC_SLL_##dwc(ZD, shift); \
+ } \
+ R##_s = T##_s; \
+ if (T##_s == Z##_s) \
+ _FP_FRAC_ADD_##dwc(RD, TD, ZD); \
+ else \
+ { \
+ _FP_FRAC_SUB_##dwc(RD, TD, ZD); \
+ if (_FP_FRAC_NEGP_##dwc(RD)) \
+ { \
+ R##_s = Z##_s; \
+ _FP_FRAC_SUB_##dwc(RD, ZD, TD); \
+ } \
+ } \
+ } \
+ else \
+ { \
+ R##_e = Z##_e; \
+ R##_s = Z##_s; \
+ _FP_FRAC_COPY_##dwc##_##wc(ZD, Z); \
+ _FP_FRAC_SLL_##dwc(ZD, _FP_WFRACBITS_##fs); \
+ int shift = -ediff - tsh; \
+ if (shift >= _FP_WFRACBITS_DW_##fs) \
+ _FP_FRAC_SET_##dwc(TD, _FP_MINFRAC_##dwc); \
+ else if (shift > 0) \
+ _FP_FRAC_SRS_##dwc(TD, shift, \
+ _FP_WFRACBITS_DW_##fs); \
+ if (Z##_s == T##_s) \
+ _FP_FRAC_ADD_##dwc(RD, ZD, TD); \
+ else \
+ _FP_FRAC_SUB_##dwc(RD, ZD, TD); \
+ } \
+ if (_FP_FRAC_ZEROP_##dwc(RD)) \
+ { \
+ if (T##_s == Z##_s) \
+ R##_s = Z##_s; \
+ else \
+ R##_s = (FP_ROUNDMODE == FP_RND_MINF); \
+ _FP_FRAC_SET_##wc(R, _FP_ZEROFRAC_##wc); \
+ R##_c = FP_CLS_ZERO; \
+ } \
+ else \
+ { \
+ int rlz; \
+ _FP_FRAC_CLZ_##dwc(rlz, RD); \
+ rlz -= _FP_WFRACXBITS_DW_##fs; \
+ R##_e -= rlz; \
+ int shift = _FP_WFRACBITS_##fs - rlz; \
+ if (shift > 0) \
+ _FP_FRAC_SRS_##dwc(RD, shift, \
+ _FP_WFRACBITS_DW_##fs); \
+ else if (shift < 0) \
+ _FP_FRAC_SLL_##dwc(RD, -shift); \
+ _FP_FRAC_COPY_##wc##_##dwc(R, RD); \
+ R##_c = FP_CLS_NORMAL; \
+ } \
+ break; \
+ } \
+ goto done_fma; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_NAN): \
+ _FP_CHOOSENAN(fs, wc, T, X, Y, '*'); \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_NORMAL): \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_INF): \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_ZERO): \
+ T##_s = X##_s; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_INF): \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_NORMAL): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_NORMAL): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_ZERO): \
+ _FP_FRAC_COPY_##wc(T, X); \
+ T##_c = X##_c; \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_NORMAL,FP_CLS_NAN): \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_NAN): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_NAN): \
+ T##_s = Y##_s; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_NORMAL,FP_CLS_INF): \
+ case _FP_CLS_COMBINE(FP_CLS_NORMAL,FP_CLS_ZERO): \
+ _FP_FRAC_COPY_##wc(T, Y); \
+ T##_c = Y##_c; \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_ZERO): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_INF): \
+ T##_s = _FP_NANSIGN_##fs; \
+ T##_c = FP_CLS_NAN; \
+ _FP_FRAC_SET_##wc(T, _FP_NANFRAC_##fs); \
+ FP_SET_EXCEPTION(FP_EX_INVALID); \
+ break; \
+ \
+ default: \
+ abort(); \
+ } \
+ \
+ /* T = X * Y is zero, infinity or NaN. */ \
+ switch (_FP_CLS_COMBINE(T##_c, Z##_c)) \
+ { \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_NAN): \
+ _FP_CHOOSENAN(fs, wc, R, T, Z, '+'); \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_NORMAL): \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_INF): \
+ case _FP_CLS_COMBINE(FP_CLS_NAN,FP_CLS_ZERO): \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_NORMAL): \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_ZERO): \
+ R##_s = T##_s; \
+ _FP_FRAC_COPY_##wc(R, T); \
+ R##_c = T##_c; \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_NAN): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_NAN): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_NORMAL): \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_INF): \
+ R##_s = Z##_s; \
+ _FP_FRAC_COPY_##wc(R, Z); \
+ R##_c = Z##_c; \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_INF,FP_CLS_INF): \
+ if (T##_s == Z##_s) \
+ { \
+ R##_s = Z##_s; \
+ _FP_FRAC_COPY_##wc(R, Z); \
+ R##_c = Z##_c; \
+ } \
+ else \
+ { \
+ R##_s = _FP_NANSIGN_##fs; \
+ R##_c = FP_CLS_NAN; \
+ _FP_FRAC_SET_##wc(R, _FP_NANFRAC_##fs); \
+ FP_SET_EXCEPTION(FP_EX_INVALID); \
+ } \
+ break; \
+ \
+ case _FP_CLS_COMBINE(FP_CLS_ZERO,FP_CLS_ZERO): \
+ if (T##_s == Z##_s) \
+ R##_s = Z##_s; \
+ else \
+ R##_s = (FP_ROUNDMODE == FP_RND_MINF); \
+ _FP_FRAC_COPY_##wc(R, Z); \
+ R##_c = Z##_c; \
+ break; \
+ \
+ default: \
+ abort(); \
+ } \
+ done_fma: ; \
+} while (0)
+
+
/*
* Main division routine. The input values should be cooked.
*/
diff --git a/soft-fp/quad.h b/soft-fp/quad.h
index f0aa07e..9a16bf3 100644
--- a/soft-fp/quad.h
+++ b/soft-fp/quad.h
@@ -36,8 +36,10 @@
#if _FP_W_TYPE_SIZE < 64
#define _FP_FRACTBITS_Q (4*_FP_W_TYPE_SIZE)
+#define _FP_FRACTBITS_DW_Q (8*_FP_W_TYPE_SIZE)
#else
#define _FP_FRACTBITS_Q (2*_FP_W_TYPE_SIZE)
+#define _FP_FRACTBITS_DW_Q (4*_FP_W_TYPE_SIZE)
#endif
#define _FP_FRACBITS_Q 113
@@ -59,6 +61,11 @@
#define _FP_OVERFLOW_Q \
((_FP_W_TYPE)1 << (_FP_WFRACBITS_Q % _FP_W_TYPE_SIZE))
+#define _FP_WFRACBITS_DW_Q (2 * _FP_WFRACBITS_Q)
+#define _FP_WFRACXBITS_DW_Q (_FP_FRACTBITS_DW_Q - _FP_WFRACBITS_DW_Q)
+#define _FP_HIGHBIT_DW_Q \
+ ((_FP_W_TYPE)1 << (_FP_WFRACBITS_DW_Q - 1) % _FP_W_TYPE_SIZE)
+
typedef float TFtype __attribute__((mode(TF)));
#if _FP_W_TYPE_SIZE < 64
@@ -155,6 +162,7 @@ union _FP_UNION_Q
#define FP_DIV_Q(R,X,Y) _FP_DIV(Q,4,R,X,Y)
#define FP_SQRT_Q(R,X) _FP_SQRT(Q,4,R,X)
#define _FP_SQRT_MEAT_Q(R,S,T,X,Q) _FP_SQRT_MEAT_4(R,S,T,X,Q)
+#define FP_FMA_Q(R,X,Y,Z) _FP_FMA(Q,4,8,R,X,Y,Z)
#define FP_CMP_Q(r,X,Y,un) _FP_CMP(Q,4,r,X,Y,un)
#define FP_CMP_EQ_Q(r,X,Y) _FP_CMP_EQ(Q,4,r,X,Y)
@@ -166,6 +174,8 @@ union _FP_UNION_Q
#define _FP_FRAC_HIGH_Q(X) _FP_FRAC_HIGH_4(X)
#define _FP_FRAC_HIGH_RAW_Q(X) _FP_FRAC_HIGH_4(X)
+#define _FP_FRAC_HIGH_DW_Q(X) _FP_FRAC_HIGH_8(X)
+
#else /* not _FP_W_TYPE_SIZE < 64 */
union _FP_UNION_Q
{
@@ -256,6 +266,7 @@ union _FP_UNION_Q
#define FP_DIV_Q(R,X,Y) _FP_DIV(Q,2,R,X,Y)
#define FP_SQRT_Q(R,X) _FP_SQRT(Q,2,R,X)
#define _FP_SQRT_MEAT_Q(R,S,T,X,Q) _FP_SQRT_MEAT_2(R,S,T,X,Q)
+#define FP_FMA_Q(R,X,Y,Z) _FP_FMA(Q,2,4,R,X,Y,Z)
#define FP_CMP_Q(r,X,Y,un) _FP_CMP(Q,2,r,X,Y,un)
#define FP_CMP_EQ_Q(r,X,Y) _FP_CMP_EQ(Q,2,r,X,Y)
@@ -267,4 +278,6 @@ union _FP_UNION_Q
#define _FP_FRAC_HIGH_Q(X) _FP_FRAC_HIGH_2(X)
#define _FP_FRAC_HIGH_RAW_Q(X) _FP_FRAC_HIGH_2(X)
+#define _FP_FRAC_HIGH_DW_Q(X) _FP_FRAC_HIGH_4(X)
+
#endif /* not _FP_W_TYPE_SIZE < 64 */
diff --git a/soft-fp/single.h b/soft-fp/single.h
index dec0031..c94f31f 100644
--- a/soft-fp/single.h
+++ b/soft-fp/single.h
@@ -36,6 +36,12 @@
#define _FP_FRACTBITS_S _FP_W_TYPE_SIZE
+#if _FP_W_TYPE_SIZE < 64
+# define _FP_FRACTBITS_DW_S (2 * _FP_W_TYPE_SIZE)
+#else
+# define _FP_FRACTBITS_DW_S _FP_W_TYPE_SIZE
+#endif
+
#define _FP_FRACBITS_S 24
#define _FP_FRACXBITS_S (_FP_FRACTBITS_S - _FP_FRACBITS_S)
#define _FP_WFRACBITS_S (_FP_WORKBITS + _FP_FRACBITS_S)
@@ -49,6 +55,11 @@
#define _FP_IMPLBIT_SH_S ((_FP_W_TYPE)1 << (_FP_FRACBITS_S-1+_FP_WORKBITS))
#define _FP_OVERFLOW_S ((_FP_W_TYPE)1 << (_FP_WFRACBITS_S))
+#define _FP_WFRACBITS_DW_S (2 * _FP_WFRACBITS_S)
+#define _FP_WFRACXBITS_DW_S (_FP_FRACTBITS_DW_S - _FP_WFRACBITS_DW_S)
+#define _FP_HIGHBIT_DW_S \
+ ((_FP_W_TYPE)1 << (_FP_WFRACBITS_DW_S - 1) % _FP_W_TYPE_SIZE)
+
/* The implementation of _FP_MUL_MEAT_S and _FP_DIV_MEAT_S should be
chosen by the target machine. */
@@ -139,6 +150,12 @@ union _FP_UNION_S
#define FP_SQRT_S(R,X) _FP_SQRT(S,1,R,X)
#define _FP_SQRT_MEAT_S(R,S,T,X,Q) _FP_SQRT_MEAT_1(R,S,T,X,Q)
+#if _FP_W_TYPE_SIZE < 64
+# define FP_FMA_S(R, X, Y, Z) _FP_FMA(S, 1, 2, R, X, Y, Z)
+#else
+# define FP_FMA_S(R, X, Y, Z) _FP_FMA(S, 1, 1, R, X, Y, Z)
+#endif
+
#define FP_CMP_S(r,X,Y,un) _FP_CMP(S,1,r,X,Y,un)
#define FP_CMP_EQ_S(r,X,Y) _FP_CMP_EQ(S,1,r,X,Y)
#define FP_CMP_UNORD_S(r,X,Y) _FP_CMP_UNORD(S,1,r,X,Y)
@@ -148,3 +165,9 @@ union _FP_UNION_S
#define _FP_FRAC_HIGH_S(X) _FP_FRAC_HIGH_1(X)
#define _FP_FRAC_HIGH_RAW_S(X) _FP_FRAC_HIGH_1(X)
+
+#if _FP_W_TYPE_SIZE < 64
+# define _FP_FRAC_HIGH_DW_S(X) _FP_FRAC_HIGH_2(X)
+#else
+# define _FP_FRAC_HIGH_DW_S(X) _FP_FRAC_HIGH_1(X)
+#endif
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Implement fma in soft-fp
2013-06-22 17:17 Implement fma in soft-fp Joseph S. Myers
@ 2013-06-25 15:30 ` Richard Henderson
2013-06-25 16:13 ` Joseph S. Myers
2013-06-25 16:31 ` Richard Henderson
2013-06-27 23:26 ` Joseph S. Myers
2 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2013-06-25 15:30 UTC (permalink / raw)
To: Joseph S. Myers
Cc: libc-alpha, libc-ports, Marcus Shawcroft, Richard Henderson,
David Holsgrove, Chris Metcalf, David S. Miller
On 06/22/2013 10:17 AM, Joseph S. Myers wrote:
> * Using this code on an architecture that detects tininess after
> rounding may require soft-fp support for after-rounding tininess
> detection to be added to avoid failures of fma tests for that
> particular case. (In particular, this is relevant for alpha. It's
> not currently relevant for MIPS because of the general lack of
> exceptions support in the cases where this code gets used.)
I'll give this a bit o testing for alpha and see what happens.
> -#define _FP_MUL_MEAT_1_hard(wfracbits, R, X, Y) \
> +#define _FP_MUL_MEAT_DW_1_hard(wfracbits, R, X, Y) \
> do { \
> - _FP_W_TYPE _xh, _xl, _yh, _yl, _z_f0, _z_f1, _a_f0, _a_f1; \
> + _FP_W_TYPE _xh, _xl, _yh, _yl; \
> + _FP_FRAC_DECL_2(_a); \
> \
> /* split the words in half */ \
> _xh = X##_f >> (_FP_W_TYPE_SIZE/2); \
> @@ -172,17 +184,23 @@ do { \
> _yl = Y##_f & (((_FP_W_TYPE)1 << (_FP_W_TYPE_SIZE/2)) - 1); \
> \
> /* multiply the pieces */ \
> - _z_f0 = _xl * _yl; \
> + R##f0 = _xl * _yl; \
> _a_f0 = _xh * _yl; \
> _a_f1 = _xl * _yh; \
> - _z_f1 = _xh * _yh; \
> + R##f1 = _xh * _yh; \
Missing _ before f[01] there.
I've been wondering from time to time about replacing all of this macro
nonsense with c++. Overloaded functions in some cases, templates in others.
It might eliminate silly typos like this. And be easier to debug.
I believe that all of this code pre-dates the SRA pass by a large margin,
so it's possible the compiler could do the same good structure splitting
job that we did here by hand.
Of course, that would require glibc to build parts with the c++ compiler,
so that may be a non-starter...
r~
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Implement fma in soft-fp
2013-06-25 15:30 ` Richard Henderson
@ 2013-06-25 16:13 ` Joseph S. Myers
2013-06-25 16:23 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Joseph S. Myers @ 2013-06-25 16:13 UTC (permalink / raw)
To: Richard Henderson
Cc: libc-alpha, libc-ports, Marcus Shawcroft, Richard Henderson,
David Holsgrove, Chris Metcalf, David S. Miller
On Tue, 25 Jun 2013, Richard Henderson wrote:
> Of course, that would require glibc to build parts with the c++ compiler,
> so that may be a non-starter...
And libgcc, and the Linux kernel - I don't think requiring a C++ compiler
all over the place would be a good idea (and it would complicate
bootstrapping - you can't build libstdc++ without system headers, but need
to have built libgcc before building glibc, so a bootstrap compiler would
need the C++ compiler and a subset of the runtime libraries).
It's possible something could be done simply with inline always_inline
functions in C. The code doesn't just predate SRA, it predates inlining
on trees. I think I once discussed that issue, for this code, of why
inlines were less efficient than macros with GCC as it stood at that time,
with Peter Maydell around 1997/8.
(But maybe things like getting access to exception / rounding mode state,
which is in a local variable in the toplevel function, would complicate
any such macro-avoidance scheme, including the C++ ones.)
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Implement fma in soft-fp
2013-06-25 16:13 ` Joseph S. Myers
@ 2013-06-25 16:23 ` Richard Henderson
0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2013-06-25 16:23 UTC (permalink / raw)
To: Joseph S. Myers
Cc: libc-alpha, libc-ports, Marcus Shawcroft, Richard Henderson,
David Holsgrove, Chris Metcalf, David S. Miller
On 06/25/2013 09:13 AM, Joseph S. Myers wrote:
> (But maybe things like getting access to exception / rounding mode state,
> which is in a local variable in the toplevel function, would complicate
> any such macro-avoidance scheme, including the C++ ones.)
Maybe, but I assumed that would be passable by reference in one
exception/rounding state structure.
r~
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Implement fma in soft-fp
2013-06-22 17:17 Implement fma in soft-fp Joseph S. Myers
2013-06-25 15:30 ` Richard Henderson
@ 2013-06-25 16:31 ` Richard Henderson
2013-06-25 16:45 ` Joseph S. Myers
2013-06-27 23:26 ` Joseph S. Myers
2 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2013-06-25 16:31 UTC (permalink / raw)
To: Joseph S. Myers
Cc: libc-alpha, libc-ports, Marcus Shawcroft, Richard Henderson,
David Holsgrove, Chris Metcalf, David S. Miller
[-- Attachment #1: Type: text/plain, Size: 1321 bytes --]
With the following patch, I get
> testing long double (without inline functions)
> Failure: fma (0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, 0x1p-16494): Exception "Underflow" set
> Failure: fma (-0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, -0x1p-16494): Exception "Underflow" set
> Failure: fma (0x1p-16494, -0x1p-16494, 0x1p-16382): Exception "Underflow" set
> Failure: fma (0x1p-16494, 0x1p-16494, -0x1p-16382): Exception "Underflow" set
> Failure: fma_downward (-0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, -0x1p-16494): Exception "Underflow" set
> Failure: fma_downward (-0x1p-16494, 0x1.1p-1, -0x0.ffffffffffffffffffffffffffffp-16382): Exception "Underflow" set
> Failure: fma_downward (0x1p-16494, 0x1p-16494, -0x1p-16382): Exception "Underflow" set
> Failure: fma_upward (0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, 0x1p-16494): Exception "Underflow" set
> Failure: fma_upward (0x1p-16494, 0x1.1p-1, 0x0.ffffffffffffffffffffffffffffp-16382): Exception "Underflow" set
> Failure: fma_upward (0x1p-16494, -0x1p-16494, 0x1p-16382): Exception "Underflow" set
but I thought you'd already committed the patch to solve this
as commit 695c378f81263640618bdebf56eaa065f578251f ?
r~
[-- Attachment #2: z --]
[-- Type: text/plain, Size: 3765 bytes --]
diff --git a/ports/sysdeps/alpha/Implies b/ports/sysdeps/alpha/Implies
index d03783b..35ccba3 100644
--- a/ports/sysdeps/alpha/Implies
+++ b/ports/sysdeps/alpha/Implies
@@ -1,7 +1,7 @@
wordsize-64
+alpha/soft-fp
# Alpha uses IEEE 754 single, double and quad precision floating point.
ieee754/ldbl-128
ieee754/dbl-64/wordsize-64
ieee754/dbl-64
ieee754/flt-32
-alpha/soft-fp
diff --git a/ports/sysdeps/alpha/soft-fp/s_fmal.c b/ports/sysdeps/alpha/soft-fp/s_fmal.c
new file mode 100644
index 0000000..8eae7fb
--- /dev/null
+++ b/ports/sysdeps/alpha/soft-fp/s_fmal.c
@@ -0,0 +1,52 @@
+/* Implement fmal using soft-fp.
+ Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ In addition to the permissions in the GNU Lesser General Public
+ License, the Free Software Foundation gives you unlimited
+ permission to link the compiled version of this file into
+ combinations with other programs, and to distribute those
+ combinations without any restriction coming from the use of this
+ file. (The Lesser General Public License restrictions do apply in
+ other respects; for example, they cover modification of the file,
+ and distribution when not linked into a combine executable.)
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <math.h>
+#include <soft-fp.h>
+#include <quad.h>
+#include <math_ldbl_opt.h>
+
+long double
+__fmal (long double a, long double b, long double c)
+{
+ FP_DECL_EX;
+ FP_DECL_Q(A); FP_DECL_Q(B); FP_DECL_Q(C); FP_DECL_Q(R);
+ long double r;
+ long _round = 4; /* dynamic rounding */
+
+ FP_INIT_ROUNDMODE;
+ FP_UNPACK_Q(A, a);
+ FP_UNPACK_Q(B, b);
+ FP_UNPACK_Q(C, c);
+ FP_FMA_Q(R, A, B, C);
+ FP_PACK_Q(r, R);
+ FP_HANDLE_EXCEPTIONS;
+
+ return r;
+}
+
+long_double_symbol (libm, __fmal, fmal);
diff --git a/ports/sysdeps/alpha/soft-fp/sfp-machine.h b/ports/sysdeps/alpha/soft-fp/sfp-machine.h
index be266fe..98e9850 100644
--- a/ports/sysdeps/alpha/soft-fp/sfp-machine.h
+++ b/ports/sysdeps/alpha/soft-fp/sfp-machine.h
@@ -34,6 +34,13 @@
#define _FP_MUL_MEAT_Q(R,X,Y) \
_FP_MUL_MEAT_2_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_S(R,X,Y) \
+ _FP_MUL_MEAT_DW_1_imm(_FP_WFRACBITS_S,R,X,Y)
+#define _FP_MUL_MEAT_DW_D(R,X,Y) \
+ _FP_MUL_MEAT_DW_1_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_DW_Q(R,X,Y) \
+ _FP_MUL_MEAT_DW_2_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
#define _FP_DIV_MEAT_S(R,X,Y) _FP_DIV_MEAT_1_imm(S,R,X,Y,_FP_DIV_HELP_imm)
#define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_1_udiv_norm(D,R,X,Y)
#define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
diff --git a/ports/sysdeps/unix/sysv/linux/alpha/Implies b/ports/sysdeps/unix/sysv/linux/alpha/Implies
index 1616efe..2a14f00 100644
--- a/ports/sysdeps/unix/sysv/linux/alpha/Implies
+++ b/ports/sysdeps/unix/sysv/linux/alpha/Implies
@@ -1,4 +1,6 @@
unix/sysv/linux/wordsize-64
+# Select the soft-fp versions of some long double implementations.
+alpha/soft-fp
# These supply the ABI compatibility for when long double was double.
ieee754/ldbl-64-128
ieee754/ldbl-opt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Implement fma in soft-fp
2013-06-25 16:31 ` Richard Henderson
@ 2013-06-25 16:45 ` Joseph S. Myers
0 siblings, 0 replies; 9+ messages in thread
From: Joseph S. Myers @ 2013-06-25 16:45 UTC (permalink / raw)
To: Richard Henderson
Cc: libc-alpha, libc-ports, Marcus Shawcroft, Richard Henderson,
David Holsgrove, Chris Metcalf, David S. Miller
On Tue, 25 Jun 2013, Richard Henderson wrote:
> With the following patch, I get
>
> > testing long double (without inline functions)
> > Failure: fma (0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, 0x1p-16494): Exception "Underflow" set
> > Failure: fma (-0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, -0x1p-16494): Exception "Underflow" set
> > Failure: fma (0x1p-16494, -0x1p-16494, 0x1p-16382): Exception "Underflow" set
> > Failure: fma (0x1p-16494, 0x1p-16494, -0x1p-16382): Exception "Underflow" set
> > Failure: fma_downward (-0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, -0x1p-16494): Exception "Underflow" set
> > Failure: fma_downward (-0x1p-16494, 0x1.1p-1, -0x0.ffffffffffffffffffffffffffffp-16382): Exception "Underflow" set
> > Failure: fma_downward (0x1p-16494, 0x1p-16494, -0x1p-16382): Exception "Underflow" set
> > Failure: fma_upward (0x1.fffffffffffffffffffffffffffcp-16382, 0x1.0000000000000000000000000001p-1, 0x1p-16494): Exception "Underflow" set
> > Failure: fma_upward (0x1p-16494, 0x1.1p-1, 0x0.ffffffffffffffffffffffffffffp-16382): Exception "Underflow" set
> > Failure: fma_upward (0x1p-16494, -0x1p-16494, 0x1p-16382): Exception "Underflow" set
>
> but I thought you'd already committed the patch to solve this
> as commit 695c378f81263640618bdebf56eaa065f578251f ?
No, that fix was about incorrect results rather than exceptions. The
above is what I was referring to about soft-fp not supporting
after-rounding tininess detection. _FP_PACK_SEMIRAW and
_FP_PACK_CANONICAL will need, for results whose before-rounding exponent
is the largest underflowing exponent, to check some new soft-fp macro for
after-rounding architectures and if it's true then determine whether the
result would still underflow if rounded to the normal precision (*not* if
rounded to the subnormal precision with one less bit, so this involves an
extra trial rounding in a temporary variable to determine whether the
result counts as tiny). See what stdlib/strtod_l.c does in
round_and_return if (TININESS_AFTER_ROUNDING && shift == 1), for example.
(But soft-fp would use its own macro definition from sfp-machine.h instead
of the existing tininess.h since it's supposed to be usable outside glibc
and independent of glibc internal headers.)
MIPS is an after-rounding architecture but this issue doesn't cause test
failures there because the testsuite knows not to test exceptions for MIPS
software floating point (which uses fp-bit in libgcc), via math-tests.h.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Implement fma in soft-fp
2013-06-22 17:17 Implement fma in soft-fp Joseph S. Myers
2013-06-25 15:30 ` Richard Henderson
2013-06-25 16:31 ` Richard Henderson
@ 2013-06-27 23:26 ` Joseph S. Myers
2013-07-02 12:57 ` Ping " Joseph S. Myers
2 siblings, 1 reply; 9+ messages in thread
From: Joseph S. Myers @ 2013-06-27 23:26 UTC (permalink / raw)
To: libc-alpha
Cc: libc-ports, Marcus Shawcroft, Richard Henderson, David Holsgrove,
Chris Metcalf, David S. Miller
Does someone wish to review this patch
<http://sourceware.org/ml/libc-alpha/2013-06/msg00855.html> (given the
typo fixes Richard mentioned in
<http://sourceware.org/ml/libc-alpha/2013-06/msg00927.html>)? I'd rather
like to get the test results improvements for soft-float systems in for
2.18; I think other architecture maintainers can reasonably do their
architecture changes to use this code, if desired, during the freeze (I
intend to do such a followup patch for soft-float ARM), but the big patch
wouldn't be appropriate after the freeze starts.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Ping Re: Implement fma in soft-fp
2013-06-27 23:26 ` Joseph S. Myers
@ 2013-07-02 12:57 ` Joseph S. Myers
2013-07-02 14:45 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Joseph S. Myers @ 2013-07-02 12:57 UTC (permalink / raw)
To: libc-alpha
Cc: libc-ports, Marcus Shawcroft, Richard Henderson, David Holsgrove,
Chris Metcalf, David S. Miller
Ping. This patch
<http://sourceware.org/ml/libc-alpha/2013-06/msg00855.html> (with the typo
fixes Richard mentioned in
<http://sourceware.org/ml/libc-alpha/2013-06/msg00927.html>) is pending
review.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Ping Re: Implement fma in soft-fp
2013-07-02 12:57 ` Ping " Joseph S. Myers
@ 2013-07-02 14:45 ` Richard Henderson
0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2013-07-02 14:45 UTC (permalink / raw)
To: Joseph S. Myers
Cc: libc-alpha, libc-ports, Marcus Shawcroft, Richard Henderson,
David Holsgrove, Chris Metcalf, David S. Miller
On 07/02/2013 05:57 AM, Joseph S. Myers wrote:
> Ping. This patch
> <http://sourceware.org/ml/libc-alpha/2013-06/msg00855.html> (with the typo
> fixes Richard mentioned in
> <http://sourceware.org/ml/libc-alpha/2013-06/msg00927.html>) is pending
> review.
The patch looks good to me.
r~
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-07-02 14:45 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-22 17:17 Implement fma in soft-fp Joseph S. Myers
2013-06-25 15:30 ` Richard Henderson
2013-06-25 16:13 ` Joseph S. Myers
2013-06-25 16:23 ` Richard Henderson
2013-06-25 16:31 ` Richard Henderson
2013-06-25 16:45 ` Joseph S. Myers
2013-06-27 23:26 ` Joseph S. Myers
2013-07-02 12:57 ` Ping " Joseph S. Myers
2013-07-02 14:45 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).