* [RFC V4] Enable libmvec support for RISC-V
@ 2024-04-15 7:21 shiyulong
2024-04-25 5:07 ` Jeff Law
0 siblings, 1 reply; 7+ messages in thread
From: shiyulong @ 2024-04-15 7:21 UTC (permalink / raw)
To: libc-alpha
Cc: palmer, darius, andrew, maskray, kito.cheng, wuwei2016, jiawei,
shihua, chenyixuan, yulong
From: yulong <shiyulong@iscas.ac.cn>
Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
This patch tries to enable libmvec on RISC-V. I have also demonstrated
how this all fits together by adding an implementation for vector cos.
This patch is a first attempt, and we hope to receive valuable comments.
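As a quick illustration (not part of the patch itself), the new entry
point can be exercised directly as below. This sketch assumes an
rv64gcv toolchain and uses the ratified __riscv_-prefixed intrinsics;
the patch itself still uses older draft intrinsic names:

#include <stddef.h>
#include <riscv_vector.h>

/* The symbol added by this patch (see the abilist).  */
extern vfloat64m2_t _ZGVnN2v_cos (vfloat64m2_t x);

void
cos_array (double *out, const double *in, size_t n)
{
  /* Strip-mine the array: each iteration loads vl lanes, calls the
     vector variant, and stores the results back.  */
  for (size_t i = 0; i < n;)
    {
      size_t vl = __riscv_vsetvl_e64m2 (n - i);
      vfloat64m2_t vx = __riscv_vle64_v_f64m2 (in + i, vl);
      __riscv_vse64_v_f64m2 (out + i, _ZGVnN2v_cos (vx), vl);
      i += vl;
    }
}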
Thanks,
yulong
---
sysdeps/riscv/configure | 4 +
sysdeps/riscv/configure.ac | 4 +
sysdeps/riscv/rvd/Makefile | 5 +
sysdeps/riscv/rvd/Versions | 5 +
sysdeps/riscv/rvd/bits/math-vector.h | 29 ++++
sysdeps/riscv/rvd/cos.c | 94 ++++++++++++
sysdeps/riscv/rvd/math_private.h | 42 ++++++
sysdeps/riscv/rvd/v_math.h | 139 ++++++++++++++++++
sysdeps/riscv/rvd/vecmath_config.h | 33 +++++
sysdeps/unix/sysv/linux/riscv/libmvec.abilist | 1 +
10 files changed, 356 insertions(+)
mode change 100644 => 100755 sysdeps/riscv/configure
create mode 100644 sysdeps/riscv/rvd/Makefile
create mode 100644 sysdeps/riscv/rvd/Versions
create mode 100644 sysdeps/riscv/rvd/bits/math-vector.h
create mode 100644 sysdeps/riscv/rvd/cos.c
create mode 100644 sysdeps/riscv/rvd/math_private.h
create mode 100644 sysdeps/riscv/rvd/v_math.h
create mode 100644 sysdeps/riscv/rvd/vecmath_config.h
create mode 100644 sysdeps/unix/sysv/linux/riscv/libmvec.abilist
diff --git a/sysdeps/riscv/configure b/sysdeps/riscv/configure
old mode 100644
new mode 100755
index c8f01709f8..a6d0b4becb
--- a/sysdeps/riscv/configure
+++ b/sysdeps/riscv/configure
@@ -80,3 +80,7 @@ if test "$libc_cv_static_pie_on_riscv" = yes; then
printf "%s\n" "#define SUPPORT_STATIC_PIE 1" >>confdefs.h
fi
+
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
diff --git a/sysdeps/riscv/configure.ac b/sysdeps/riscv/configure.ac
index ee3d1ed014..b1c1105baa 100644
--- a/sysdeps/riscv/configure.ac
+++ b/sysdeps/riscv/configure.ac
@@ -43,3 +43,7 @@ EOF
if test "$libc_cv_static_pie_on_riscv" = yes; then
AC_DEFINE(SUPPORT_STATIC_PIE)
fi
+
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
diff --git a/sysdeps/riscv/rvd/Makefile b/sysdeps/riscv/rvd/Makefile
new file mode 100644
index 0000000000..1adb2ee582
--- /dev/null
+++ b/sysdeps/riscv/rvd/Makefile
@@ -0,0 +1,5 @@
+libmvec-supported-funcs = cos
+
+ifeq ($(subdir),mathvec)
+libmvec-support = $(addprefix d,$(libmvec-supported-funcs))
+endif
diff --git a/sysdeps/riscv/rvd/Versions b/sysdeps/riscv/rvd/Versions
new file mode 100644
index 0000000000..0fd283329c
--- /dev/null
+++ b/sysdeps/riscv/rvd/Versions
@@ -0,0 +1,5 @@
+libmvec {
+ GLIBC_2.40 {
+ _ZGVnN2v_cos;
+ }
+}
diff --git a/sysdeps/riscv/rvd/bits/math-vector.h b/sysdeps/riscv/rvd/bits/math-vector.h
new file mode 100644
index 0000000000..b34ffc9bc1
--- /dev/null
+++ b/sysdeps/riscv/rvd/bits/math-vector.h
@@ -0,0 +1,29 @@
+/* Platform-specific SIMD declarations of math functions.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+#if defined __riscv__
+# define __DECL_RVV_RISCV _Pragma
+# undef __DECL_RVV_cos
+# define __DECL_RVV_cos __DECL_RVV_RISCV
+#endif
diff --git a/sysdeps/riscv/rvd/cos.c b/sysdeps/riscv/rvd/cos.c
new file mode 100644
index 0000000000..1806acd629
--- /dev/null
+++ b/sysdeps/riscv/rvd/cos.c
@@ -0,0 +1,94 @@
+/* Double-precision vector cos function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "v_math.h"
+
+
+static const struct data
+{
+ vfloat64m2_t poly[7];
+ vfloat64m2_t range_val, shift, inv_pi, half_pi, pi_1, pi_2, pi_3;
+} data = {
+ /* Worst-case error is 3.3 ulp in [-pi/2, pi/2]. */
+ .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
+ V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
+ V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
+ V2 (-0x1.9e9540300a1p-41) },
+ .inv_pi = V2 (0x1.45f306dc9c883p-2),
+ .half_pi = V2 (0x1.921fb54442d18p+0),
+ .pi_1 = V2 (0x1.921fb54442d18p+1),
+ .pi_2 = V2 (0x1.1a62633145c06p-53),
+ .pi_3 = V2 (0x1.c1cd129024e09p-106),
+ .shift = V2 (0x1.8p52),
+ .range_val = V2 (0x1p23)
+};
+
+#define C(i) d->poly[i]
+
+static vfloat64m2_t NOINLINE
+special_case (vfloat64m2_t x, vfloat64m2_t y, vuint64m2_t odd, vuint64m2_t cmp)
+{
+ y = vreinterpret_v_u64m2_f64m2 (vor (vreinterpret_v_f64m2_u64m2 (y), odd, 1));
+ return v_call_f64 (cos, x, y, cmp);
+}
+
+vfloat64m2_t V_NAME_D1 (cos) (vfloat64m2_t x)
+{
+ const struct data *d = ptr_barrier (&data);
+ vfloat64m2_t n, r, r2, r3, r4, t1, t2, t3, y;
+ vuint64m2_t odd, cmp;
+
+ r = vfabs_v_f64m2 (x, 2);
+ cmp = (vuint64m2_t) vmsgeu (vreinterpret_v_f64m2_u64m2 (r),
+ vreinterpret_v_f64m2_u64m2 (d->range_val));
+ if (__glibc_unlikely (v_any_u64 (cmp)))
+ /* If fenv exceptions are to be triggered correctly, set any special lanes
+ to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
+ special-case handler later. */
+ r = vmsltu (cmp, v_f64 (1.0), r);
+
+ /* n = rint((|x|+pi/2)/pi) - 0.5. */
+ n = vfmadd (d->shift, d->inv_pi, vfadd (r, d->half_pi,2), 2);
+ odd = vshlq_n_u64 (vreinterpret_v_f64m2_u64m2 (n), 63);
+ n = vfsub (n, d->shift, 2);
+ n = vfsub (n, v_f64 (0.5), 2);
+
+ /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
+ r = vfmsub (r, d->pi_1, n, 2);
+ r = vfmsub (r, d->pi_2, n, 2);
+ r = vfmsub (r, d->pi_3, n, 2);
+
+ /* sin(r) poly approx. */
+ r2 = vfmul (r, r, 2);
+ r3 = vfmul (r2, r, 2);
+ r4 = vfmul (r2, r2, 2);
+
+ t1 = vfmadd (C (4), C (5), r2, 2);
+ t2 = vfmadd (C (2), C (3), r2, 2);
+ t3 = vfmadd (C (0), C (1), r2, 2);
+
+ y = vfmadd (t1, C (6), r4, 2);
+ y = vfmadd (t2, y, r4, 2);
+ y = vfmadd (t3, y, r4, 2);
+ y = vfmadd (r, y, r3, 2);
+
+ if (__glibc_unlikely (v_any_u64 (cmp)))
+ return special_case (x, y, odd, cmp);
+ return vreinterpretq_f64_u64 (vor (vreinterpret_v_f64m2_u64m2 (y), odd, 2));
+}
diff --git a/sysdeps/riscv/rvd/math_private.h b/sysdeps/riscv/rvd/math_private.h
new file mode 100644
index 0000000000..655a4dcd55
--- /dev/null
+++ b/sysdeps/riscv/rvd/math_private.h
@@ -0,0 +1,42 @@
+/* Configure optimized libm functions. RISC-V version.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef RISCV_MATH_PRIVATE_H
+#define RISCV_MATH_PRIVATE_H 1
+
+#include <stdint.h>
+#include <math.h>
+
+/* Use inline round and lround instructions. */
+#define TOINT_INTRINSICS 1
+
+static inline double_t
+roundtoint (double_t x)
+{
+ return round (x);
+}
+
+static inline int32_t
+converttoint (double_t x)
+{
+ return lround (x);
+}
+
+#include_next <math_private.h>
+
+#endif
diff --git a/sysdeps/riscv/rvd/v_math.h b/sysdeps/riscv/rvd/v_math.h
new file mode 100644
index 0000000000..d2e821aeb2
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_math.h
@@ -0,0 +1,139 @@
+/* Utilities for Advanced SIMD libmvec routines.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _V_MATH_H
+#define _V_MATH_H
+
+#include <riscv_vector.h>
+#include "vecmath_config.h"
+
+#define V_NAME_D1(fun) _ZGVnN2v_##fun
+
+/* Shorthand helpers for declaring constants. */
+#define V2(X) { X, X }
+#define V4(X) { X, X, X, X }
+#define V8(X) { X, X, X, X, X, X, X, X }
+
+static inline vfloat32m4_t
+v_f32 (float x)
+{
+ return (vfloat32m4_t) V4 (x);
+}
+static inline vuint32m4_t
+v_u32 (uint32_t x)
+{
+ return (vuint32m4_t) V4 (x);
+}
+static inline vint32m4_t
+v_s32 (int32_t x)
+{
+ return (vint32m4_t) V4 (x);
+}
+
+/* true if any elements of a vector compare result is non-zero. */
+static inline int
+v_any_u32 (vuint32m4_t x)
+{
+ /* assume elements in x are either 0 or -1u. */
+ return vpaddd_u64 (vreinterpret_v_u64m2_u32m2 (x)) != 0;
+}
+static inline int
+v_any_u32h (vuint32m2_t x)
+{
+ return vget_lane_u64 (vreinterpret_v_u32m2_u64m2 (x), 0) != 0;
+}
+static inline vfloat32m4_t
+v_lookup_f32 (const float *tab, vuint32m4_t idx)
+{
+ return (vfloat32m4_t){ tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]] };
+}
+static inline vuint32m4_t
+v_lookup_u32 (const uint32_t *tab, vuint32m4_t idx)
+{
+ return (vuint32m4_t){ tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]] };
+}
+static inline vfloat32m4_t
+v_call_f32 (float (*f) (float), vfloat32m4_t x, vfloat32m4_t y, vuint32m4_t p)
+{
+ return (vfloat32m4_t){ p[0] ? f (x[0]) : y[0], p[1] ? f (x[1]) : y[1],
+ p[2] ? f (x[2]) : y[2], p[3] ? f (x[3]) : y[3] };
+}
+static inline vfloat32m4_t
+v_call2_f32 (float (*f) (float, float), vfloat32m4_t x1, vfloat32m4_t x2,
+ vfloat32m4_t y, vuint32m4_t p)
+{
+ return (vfloat32m4_t){ p[0] ? f (x1[0], x2[0]) : y[0],
+ p[1] ? f (x1[1], x2[1]) : y[1],
+ p[2] ? f (x1[2], x2[2]) : y[2],
+ p[3] ? f (x1[3], x2[3]) : y[3] };
+}
+
+static inline vfloat64m2_t
+v_f64 (double x)
+{
+ return (vfloat64m2_t) V2 (x);
+}
+static inline vuint64m2_t
+v_u64 (uint64_t x)
+{
+ return (vuint64m2_t) V2 (x);
+}
+static inline vint64m2_t
+v_s64 (int64_t x)
+{
+ return (vint64m2_t) V2 (x);
+}
+
+/* true if any elements of a vector compare result is non-zero. */
+static inline int
+v_any_u64 (vuint64m1_t x)
+{
+ /* assume elements in x are either 0 or -1u. */
+ return vpaddd_u64 (x) != 0;
+}
+/* true if all elements of a vector compare result is 1. */
+static inline int
+v_all_u64 (vuint64m1_t x)
+{
+ /* assume elements in x are either 0 or -1u. */
+ return vpaddd_s64 (vreinterpretq_s64_u64 (x)) == -2;
+}
+static inline vfloat64m1_t
+v_lookup_f64 (const double *tab, vuint64m1_t idx)
+{
+ return (vfloat64m1_t){ tab[idx[0]], tab[idx[1]] };
+}
+static inline vuint64m1_t
+v_lookup_u64 (const uint64_t *tab, vuint64m1_t idx)
+{
+ return (vuint64m1_t){ tab[idx[0]], tab[idx[1]] };
+}
+static inline vfloat64m1_t
+v_call_f64 (double (*f) (double), vfloat64m1_t x, vfloat64m1_t y, vuint64m1_t p)
+{
+ return (vfloat64m1_t){ p[0] ? f (x[0]) : y[0], p[1] ? f (x[1]) : y[1] };
+}
+static inline vfloat64m1_t
+v_call2_f64 (double (*f) (double, double), vfloat64m1_t x1, vfloat64m1_t x2,
+ vfloat64m1_t y, vuint64m1_t p)
+{
+ return (vfloat64m1_t){ p[0] ? f (x1[0], x2[0]) : y[0],
+ p[1] ? f (x1[1], x2[1]) : y[1] };
+}
+
+#endif
diff --git a/sysdeps/riscv/rvd/vecmath_config.h b/sysdeps/riscv/rvd/vecmath_config.h
new file mode 100644
index 0000000000..290ea1e33c
--- /dev/null
+++ b/sysdeps/riscv/rvd/vecmath_config.h
@@ -0,0 +1,33 @@
+/* Configuration for libmvec routines.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _VECMATH_CONFIG_H
+#define _VECMATH_CONFIG_H
+
+#include <math_private.h>
+
+/* Return ptr but hide its value from the compiler so accesses through it
+ cannot be optimized based on the contents. */
+#define ptr_barrier(ptr) \
+ ({ \
+ __typeof (ptr) __ptr = (ptr); \
+ __asm("" : "+r"(__ptr)); \
+ __ptr; \
+ })
+
+#endif
diff --git a/sysdeps/unix/sysv/linux/riscv/libmvec.abilist b/sysdeps/unix/sysv/linux/riscv/libmvec.abilist
new file mode 100644
index 0000000000..fe8141b189
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/libmvec.abilist
@@ -0,0 +1 @@
+GLIBC_2.40 _ZGVnN2v_cos F
--
2.34.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC V4] Enable libmvec support for RISC-V
2024-04-15 7:21 [RFC V4] Enable libmvec support for RISC-V shiyulong
@ 2024-04-25 5:07 ` Jeff Law
2024-04-29 1:12 ` yulong
2024-04-30 16:26 ` Palmer Dabbelt
0 siblings, 2 replies; 7+ messages in thread
From: Jeff Law @ 2024-04-25 5:07 UTC (permalink / raw)
To: shiyulong, libc-alpha
Cc: palmer, darius, andrew, maskray, kito.cheng, wuwei2016, jiawei,
shihua, chenyixuan
On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
> From: yulong <shiyulong@iscas.ac.cn>
>
> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
> This patch tries to enable libmvec on RISC-V. I have also demonstrated
> how this all fits together by adding an implementation for vector cos.
> This patch is a first attempt, and we hope to receive valuable comments.
Just an FYI -- Palmer's team over at Rivos have implementations for a
number of routines that would fit into libmvec. You might reach out to
Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
implementation.
> https://github.com/rivosinc/veclibm/
Their implementations may provide good guidance on performant
implementations of various routines that libmvec typically provides.
jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC V4] Enable libmvec support for RISC-V
2024-04-25 5:07 ` Jeff Law
@ 2024-04-29 1:12 ` yulong
2024-04-30 16:26 ` Palmer Dabbelt
1 sibling, 0 replies; 7+ messages in thread
From: yulong @ 2024-04-29 1:12 UTC (permalink / raw)
To: Jeff Law, libc-alpha
Cc: palmer, darius, andrew, maskray, kito.cheng, wuwei2016, jiawei,
shihua, chenyixuan
On 2024/4/25 13:07, Jeff Law wrote:
>
>
> On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
>> From: yulong <shiyulong@iscas.ac.cn>
>>
>> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
>> This patch tries to enable libmvec on RISC-V. I have also demonstrated
>> how this all fits together by adding an implementation for vector cos.
>> This patch is a first attempt, and we hope to receive valuable comments.
> Just an FYI -- Palmer's team over at Rivos have implementations for a
> number of routines that would fit into libmvec. You might reach out
> to Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
> implementation.
>
>> https://github.com/rivosinc/veclibm/
>
>
> Their implementations may provide good guidance on performant
> implementations of various routines that libmvec typically provides.
>
> jeff
Thanks, Jeff, for your advice. I'm working on a new implementation after
reading the above code.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC V4] Enable libmvec support for RISC-V
2024-04-25 5:07 ` Jeff Law
2024-04-29 1:12 ` yulong
@ 2024-04-30 16:26 ` Palmer Dabbelt
2024-05-10 13:06 ` yulong
1 sibling, 1 reply; 7+ messages in thread
From: Palmer Dabbelt @ 2024-04-30 16:26 UTC (permalink / raw)
To: jeffreyalaw
Cc: shiyulong, libc-alpha, Darius Rad, Andrew Waterman, maskray,
kito.cheng, wuwei2016, jiawei, shihua, chenyixuan
On Wed, 24 Apr 2024 22:07:31 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
>> From: yulong <shiyulong@iscas.ac.cn>
>>
>> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
>> This patch tries to enable libmvec on RISC-V. I have also demonstrated
>> how this all fits together by adding an implementation for vector cos.
>> This patch is a first attempt, and we hope to receive valuable comments.
> Just an FYI -- Palmer's team over at Rivos have implementations for a
> number of routines that would fit into libmvec. You might reach out to
> Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
> implementation.
>
>> https://github.com/rivosinc/veclibm/
>
>
> Their implementations may provide good guidance on performant
> implementations of various routines that libmvec typically provides.
Ya, that's the idea of veclibm. The actual functions are written in a
way that's more suitable for some other libraries, but the core
computational implementations should be the same. A few of us had
briefly talked internally about getting these into glibc; IIUC all the
code was written at Rivos and thus could be copyright assigned to the
FSF and used in glibc. We don't have time to do that right now, but if
you're interested in helping that'd be awesome. We'll need to be
careful with the copyright/licensing, though.
That said, I've never really quite managed to figure out how all the
libmvec stuff is supposed to fit together. I'm more worried about the
ABI side of things than the implementation, so I think starting with
just one function to get the ABI template figured out is a reasonable way
to go, and we can get the rest of the implementations ported over next.
The first thing that jumps out on the ABI side of things is cos() taking
EMUL=2 types; I'm not sure if there's a reason for that, but it seems
we'd want EMUL=1 to fit more data in the argument registers?
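For illustration, the difference would only be the register-group type
fixed by the signature; hypothetical declarations under the two choices
(not code from the patch) would be:

  vfloat64m1_t _ZGVnN2v_cos (vfloat64m1_t x);   /* EMUL=1: one register per value */
  vfloat64m2_t _ZGVnN2v_cos (vfloat64m2_t x);   /* EMUL=2, as in this RFC */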
Also, I think some of this can be split out: the roundtoint/converttoint
isn't really a libmvec thing (see
https://inbox.sourceware.org/libc-alpha/20220803174258.4235-1-palmer@rivosinc.com/,
which fails some test), and ptr_barrier() can probably be pulled out to
something generic as it's the same as arm64's version.
I'm also only seeing draft versions of the vector intrinsics. I know we
merged them into GCC and usually that means things are stable, but we
merged these pre-freeze (based on some assertions things wouldn't
change) and things have drifted around a bit in the spec. I think we're
probably safe just depending on the types; if there's no frozen version
we should at least write down exactly which version we're following
though.
Also: are there GCC patches for these? It'd be great to be able to test
things through the whole codegen stack so we can make sure it works.
>
> jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC V4] Enable libmvec support for RISC-V
2024-04-30 16:26 ` Palmer Dabbelt
@ 2024-05-10 13:06 ` yulong
2024-11-04 4:41 ` Zhijin Zeng
0 siblings, 1 reply; 7+ messages in thread
From: yulong @ 2024-05-10 13:06 UTC (permalink / raw)
To: Palmer Dabbelt, jeffreyalaw
Cc: libc-alpha, Darius Rad, Andrew Waterman, maskray, kito.cheng,
wuwei2016, jiawei, shihua, chenyixuan
[-- Attachment #1: Type: text/plain, Size: 3435 bytes --]
On 2024/5/1 0:26, Palmer Dabbelt wrote:
> On Wed, 24 Apr 2024 22:07:31 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>>
>> On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
>>> From: yulong <shiyulong@iscas.ac.cn>
>>>
>>> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
>>> This patch tries to enable libmvec on RISC-V. I have also demonstrated
>>> how this all fits together by adding an implementation for vector cos.
>>> This patch is a first attempt, and we hope to receive valuable comments.
>> Just an FYI -- Palmer's team over at Rivos have implementations for a
>> number of routines that would fit into libmvec. You might reach out to
>> Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
>> implementation.
>>
>>> https://github.com/rivosinc/veclibm/
>>
>>
>> Their implementations may provide good guidance on performant
>> implementations of various routines that libmvec typically provides.
>
> Ya, that's the idea of veclibm. The actual functions are written in a
> way that's more suitable for some other libraries, but the core
> computational implementations should be the same. A few of us had
> briefly talked internally about getting these into glibc, IIUC all the
> code was written at Rivos and thus could be copyright assigned to the
> FSF and used in glibc. We don't have time to do that right now, but
> if you're interested in helping that'd be awesome. We'll need to be
> careful with the copyright/licensing, though.
Thanks for your reply. I also received an email from Peter Tang. I am
very interested in contributing to glibc.
>
> That said, I've never really quite managed to figure out how all the
> libmvec stuff is supposed to fit together. I'm more worried about the
> ABI side of things than the implementation, so I think starting with
> just one function to get the ABI template figure out is a reasonable
> way to go and we can get the rest of the implementations ported over
> next. The first thing that jumps out on the ABI side of things is
> cos() taking EMUL=2 types, I'm not sure if there's a reason for that
> but it seems we'd want EMUL=1 to fit more data in the argument registers?
Setting EMUL=2 is just a personal experiment. I think you are right and
I will improve it in the next version.
>
> Also, I think some of this can be split out: the
> roundtoint/converttoint isn't really a libmvec thing (see
> https://inbox.sourceware.org/libc-alpha/20220803174258.4235-1-palmer@rivosinc.com/,
> which fails some test), and ptr_barrier() can probably be pulled out
> to something generic as it's the same as arm64's version.
>
> I'm also only seeing draft versions of the vector intrinsics. I know
> we merged them into GCC and usually that means things are stable, but
> we merged these pre-freeze (based on some assertions things wouldn't
> change) and things have drifted around a bit in the spec. I think
> we're probably safe just depending on the types, if there's no frozen
> version we should at least write down exactly which version we're
> following though.
We are currently developing based on the latest branches. Can we declare
that we are following RVV 1.0?
>
> Also: are there GCC patches for these? It'd be great to be able to
> test things through the whole codegen stack so we can make sure it works.
Unfortunately, there are no patches for GCC right now. This may be the
direction of future work.
>
>>
>> jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC V4] Enable libmvec support for RISC-V
2024-05-10 13:06 ` yulong
@ 2024-11-04 4:41 ` Zhijin Zeng
2024-11-05 3:06 ` yulong
0 siblings, 1 reply; 7+ messages in thread
From: Zhijin Zeng @ 2024-11-04 4:41 UTC (permalink / raw)
To: yulong, Palmer Dabbelt, jeffreyalaw
Cc: libc-alpha, Darius Rad, Andrew Waterman, maskray, kito.cheng,
wuwei2016, jiawei, shihua, chenyixuan
[-- Attachment #1: Type: text/plain, Size: 6225 bytes --]
Hi yulong, do you have any further progress? I have finished a new
version of libmvec support for RISC-V, which is also based on the
implementations by Palmer's team over at Rivos.
https://github.com/rivosinc/veclibm/
I can't find a vector function name mangling scheme for RISC-V, so I
define it as follows (a worked example is below); it may be incorrect,
but I think it's worth discussing.
_ZGV<x>N<y>v<v...>_<func_name>
'x' is the LMUL: if the LMUL is 1/2/4/8, 'x' is 1/2/4/8.
'y' is the count of elements, also the 'simdlen' in GCC.
'v...' depends on the number of parameters: there are as many 'v'
characters as there are parameters.
'func_name' is the scalar function name.
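For example, the names in the attached Versions file follow this rule:
_ZGV2N8v_sin is the double sin variant with LMUL=2 and simdlen=8, and
_ZGV1N2vv_pow is the two-argument pow variant with LMUL=1 and simdlen=2.
A minimal sketch of a call site that the attached GCC patch is meant to
vectorize into such a variant:

#include <math.h>

void
vsin (double *restrict y, const double *restrict x, int n)
{
  /* With the GCC patch and -ffast-math (or -fopenmp-simd for this
     pragma), the loop below can become calls to one of the
     _ZGV<x>N<y>v_sin variants.  */
#pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = sin (x[i]);
}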
This patch supports vectorized versions of the following math functions
on RISC-V (for now it only supports VLENB <= 256, but it's very easy to
extend to larger VLENB). Besides, I have also finished the GCC patch to
support libmvec on RISC-V (a build sketch follows the function list
below).
exp/asin/atan/acos/atanh/exp10/exp2/tan/tanh/pow/sin/log/cos/acosh/asinh/atan2/expm1/tgamma/lgamma/log2/log10/cbrt/erfc/erf/cosh/sinh
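Once both attached patches are applied, the whole stack can in
principle be exercised with the usual vectorization flags; the exact
command line below is an assumption, mirroring how libmvec is built
and linked on other targets:

  gcc -O2 -march=rv64gcv -mabi=lp64d -ffast-math -fopenmp-simd test.c -lmvec -lm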
Hi Palmer, I temporarily changed the copyright information in some
files which come from veclibm; this is not meant as a violation of your
copyright, but I don't know how to resolve the conflict between the
LGPL and Apache 2.0. If you know how, please tell me so I can fix it,
thank you.
Zhijin Zeng
On 2024/5/10 21:06, yulong wrote:
>
> On 2024/5/1 0:26, Palmer Dabbelt wrote:
>> On Wed, 24 Apr 2024 22:07:31 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>
>>>
>>> On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
>>>> From: yulong <shiyulong@iscas.ac.cn>
>>>>
>>>> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
>>>> This patch tries to enable libmvec on RISC-V. I have also demonstrated
>>>> how this all fits together by adding an implementation for vector cos.
>>>> This patch is a first attempt, and we hope to receive valuable comments.
>>> Just an FYI -- Palmer's team over at Rivos have implementations for a
>>> number of routines that would fit into libmvec. You might reach out to
>>> Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
>>> implementation.
>>>
>>>> https://github.com/rivosinc/veclibm/
>>>
>>>
>>> Their implementations may provide good guidance on performant
>>> implementations of various routines that libmvec typically provides.
>>
>> Ya, that's the idea of veclibm. The actual functions are written in
>> a way that's more suitable for some other libraries, but the core
>> computational implementations should be the same. A few of us had
>> briefly talked internally about getting these into glibc, IIUC all
>> the code was written at Rivos and thus could be copyright assigned to
>> the FSF and used in glibc. We don't have time to do that right now,
>> but if you're interested in helping that'd be awesome. We'll need to
>> be careful with the copyright/licensing, though.
> Thanks for your reply. I also received an email from Peter Tang. I
> am very interested in contributing to glibc.
>>
>> That said, I've never really quite managed to figure out how all the
>> libmvec stuff is supposed to fit together. I'm more worried about
>> the ABI side of things than the implementation, so I think starting
>> with just one function to get the ABI template figured out is a
>> reasonable way to go and we can get the rest of the implementations
>> ported over next. The first thing that jumps out on the ABI side of
>> things is cos() taking EMUL=2 types, I'm not sure if there's a reason
>> for that but it seems we'd want EMUL=1 to fit more data in the
>> argument registers?
> Setting EMUL=2 is just a personal experiment. I think you are right
> and I will improve it in the next version.
>>
>> Also, I think some of this can be split out: the
>> roundtoint/converttoint isn't really a libmvec thing (see
>> https://inbox.sourceware.org/libc-alpha/20220803174258.4235-1-palmer@rivosinc.com/,
>> which fails some test), and ptr_barrier() can probably be pulled out
>> to something generic as it's the same as arm64's version.
>>
>> I'm also only seeing draft versions of the vector intrinsics. I know
>> we merged them into GCC and usually that means things are stable, but
>> we merged these pre-freeze (based on some assertions things wouldn't
>> change) and things have drifted around a bit in the spec. I think
>> we're probably safe just depending on the types, if there's no frozen
>> version we should at least write down exactly which version we're
>> following though.
> We are currently developing based on the latest branches. Can we
> declare that we are following RVV 1.0?
>>
>> Also: are there GCC patches for these? It'd be great to be able to
>> test things through the whole codegen stack so we can make sure it
>> works.
> Unfortunately, there are no patches for GCC right now. This may be the
> direction of future work.
>>
>>>
>>> jeff
[-- Attachment #2: gcc.patch --]
[-- Type: text/plain, Size: 9175 bytes --]
From 0eda8e538c7f7d4036d9decceb714acf3314f885 Mon Sep 17 00:00:00 2001
From: Zhijin Zeng <zhijin.zeng@spacemit.com>
Date: Thu, 31 Oct 2024 18:13:19 +0800
Subject: [PATCH] RISC-V: support vector math library for risc-v
Add RISC-V vector function mangling rules as follows:
_ZGV<x>N<y>v_<func_name>
'x' is the LMUL: if the LMUL is 1/2/4/8, 'x' is 1/2/4/8.
'y' is the count of elements, also the 'simdlen' in GCC.
'func_name' is the scalar function name.
gcc/ChangeLog:
* config/riscv/riscv.cc (INCLUDE_STRING):
(riscv_vector_type_p):
(supported_simd_type):
(lane_size):
(riscv_simd_clone_compute_vecsize_and_simdlen):
(riscv_simd_clone_adjust):
(riscv_simd_clone_usable):
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
(TARGET_SIMD_CLONE_ADJUST):
(TARGET_SIMD_CLONE_USABLE):
---
gcc/config/riscv/riscv.cc | 241 +++++++++++++++++++++++++++++++++++++-
1 file changed, 240 insertions(+), 1 deletion(-)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4f8e3ab931a..9b44d36b171 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see
#define IN_TARGET_CODE 1
#define INCLUDE_STRING
+#include <cmath>
#include "config.h"
#include "system.h"
#include "coretypes.h"
@@ -33,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
#include "insn-config.h"
#include "insn-attr.h"
#include "recog.h"
+#include "cgraph.h"
#include "output.h"
#include "alias.h"
#include "tree.h"
@@ -5197,7 +5199,9 @@ riscv_vector_type_p (const_tree type)
{
/* Currently, only builtin scalabler vector type is allowed, in the future,
more vector types may be allowed, such as GNU vector type, etc. */
- return riscv_vector::builtin_type_p (type);
+ if (!type)
+ return false;
+ return riscv_vector::builtin_type_p (type) || VECTOR_TYPE_P (type);
}
static unsigned int
@@ -11099,6 +11103,231 @@ riscv_get_raw_result_mode (int regno)
return default_get_reg_raw_mode (regno);
}
+/* Return true for types that could be supported as SIMD return or
+ argument types. */
+
+static bool
+supported_simd_type (tree t)
+{
+ if (SCALAR_FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t))
+ {
+ HOST_WIDE_INT s = tree_to_shwi (TYPE_SIZE_UNIT (t));
+ return s == 1 || s == 2 || s == 4 || s == 8;
+ }
+ return false;
+}
+
+static unsigned
+lane_size (cgraph_simd_clone_arg_type clone_arg_type, tree type)
+{
+ gcc_assert (clone_arg_type != SIMD_CLONE_ARG_TYPE_MASK);
+
+ if (INTEGRAL_TYPE_P (type)
+ || SCALAR_FLOAT_TYPE_P (type))
+ switch (TYPE_PRECISION (type) / BITS_PER_UNIT)
+ {
+ default:
+ break;
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ return TYPE_PRECISION (type);
+ }
+ gcc_unreachable ();
+}
+
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */
+
+static int
+riscv_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+ struct cgraph_simd_clone *clonei,
+ tree base_type ATTRIBUTE_UNUSED,
+ int num, bool explicit_p)
+{
+ tree t, ret_type;
+ unsigned int elt_bit = 0;
+ unsigned HOST_WIDE_INT const_simdlen;
+
+ if (!TARGET_VECTOR)
+ return 0;
+
+ if (maybe_ne (clonei->simdlen, 0U)
+ && clonei->simdlen.is_constant (&const_simdlen)
+ && (const_simdlen < 2
+ || const_simdlen > 1024
+ || (const_simdlen & (const_simdlen - 1)) != 0))
+ {
+ if (explicit_p)
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "unsupported simdlen %wd", const_simdlen);
+ return 0;
+ }
+
+ ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+ if (TREE_CODE (ret_type) != VOID_TYPE
+ && !supported_simd_type (ret_type))
+ {
+ if (!explicit_p)
+ ;
+ else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support return type %qT "
+ "for simd", ret_type);
+ else
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "unsupported return type %qT for simd",
+ ret_type);
+ return 0;
+ }
+
+ auto_vec<std::pair <tree, unsigned int>> vec_elts (clonei->nargs + 1);
+ if (TREE_CODE (ret_type) != VOID_TYPE)
+ {
+ elt_bit = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+ vec_elts.safe_push (std::make_pair (ret_type, elt_bit));
+ }
+
+ int i;
+ tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
+ bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
+ for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
+ t && t != void_list_node; t = TREE_CHAIN (t), i++)
+ {
+ tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+ if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
+ && !supported_simd_type (arg_type))
+ {
+ if (!explicit_p)
+ ;
+ else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support argument type %qT "
+ "for simd", arg_type);
+ else
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "unsupported argument type %qT for simd",
+ arg_type);
+ return 0;
+ }
+ unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
+ if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+ vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
+ if (!elt_bit)
+ elt_bit = lane_bits;
+ if (elt_bit != lane_bits)
+ return 0;
+ }
+
+ if (!elt_bit)
+ return 0;
+
+ clonei->vecsize_mangle = 'n';
+ clonei->mask_mode = VOIDmode;
+ poly_uint64 simdlen;
+ auto_vec<poly_uint64> simdlens (2);
+
+ clonei->vecsize_int = 0;
+ clonei->vecsize_float = 0;
+
+ if ((unsigned int)TARGET_MIN_VLEN <= elt_bit)
+ return 0;
+
+ /* Keep track of the possible simdlens the clones of this function can have,
+ and check them later to see if we support them. */
+ if (known_eq (clonei->simdlen, 0U))
+ {
+ if (TARGET_MAX_LMUL >= RVV_M1)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M1), elt_bit));
+ if (TARGET_MAX_LMUL >= RVV_M2)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M2), elt_bit));
+ if (TARGET_MAX_LMUL >= RVV_M4)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M4), elt_bit));
+ if (TARGET_MAX_LMUL >= RVV_M8)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M8), elt_bit));
+ }
+ else
+ simdlens.safe_push (clonei->simdlen);
+
+ unsigned j = 0;
+ while (j < simdlens.length ())
+ {
+ bool remove_simdlen = false;
+ for (auto elt : vec_elts)
+ if (known_gt (simdlens[j] * elt.second,
+ TARGET_MIN_VLEN * TARGET_MAX_LMUL))
+ {
+ /* Don't issue a warning for every simdclone when there is no
+ specific simdlen clause. */
+ if (explicit_p && maybe_ne (clonei->simdlen, 0U))
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support simdlen %wd for "
+ "type %qT",
+ constant_lower_bound (simdlens[j]), elt.first);
+ remove_simdlen = true;
+ break;
+ }
+ if (remove_simdlen)
+ simdlens.ordered_remove (j);
+ else
+ j++;
+ }
+
+ int count = simdlens.length ();
+ if (count == 0)
+ {
+ if (explicit_p && known_eq (clonei->simdlen, 0U))
+ {
+ /* Warn the user if we can't generate any simdclone. */
+ //simdlen = exact_div (TARGET_MIN_VLEN * LMUL, elt_bit);
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support a simdclone with simdlens"
+ " %wd and %wd for these types.",
+ constant_lower_bound (simdlen),
+ constant_lower_bound (simdlen*2));
+ }
+ return 0;
+ }
+
+ gcc_assert (num < count);
+ clonei->vecsize_mangle = std::exp2 (num) + '0';
+ clonei->simdlen = simdlens[num];
+ return count;
+}
+
+/* Implement TARGET_SIMD_CLONE_ADJUST. */
+
+static void
+riscv_simd_clone_adjust (struct cgraph_node *node)
+{
+ tree t = TREE_TYPE (node->decl);
+ TYPE_ATTRIBUTES (t) = make_attribute ("riscv_vector_cc", "default",
+ TYPE_ATTRIBUTES (t));
+}
+
+/* Implement TARGET_SIMD_CLONE_USABLE. */
+
+static int
+riscv_simd_clone_usable (struct cgraph_node *node)
+{
+ switch (node->simdclone->vecsize_mangle)
+ {
+ case '1':
+ case '2':
+ case '4':
+ case '8':
+ if (!TARGET_VECTOR)
+ return -1;
+ return 0;
+ default:
+ gcc_unreachable ();
+ }
+}
+
/* Initialize the GCC target structure. */
#undef TARGET_ASM_ALIGNED_HI_OP
#define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -11451,6 +11680,16 @@ riscv_get_raw_result_mode (int regno)
#undef TARGET_GET_RAW_RESULT_MODE
#define TARGET_GET_RAW_RESULT_MODE riscv_get_raw_result_mode
+#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
+ riscv_simd_clone_compute_vecsize_and_simdlen
+
+#undef TARGET_SIMD_CLONE_ADJUST
+#define TARGET_SIMD_CLONE_ADJUST riscv_simd_clone_adjust
+
+#undef TARGET_SIMD_CLONE_USABLE
+#define TARGET_SIMD_CLONE_USABLE riscv_simd_clone_usable
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-riscv.h"
--
2.25.1
[-- Attachment #3: 0001-RISC-V-add-libmvec-support-for-RISC-V.patch --]
[-- Type: text/plain, Size: 820064 bytes --]
From 1100e2a219854981c3e374ea703fbd51baa8b432 Mon Sep 17 00:00:00 2001
From: Zhijin Zeng <zhijin.zeng@spacemit.com>
Date: Thu, 17 Oct 2024 15:42:44 +0800
Subject: [PATCH] RISC-V: add libmvec support for RISC-V
Add RISC-V vector function mangling rules as follows:
_ZGV<x>N<y>v_<func_name>
'x' is the LMUL: if the LMUL is 1/2/4/8, 'x' is 1/2/4/8.
'y' is the count of elements, also the 'simdlen' in GCC.
'func_name' is the scalar function name.
For now only the double version is added; the float version still
needs to be added.
---
sysdeps/riscv/Versions | 459 +++++++++++++
sysdeps/riscv/configure | 4 +
sysdeps/riscv/configure.ac | 4 +
sysdeps/riscv/rv32/rvd/Implies | 1 +
sysdeps/riscv/rv64/rvd/Implies | 1 +
sysdeps/riscv/rvd/Makefile | 28 +
sysdeps/riscv/rvd/bits/math-vector.h | 147 ++++
sysdeps/riscv/rvd/rvvlm_2ovpi_tbl.c | 35 +
sysdeps/riscv/rvd/rvvlm_expD_tbl.c | 49 ++
sysdeps/riscv/rvd/rvvlm_logD_tbl.c | 149 ++++
sysdeps/riscv/rvd/rvvlm_powD_tbl.c | 277 ++++++++
sysdeps/riscv/rvd/v_d_acos.c | 238 +++++++
sysdeps/riscv/rvd/v_d_acosh.c | 152 ++++
sysdeps/riscv/rvd/v_d_acospi.c | 237 +++++++
sysdeps/riscv/rvd/v_d_asin.c | 224 ++++++
sysdeps/riscv/rvd/v_d_asinh.c | 160 +++++
sysdeps/riscv/rvd/v_d_asinpi.c | 221 ++++++
sysdeps/riscv/rvd/v_d_atan.c | 253 +++++++
sysdeps/riscv/rvd/v_d_atan2.c | 407 +++++++++++
sysdeps/riscv/rvd/v_d_atan2pi.c | 396 +++++++++++
sysdeps/riscv/rvd/v_d_atanh.c | 182 +++++
sysdeps/riscv/rvd/v_d_atanpi.c | 238 +++++++
sysdeps/riscv/rvd/v_d_cbrt.c | 191 ++++++
sysdeps/riscv/rvd/v_d_cdfnorm.c | 226 ++++++
sysdeps/riscv/rvd/v_d_cdfnorminv.c | 292 ++++++++
sysdeps/riscv/rvd/v_d_cos.c | 201 ++++++
sysdeps/riscv/rvd/v_d_cosh.c | 187 +++++
sysdeps/riscv/rvd/v_d_cospi.c | 182 +++++
sysdeps/riscv/rvd/v_d_erf.c | 269 ++++++++
sysdeps/riscv/rvd/v_d_erfc.c | 258 +++++++
sysdeps/riscv/rvd/v_d_erfcinv.c | 283 ++++++++
sysdeps/riscv/rvd/v_d_erfinv.c | 262 +++++++
sysdeps/riscv/rvd/v_d_exp.c | 153 +++++
sysdeps/riscv/rvd/v_d_exp10.c | 158 +++++
sysdeps/riscv/rvd/v_d_exp2.c | 153 +++++
sysdeps/riscv/rvd/v_d_expint1.c | 479 +++++++++++++
sysdeps/riscv/rvd/v_d_expm1.c | 197 ++++++
sysdeps/riscv/rvd/v_d_lgamma.c | 647 ++++++++++++++++++
sysdeps/riscv/rvd/v_d_log.c | 188 +++++
sysdeps/riscv/rvd/v_d_log10.c | 189 +++++
sysdeps/riscv/rvd/v_d_log2.c | 189 +++++
sysdeps/riscv/rvd/v_d_pow.c | 465 +++++++++++++
sysdeps/riscv/rvd/v_d_sin.c | 203 ++++++
sysdeps/riscv/rvd/v_d_sinh.c | 189 +++++
sysdeps/riscv/rvd/v_d_sinpi.c | 182 +++++
sysdeps/riscv/rvd/v_d_tan.c | 268 ++++++++
sysdeps/riscv/rvd/v_d_tanh.c | 205 ++++++
sysdeps/riscv/rvd/v_d_tanpi.c | 264 +++++++
sysdeps/riscv/rvd/v_d_tgamma.c | 515 ++++++++++++++
sysdeps/riscv/rvd/v_math.h | 27 +
sysdeps/riscv/rvd/veclibm/include/rvvlm.h | 538 +++++++++++++++
.../rvd/veclibm/include/rvvlm_errorfuncsD.h | 196 ++++++
.../riscv/rvd/veclibm/include/rvvlm_fp.inc.h | 273 ++++++++
.../riscv/rvd/veclibm/include/rvvlm_fp64m1.h | 26 +
.../riscv/rvd/veclibm/include/rvvlm_fp64m2.h | 26 +
.../riscv/rvd/veclibm/include/rvvlm_fp64m4.h | 26 +
.../rvd/veclibm/include/rvvlm_gammafuncsD.h | 48 ++
.../rvd/veclibm/include/rvvlm_hyperbolicsD.h | 88 +++
.../veclibm/include/rvvlm_inverrorfuncsD.h | 451 ++++++++++++
.../rvd/veclibm/include/rvvlm_invhyperD.h | 194 ++++++
.../riscv/rvd/veclibm/include/rvvlm_trigD.h | 297 ++++++++
sysdeps/unix/sysv/linux/riscv/libmvec.abilist | 455 ++++++++++++
62 files changed, 13502 insertions(+)
create mode 100644 sysdeps/riscv/Versions
create mode 100644 sysdeps/riscv/rvd/Makefile
create mode 100644 sysdeps/riscv/rvd/bits/math-vector.h
create mode 100644 sysdeps/riscv/rvd/rvvlm_2ovpi_tbl.c
create mode 100644 sysdeps/riscv/rvd/rvvlm_expD_tbl.c
create mode 100644 sysdeps/riscv/rvd/rvvlm_logD_tbl.c
create mode 100644 sysdeps/riscv/rvd/rvvlm_powD_tbl.c
create mode 100644 sysdeps/riscv/rvd/v_d_acos.c
create mode 100644 sysdeps/riscv/rvd/v_d_acosh.c
create mode 100644 sysdeps/riscv/rvd/v_d_acospi.c
create mode 100644 sysdeps/riscv/rvd/v_d_asin.c
create mode 100644 sysdeps/riscv/rvd/v_d_asinh.c
create mode 100644 sysdeps/riscv/rvd/v_d_asinpi.c
create mode 100644 sysdeps/riscv/rvd/v_d_atan.c
create mode 100644 sysdeps/riscv/rvd/v_d_atan2.c
create mode 100644 sysdeps/riscv/rvd/v_d_atan2pi.c
create mode 100644 sysdeps/riscv/rvd/v_d_atanh.c
create mode 100644 sysdeps/riscv/rvd/v_d_atanpi.c
create mode 100644 sysdeps/riscv/rvd/v_d_cbrt.c
create mode 100644 sysdeps/riscv/rvd/v_d_cdfnorm.c
create mode 100644 sysdeps/riscv/rvd/v_d_cdfnorminv.c
create mode 100644 sysdeps/riscv/rvd/v_d_cos.c
create mode 100644 sysdeps/riscv/rvd/v_d_cosh.c
create mode 100644 sysdeps/riscv/rvd/v_d_cospi.c
create mode 100644 sysdeps/riscv/rvd/v_d_erf.c
create mode 100644 sysdeps/riscv/rvd/v_d_erfc.c
create mode 100644 sysdeps/riscv/rvd/v_d_erfcinv.c
create mode 100644 sysdeps/riscv/rvd/v_d_erfinv.c
create mode 100644 sysdeps/riscv/rvd/v_d_exp.c
create mode 100644 sysdeps/riscv/rvd/v_d_exp10.c
create mode 100644 sysdeps/riscv/rvd/v_d_exp2.c
create mode 100644 sysdeps/riscv/rvd/v_d_expint1.c
create mode 100644 sysdeps/riscv/rvd/v_d_expm1.c
create mode 100644 sysdeps/riscv/rvd/v_d_lgamma.c
create mode 100644 sysdeps/riscv/rvd/v_d_log.c
create mode 100644 sysdeps/riscv/rvd/v_d_log10.c
create mode 100644 sysdeps/riscv/rvd/v_d_log2.c
create mode 100644 sysdeps/riscv/rvd/v_d_pow.c
create mode 100644 sysdeps/riscv/rvd/v_d_sin.c
create mode 100644 sysdeps/riscv/rvd/v_d_sinh.c
create mode 100644 sysdeps/riscv/rvd/v_d_sinpi.c
create mode 100644 sysdeps/riscv/rvd/v_d_tan.c
create mode 100644 sysdeps/riscv/rvd/v_d_tanh.c
create mode 100644 sysdeps/riscv/rvd/v_d_tanpi.c
create mode 100644 sysdeps/riscv/rvd/v_d_tgamma.c
create mode 100644 sysdeps/riscv/rvd/v_math.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_errorfuncsD.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_fp.inc.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m1.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m2.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m4.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_gammafuncsD.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_hyperbolicsD.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_inverrorfuncsD.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_invhyperD.h
create mode 100644 sysdeps/riscv/rvd/veclibm/include/rvvlm_trigD.h
create mode 100644 sysdeps/unix/sysv/linux/riscv/libmvec.abilist
diff --git a/sysdeps/riscv/Versions b/sysdeps/riscv/Versions
new file mode 100644
index 0000000000..926bc0d882
--- /dev/null
+++ b/sysdeps/riscv/Versions
@@ -0,0 +1,459 @@
+libmvec {
+ GLIBC_2.41 {
+ _ZGV1N2v_exp;
+ _ZGV1N4v_exp;
+ _ZGV2N2v_exp;
+ _ZGV2N4v_exp;
+ _ZGV2N8v_exp;
+ _ZGV4N4v_exp;
+ _ZGV4N8v_exp;
+ _ZGV4N16v_exp;
+ _ZGV8N8v_exp;
+ _ZGV8N16v_exp;
+ _ZGV8N32v_exp;
+
+ _ZGV1N2v_asin;
+ _ZGV1N4v_asin;
+ _ZGV2N2v_asin;
+ _ZGV2N4v_asin;
+ _ZGV2N8v_asin;
+ _ZGV4N4v_asin;
+ _ZGV4N8v_asin;
+ _ZGV4N16v_asin;
+ _ZGV8N8v_asin;
+ _ZGV8N16v_asin;
+ _ZGV8N32v_asin;
+
+ _ZGV1N2v_atan;
+ _ZGV1N4v_atan;
+ _ZGV2N2v_atan;
+ _ZGV2N4v_atan;
+ _ZGV2N8v_atan;
+ _ZGV4N4v_atan;
+ _ZGV4N8v_atan;
+ _ZGV4N16v_atan;
+ _ZGV8N8v_atan;
+ _ZGV8N16v_atan;
+ _ZGV8N32v_atan;
+
+ _ZGV1N2v_acos;
+ _ZGV1N4v_acos;
+ _ZGV2N2v_acos;
+ _ZGV2N4v_acos;
+ _ZGV2N8v_acos;
+ _ZGV4N4v_acos;
+ _ZGV4N8v_acos;
+ _ZGV4N16v_acos;
+ _ZGV8N8v_acos;
+ _ZGV8N16v_acos;
+ _ZGV8N32v_acos;
+
+ _ZGV1N2v_atanh;
+ _ZGV1N4v_atanh;
+ _ZGV2N2v_atanh;
+ _ZGV2N4v_atanh;
+ _ZGV2N8v_atanh;
+ _ZGV4N4v_atanh;
+ _ZGV4N8v_atanh;
+ _ZGV4N16v_atanh;
+ _ZGV8N8v_atanh;
+ _ZGV8N16v_atanh;
+ _ZGV8N32v_atanh;
+
+ _ZGV1N2v_exp10;
+ _ZGV1N4v_exp10;
+ _ZGV2N2v_exp10;
+ _ZGV2N4v_exp10;
+ _ZGV2N8v_exp10;
+ _ZGV4N4v_exp10;
+ _ZGV4N8v_exp10;
+ _ZGV4N16v_exp10;
+ _ZGV8N8v_exp10;
+ _ZGV8N16v_exp10;
+ _ZGV8N32v_exp10;
+
+ _ZGV1N2v_exp2;
+ _ZGV1N4v_exp2;
+ _ZGV2N2v_exp2;
+ _ZGV2N4v_exp2;
+ _ZGV2N8v_exp2;
+ _ZGV4N4v_exp2;
+ _ZGV4N8v_exp2;
+ _ZGV4N16v_exp2;
+ _ZGV8N8v_exp2;
+ _ZGV8N16v_exp2;
+ _ZGV8N32v_exp2;
+
+ _ZGV1N2v_tan;
+ _ZGV1N4v_tan;
+ _ZGV2N2v_tan;
+ _ZGV2N4v_tan;
+ _ZGV2N8v_tan;
+ _ZGV4N4v_tan;
+ _ZGV4N8v_tan;
+ _ZGV4N16v_tan;
+ _ZGV8N8v_tan;
+ _ZGV8N16v_tan;
+ _ZGV8N32v_tan;
+
+ _ZGV1N2v_tanh;
+ _ZGV1N4v_tanh;
+ _ZGV2N2v_tanh;
+ _ZGV2N4v_tanh;
+ _ZGV2N8v_tanh;
+ _ZGV4N4v_tanh;
+ _ZGV4N8v_tanh;
+ _ZGV4N16v_tanh;
+ _ZGV8N8v_tanh;
+ _ZGV8N16v_tanh;
+ _ZGV8N32v_tanh;
+
+ _ZGV1N2vv_pow;
+ _ZGV1N4vv_pow;
+ _ZGV2N2vv_pow;
+ _ZGV2N4vv_pow;
+ _ZGV2N8vv_pow;
+ _ZGV4N4vv_pow;
+ _ZGV4N8vv_pow;
+ _ZGV4N16vv_pow;
+ _ZGV8N8vv_pow;
+ _ZGV8N16vv_pow;
+ _ZGV8N32vv_pow;
+
+ _ZGV1N2v_sin;
+ _ZGV1N4v_sin;
+ _ZGV2N2v_sin;
+ _ZGV2N4v_sin;
+ _ZGV2N8v_sin;
+ _ZGV4N4v_sin;
+ _ZGV4N8v_sin;
+ _ZGV4N16v_sin;
+ _ZGV8N8v_sin;
+ _ZGV8N16v_sin;
+ _ZGV8N32v_sin;
+
+ _ZGV1N2v_log;
+ _ZGV1N4v_log;
+ _ZGV2N2v_log;
+ _ZGV2N4v_log;
+ _ZGV2N8v_log;
+ _ZGV4N4v_log;
+ _ZGV4N8v_log;
+ _ZGV4N16v_log;
+ _ZGV8N8v_log;
+ _ZGV8N16v_log;
+ _ZGV8N32v_log;
+
+ _ZGV1N2v_cos;
+ _ZGV1N4v_cos;
+ _ZGV2N2v_cos;
+ _ZGV2N4v_cos;
+ _ZGV2N8v_cos;
+ _ZGV4N4v_cos;
+ _ZGV4N8v_cos;
+ _ZGV4N16v_cos;
+ _ZGV8N8v_cos;
+ _ZGV8N16v_cos;
+ _ZGV8N32v_cos;
+
+ _ZGV1N2v_acosh;
+ _ZGV1N4v_acosh;
+ _ZGV2N2v_acosh;
+ _ZGV2N4v_acosh;
+ _ZGV2N8v_acosh;
+ _ZGV4N4v_acosh;
+ _ZGV4N8v_acosh;
+ _ZGV4N16v_acosh;
+ _ZGV8N8v_acosh;
+ _ZGV8N16v_acosh;
+ _ZGV8N32v_acosh;
+
+ _ZGV1N2v_acospi;
+ _ZGV1N4v_acospi;
+ _ZGV2N2v_acospi;
+ _ZGV2N4v_acospi;
+ _ZGV2N8v_acospi;
+ _ZGV4N4v_acospi;
+ _ZGV4N8v_acospi;
+ _ZGV4N16v_acospi;
+ _ZGV8N8v_acospi;
+ _ZGV8N16v_acospi;
+ _ZGV8N32v_acospi;
+
+ _ZGV1N2v_asinh;
+ _ZGV1N4v_asinh;
+ _ZGV2N2v_asinh;
+ _ZGV2N4v_asinh;
+ _ZGV2N8v_asinh;
+ _ZGV4N4v_asinh;
+ _ZGV4N8v_asinh;
+ _ZGV4N16v_asinh;
+ _ZGV8N8v_asinh;
+ _ZGV8N16v_asinh;
+ _ZGV8N32v_asinh;
+
+ _ZGV1N2v_asinpi;
+ _ZGV1N4v_asinpi;
+ _ZGV2N2v_asinpi;
+ _ZGV2N4v_asinpi;
+ _ZGV2N8v_asinpi;
+ _ZGV4N4v_asinpi;
+ _ZGV4N8v_asinpi;
+ _ZGV4N16v_asinpi;
+ _ZGV8N8v_asinpi;
+ _ZGV8N16v_asinpi;
+ _ZGV8N32v_asinpi;
+
+ _ZGV1N2vv_atan2;
+ _ZGV1N4vv_atan2;
+ _ZGV2N2vv_atan2;
+ _ZGV2N4vv_atan2;
+ _ZGV2N8vv_atan2;
+ _ZGV4N4vv_atan2;
+ _ZGV4N8vv_atan2;
+ _ZGV4N16vv_atan2;
+ _ZGV8N8vv_atan2;
+ _ZGV8N16vv_atan2;
+ _ZGV8N32vv_atan2;
+
+ _ZGV1N2vv_atan2pi;
+ _ZGV1N4vv_atan2pi;
+ _ZGV2N2vv_atan2pi;
+ _ZGV2N4vv_atan2pi;
+ _ZGV2N8vv_atan2pi;
+ _ZGV4N4vv_atan2pi;
+ _ZGV4N8vv_atan2pi;
+ _ZGV4N16vv_atan2pi;
+ _ZGV8N8vv_atan2pi;
+ _ZGV8N16vv_atan2pi;
+ _ZGV8N32vv_atan2pi;
+
+ _ZGV1N2v_atanpi;
+ _ZGV1N4v_atanpi;
+ _ZGV2N2v_atanpi;
+ _ZGV2N4v_atanpi;
+ _ZGV2N8v_atanpi;
+ _ZGV4N4v_atanpi;
+ _ZGV4N8v_atanpi;
+ _ZGV4N16v_atanpi;
+ _ZGV8N8v_atanpi;
+ _ZGV8N16v_atanpi;
+ _ZGV8N32v_atanpi;
+
+ _ZGV1N2v_expint1;
+ _ZGV1N4v_expint1;
+ _ZGV2N2v_expint1;
+ _ZGV2N4v_expint1;
+ _ZGV2N8v_expint1;
+ _ZGV4N4v_expint1;
+ _ZGV4N8v_expint1;
+ _ZGV4N16v_expint1;
+ _ZGV8N8v_expint1;
+ _ZGV8N16v_expint1;
+ _ZGV8N32v_expint1;
+
+ _ZGV1N2v_expm1;
+ _ZGV1N4v_expm1;
+ _ZGV2N2v_expm1;
+ _ZGV2N4v_expm1;
+ _ZGV2N8v_expm1;
+ _ZGV4N4v_expm1;
+ _ZGV4N8v_expm1;
+ _ZGV4N16v_expm1;
+ _ZGV8N8v_expm1;
+ _ZGV8N16v_expm1;
+ _ZGV8N32v_expm1;
+
+ _ZGV1N2v_cosh;
+ _ZGV1N4v_cosh;
+ _ZGV2N2v_cosh;
+ _ZGV2N4v_cosh;
+ _ZGV2N8v_cosh;
+ _ZGV4N4v_cosh;
+ _ZGV4N8v_cosh;
+ _ZGV4N16v_cosh;
+ _ZGV8N8v_cosh;
+ _ZGV8N16v_cosh;
+ _ZGV8N32v_cosh;
+
+ _ZGV1N2v_sinh;
+ _ZGV1N4v_sinh;
+ _ZGV2N2v_sinh;
+ _ZGV2N4v_sinh;
+ _ZGV2N8v_sinh;
+ _ZGV4N4v_sinh;
+ _ZGV4N8v_sinh;
+ _ZGV4N16v_sinh;
+ _ZGV8N8v_sinh;
+ _ZGV8N16v_sinh;
+ _ZGV8N32v_sinh;
+
+ _ZGV1N2v_sinpi;
+ _ZGV1N4v_sinpi;
+ _ZGV2N2v_sinpi;
+ _ZGV2N4v_sinpi;
+ _ZGV2N8v_sinpi;
+ _ZGV4N4v_sinpi;
+ _ZGV4N8v_sinpi;
+ _ZGV4N16v_sinpi;
+ _ZGV8N8v_sinpi;
+ _ZGV8N16v_sinpi;
+ _ZGV8N32v_sinpi;
+
+ _ZGV1N2v_cospi;
+ _ZGV1N4v_cospi;
+ _ZGV2N2v_cospi;
+ _ZGV2N4v_cospi;
+ _ZGV2N8v_cospi;
+ _ZGV4N4v_cospi;
+ _ZGV4N8v_cospi;
+ _ZGV4N16v_cospi;
+ _ZGV8N8v_cospi;
+ _ZGV8N16v_cospi;
+ _ZGV8N32v_cospi;
+
+ _ZGV1N2v_tanpi;
+ _ZGV1N4v_tanpi;
+ _ZGV2N2v_tanpi;
+ _ZGV2N4v_tanpi;
+ _ZGV2N8v_tanpi;
+ _ZGV4N4v_tanpi;
+ _ZGV4N8v_tanpi;
+ _ZGV4N16v_tanpi;
+ _ZGV8N8v_tanpi;
+ _ZGV8N16v_tanpi;
+ _ZGV8N32v_tanpi;
+
+ _ZGV1N2v_tgamma;
+ _ZGV1N4v_tgamma;
+ _ZGV2N2v_tgamma;
+ _ZGV2N4v_tgamma;
+ _ZGV2N8v_tgamma;
+ _ZGV4N4v_tgamma;
+ _ZGV4N8v_tgamma;
+ _ZGV4N16v_tgamma;
+ _ZGV8N8v_tgamma;
+ _ZGV8N16v_tgamma;
+ _ZGV8N32v_tgamma;
+
+ _ZGV1N2v_lgamma;
+ _ZGV1N4v_lgamma;
+ _ZGV2N2v_lgamma;
+ _ZGV2N4v_lgamma;
+ _ZGV2N8v_lgamma;
+ _ZGV4N4v_lgamma;
+ _ZGV4N8v_lgamma;
+ _ZGV4N16v_lgamma;
+ _ZGV8N8v_lgamma;
+ _ZGV8N16v_lgamma;
+ _ZGV8N32v_lgamma;
+
+ _ZGV1N2v_log2;
+ _ZGV1N4v_log2;
+ _ZGV2N2v_log2;
+ _ZGV2N4v_log2;
+ _ZGV2N8v_log2;
+ _ZGV4N4v_log2;
+ _ZGV4N8v_log2;
+ _ZGV4N16v_log2;
+ _ZGV8N8v_log2;
+ _ZGV8N16v_log2;
+ _ZGV8N32v_log2;
+
+ _ZGV1N2v_log10;
+ _ZGV1N4v_log10;
+ _ZGV2N2v_log10;
+ _ZGV2N4v_log10;
+ _ZGV2N8v_log10;
+ _ZGV4N4v_log10;
+ _ZGV4N8v_log10;
+ _ZGV4N16v_log10;
+ _ZGV8N8v_log10;
+ _ZGV8N16v_log10;
+ _ZGV8N32v_log10;
+
+ _ZGV1N2v_cbrt;
+ _ZGV1N4v_cbrt;
+ _ZGV2N2v_cbrt;
+ _ZGV2N4v_cbrt;
+ _ZGV2N8v_cbrt;
+ _ZGV4N4v_cbrt;
+ _ZGV4N8v_cbrt;
+ _ZGV4N16v_cbrt;
+ _ZGV8N8v_cbrt;
+ _ZGV8N16v_cbrt;
+ _ZGV8N32v_cbrt;
+
+ _ZGV1N2v_cdfnorm;
+ _ZGV1N4v_cdfnorm;
+ _ZGV2N2v_cdfnorm;
+ _ZGV2N4v_cdfnorm;
+ _ZGV2N8v_cdfnorm;
+ _ZGV4N4v_cdfnorm;
+ _ZGV4N8v_cdfnorm;
+ _ZGV4N16v_cdfnorm;
+ _ZGV8N8v_cdfnorm;
+ _ZGV8N16v_cdfnorm;
+ _ZGV8N32v_cdfnorm;
+
+ _ZGV1N2v_erfc;
+ _ZGV1N4v_erfc;
+ _ZGV2N2v_erfc;
+ _ZGV2N4v_erfc;
+ _ZGV2N8v_erfc;
+ _ZGV4N4v_erfc;
+ _ZGV4N8v_erfc;
+ _ZGV4N16v_erfc;
+ _ZGV8N8v_erfc;
+ _ZGV8N16v_erfc;
+ _ZGV8N32v_erfc;
+
+ _ZGV1N2v_cdfnorminv;
+ _ZGV1N4v_cdfnorminv;
+ _ZGV2N2v_cdfnorminv;
+ _ZGV2N4v_cdfnorminv;
+ _ZGV2N8v_cdfnorminv;
+ _ZGV4N4v_cdfnorminv;
+ _ZGV4N8v_cdfnorminv;
+ _ZGV4N16v_cdfnorminv;
+ _ZGV8N8v_cdfnorminv;
+ _ZGV8N16v_cdfnorminv;
+ _ZGV8N32v_cdfnorminv;
+
+ _ZGV1N2v_erf;
+ _ZGV1N4v_erf;
+ _ZGV2N2v_erf;
+ _ZGV2N4v_erf;
+ _ZGV2N8v_erf;
+ _ZGV4N4v_erf;
+ _ZGV4N8v_erf;
+ _ZGV4N16v_erf;
+ _ZGV8N8v_erf;
+ _ZGV8N16v_erf;
+ _ZGV8N32v_erf;
+
+ _ZGV1N2v_erfcinv;
+ _ZGV1N4v_erfcinv;
+ _ZGV2N2v_erfcinv;
+ _ZGV2N4v_erfcinv;
+ _ZGV2N8v_erfcinv;
+ _ZGV4N4v_erfcinv;
+ _ZGV4N8v_erfcinv;
+ _ZGV4N16v_erfcinv;
+ _ZGV8N8v_erfcinv;
+ _ZGV8N16v_erfcinv;
+ _ZGV8N32v_erfcinv;
+
+ _ZGV1N2v_erfinv;
+ _ZGV1N4v_erfinv;
+ _ZGV2N2v_erfinv;
+ _ZGV2N4v_erfinv;
+ _ZGV2N8v_erfinv;
+ _ZGV4N4v_erfinv;
+ _ZGV4N8v_erfinv;
+ _ZGV4N16v_erfinv;
+ _ZGV8N8v_erfinv;
+ _ZGV8N16v_erfinv;
+ _ZGV8N32v_erfinv;
+ }
+}
diff --git a/sysdeps/riscv/configure b/sysdeps/riscv/configure
index 3ae4ae3bdb..aeb6e0a7d9 100644
--- a/sysdeps/riscv/configure
+++ b/sysdeps/riscv/configure
@@ -83,3 +83,7 @@ if test "$libc_cv_static_pie_on_riscv" = yes; then
fi
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
+
diff --git a/sysdeps/riscv/configure.ac b/sysdeps/riscv/configure.ac
index ee3d1ed014..b1c1105baa 100644
--- a/sysdeps/riscv/configure.ac
+++ b/sysdeps/riscv/configure.ac
@@ -43,3 +43,7 @@ EOF
if test "$libc_cv_static_pie_on_riscv" = yes; then
AC_DEFINE(SUPPORT_STATIC_PIE)
fi
+
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
diff --git a/sysdeps/riscv/rv32/rvd/Implies b/sysdeps/riscv/rv32/rvd/Implies
index 1151214e8f..af5a3f1411 100644
--- a/sysdeps/riscv/rv32/rvd/Implies
+++ b/sysdeps/riscv/rv32/rvd/Implies
@@ -1,3 +1,4 @@
riscv/rv32/rvf
riscv/rvd
riscv/rvf
+riscv/rvd/veclibm
diff --git a/sysdeps/riscv/rv64/rvd/Implies b/sysdeps/riscv/rv64/rvd/Implies
index 42fb132d12..061633b3a9 100644
--- a/sysdeps/riscv/rv64/rvd/Implies
+++ b/sysdeps/riscv/rv64/rvd/Implies
@@ -1,3 +1,4 @@
riscv/rv64/rvf
riscv/rvd
riscv/rvf
+riscv/rvd/veclibm
diff --git a/sysdeps/riscv/rvd/Makefile b/sysdeps/riscv/rvd/Makefile
new file mode 100644
index 0000000000..f93f0b9506
--- /dev/null
+++ b/sysdeps/riscv/rvd/Makefile
@@ -0,0 +1,28 @@
+libmvec-veclibm-funcs += rvvlm_2ovpi_tbl rvvlm_powD_tbl rvvlm_expD_tbl rvvlm_logD_tbl
+
+libmvec-support-funcs += \
+v_d_exp v_d_asin v_d_atan v_d_acos v_d_atanh v_d_exp10 v_d_exp2 v_d_tan v_d_tanh v_d_pow v_d_sin v_d_log v_d_cos \
+v_d_acosh v_d_acospi v_d_asinh v_d_asinpi v_d_atan2 v_d_atan2pi v_d_atanpi v_d_expint1 v_d_expm1 v_d_cosh v_d_sinh \
+v_d_sinpi v_d_cospi v_d_tanpi v_d_tgamma v_d_lgamma v_d_log2 v_d_log10 v_d_cbrt v_d_cdfnorm v_d_erfc v_d_cdfnorminv \
+v_d_erf v_d_erfcinv v_d_erfinv
+
+
+ifeq ($(subdir),mathvec)
+libmvec-support += $(libmvec-veclibm-funcs) $(libmvec-support-funcs)
+endif
+
+define riscv64-vector-cflags-template
+CFLAGS-$(1).c += -march=rv64gcv -Wno-maybe-uninitialized -Wno-undef
+endef
+
+define riscv32-vector-cflags-template
+CFLAGS-$(1).c += -march=rv32gcv -Wno-maybe-uninitialized -Wno-undef
+endef
+
+ifeq ($(config-machine),riscv64)
+$(foreach f,$(libmvec-support), $(eval $(call riscv64-vector-cflags-template,$(f))))
+endif
+
+ifeq ($(config-machine),riscv32)
+$(foreach f,$(libmvec-support), $(eval $(call riscv32-vector-cflags-template,$(f))))
+endif
diff --git a/sysdeps/riscv/rvd/bits/math-vector.h b/sysdeps/riscv/rvd/bits/math-vector.h
new file mode 100644
index 0000000000..92bffe6495
--- /dev/null
+++ b/sysdeps/riscv/rvd/bits/math-vector.h
@@ -0,0 +1,147 @@
+/* Platform-specific SIMD declarations of math functions.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _MATH_H
+#error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __riscv_xlen && defined __FAST_MATH__
+#if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+#define __DECL_SIMD_riscv _Pragma ("omp declare simd notinbranch")
+#elif __GNUC_PREREQ(6, 0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)). */
+#define __DECL_SIMD_riscv __attribute__ ((__simd__ ("notinbranch")))
+#endif
+
+#ifdef __DECL_SIMD_riscv
+#undef __DECL_SIMD_cos
+#define __DECL_SIMD_cos __DECL_SIMD_riscv
+#undef __DECL_SIMD_cosf
+#define __DECL_SIMD_cosf
+#undef __DECL_SIMD_sin
+#define __DECL_SIMD_sin __DECL_SIMD_riscv
+#undef __DECL_SIMD_sinf
+#define __DECL_SIMD_sinf
+#undef __DECL_SIMD_sincos
+#define __DECL_SIMD_sincos
+#undef __DECL_SIMD_sincosf
+#define __DECL_SIMD_sincosf
+#undef __DECL_SIMD_log
+#define __DECL_SIMD_log __DECL_SIMD_riscv
+#undef __DECL_SIMD_logf
+#define __DECL_SIMD_logf
+#undef __DECL_SIMD_exp
+#define __DECL_SIMD_exp __DECL_SIMD_riscv
+#undef __DECL_SIMD_expf
+#define __DECL_SIMD_expf
+#undef __DECL_SIMD_pow
+#define __DECL_SIMD_pow __DECL_SIMD_riscv
+#undef __DECL_SIMD_powf
+#define __DECL_SIMD_powf
+#undef __DECL_SIMD_acos
+#define __DECL_SIMD_acos __DECL_SIMD_riscv
+#undef __DECL_SIMD_acosf
+#define __DECL_SIMD_acosf
+#undef __DECL_SIMD_atan
+#define __DECL_SIMD_atan __DECL_SIMD_riscv
+#undef __DECL_SIMD_atanf
+#define __DECL_SIMD_atanf
+#undef __DECL_SIMD_asin
+#define __DECL_SIMD_asin __DECL_SIMD_riscv
+#undef __DECL_SIMD_asinf
+#define __DECL_SIMD_asinf
+#undef __DECL_SIMD_hypot
+#define __DECL_SIMD_hypot
+#undef __DECL_SIMD_hypotf
+#define __DECL_SIMD_hypotf
+#undef __DECL_SIMD_exp2
+#define __DECL_SIMD_exp2 __DECL_SIMD_riscv
+#undef __DECL_SIMD_exp2f
+#define __DECL_SIMD_exp2f
+#undef __DECL_SIMD_exp10
+#define __DECL_SIMD_exp10 __DECL_SIMD_riscv
+#undef __DECL_SIMD_exp10f
+#define __DECL_SIMD_exp10f
+#undef __DECL_SIMD_cosh
+#define __DECL_SIMD_cosh __DECL_SIMD_riscv
+#undef __DECL_SIMD_coshf
+#define __DECL_SIMD_coshf
+#undef __DECL_SIMD_expm1
+#define __DECL_SIMD_expm1 __DECL_SIMD_riscv
+#undef __DECL_SIMD_expm1f
+#define __DECL_SIMD_expm1f
+#undef __DECL_SIMD_sinh
+#define __DECL_SIMD_sinh __DECL_SIMD_riscv
+#undef __DECL_SIMD_sinhf
+#define __DECL_SIMD_sinhf
+#undef __DECL_SIMD_cbrt
+#define __DECL_SIMD_cbrt __DECL_SIMD_riscv
+#undef __DECL_SIMD_cbrtf
+#define __DECL_SIMD_cbrtf
+#undef __DECL_SIMD_atan2
+#define __DECL_SIMD_atan2 __DECL_SIMD_riscv
+#undef __DECL_SIMD_atan2f
+#define __DECL_SIMD_atan2f
+#undef __DECL_SIMD_log10
+#define __DECL_SIMD_log10 __DECL_SIMD_riscv
+#undef __DECL_SIMD_log10f
+#define __DECL_SIMD_log10f
+#undef __DECL_SIMD_log2
+#define __DECL_SIMD_log2 __DECL_SIMD_riscv
+#undef __DECL_SIMD_log2f
+#define __DECL_SIMD_log2f
+#undef __DECL_SIMD_log1p
+#define __DECL_SIMD_log1p
+#undef __DECL_SIMD_log1pf
+#define __DECL_SIMD_log1pf
+#undef __DECL_SIMD_atanh
+#define __DECL_SIMD_atanh __DECL_SIMD_riscv
+#undef __DECL_SIMD_atanhf
+#define __DECL_SIMD_atanhf
+#undef __DECL_SIMD_acosh
+#define __DECL_SIMD_acosh __DECL_SIMD_riscv
+#undef __DECL_SIMD_acoshf
+#define __DECL_SIMD_acoshf
+#undef __DECL_SIMD_erf
+#define __DECL_SIMD_erf __DECL_SIMD_riscv
+#undef __DECL_SIMD_erff
+#define __DECL_SIMD_erff
+#undef __DECL_SIMD_tanh
+#define __DECL_SIMD_tanh __DECL_SIMD_riscv
+#undef __DECL_SIMD_tanhf
+#define __DECL_SIMD_tanhf
+#undef __DECL_SIMD_asinh
+#define __DECL_SIMD_asinh __DECL_SIMD_riscv
+#undef __DECL_SIMD_asinhf
+#define __DECL_SIMD_asinhf
+#undef __DECL_SIMD_erfc
+#define __DECL_SIMD_erfc __DECL_SIMD_riscv
+#undef __DECL_SIMD_erfcf
+#define __DECL_SIMD_erfcf
+#undef __DECL_SIMD_tan
+#define __DECL_SIMD_tan __DECL_SIMD_riscv
+#undef __DECL_SIMD_tanf
+#define __DECL_SIMD_tanf
+
+#endif
+#endif
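
A note on how these declarations are meant to be consumed: under
-ffast-math (or an OpenMP simd context), the __DECL_SIMD_* macros tell
the compiler that vector variants of the scalar routines exist, so a
plain loop can be auto-vectorized into _ZGV*v_* libmvec calls. A
minimal sketch, not part of the patch, assuming a compiler with
RISC-V vector-function-ABI support (the flags -O3 -ffast-math
-fopenmp-simd are illustrative, not prescriptive):

#include <math.h>

void
apply_cos (double *restrict dst, const double *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = cos (src[i]);   /* may become a _ZGV*v_cos libmvec call */
}
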
diff --git a/sysdeps/riscv/rvd/rvvlm_2ovpi_tbl.c b/sysdeps/riscv/rvd/rvvlm_2ovpi_tbl.c
new file mode 100644
index 0000000000..789aff8b80
--- /dev/null
+++ b/sysdeps/riscv/rvd/rvvlm_2ovpi_tbl.c
@@ -0,0 +1,35 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+// This table is used by the trigonometric functions for argument reduction
+#include <stdint.h>
+
+// This is 2^500 * (2/pi) and the lsb of dbl_2ovpi_tbl[j] is 2^(500-(j+1)*52),
+// j=0,1,...,27
+const double dbl_2ovpi_tbl[28] = {
+ 0x1.45f306dc9c882p+499, 0x1.4a7f09d5f47d4p+446, 0x1.a6ee06db14ad0p+393,
+ -0x1.b0ef1bef806bcp+342, 0x1.8eaf7aef1586cp+290, 0x1.c91b8e909374cp+238,
+ -0x1.ff9b6d115f630p+184, 0x1.921cfe1deb1d0p+132, -0x1.3b5963045df74p+82,
+ 0x1.7d4baed1213a8p+30, -0x1.8e3f652e82070p-22, 0x1.3991d63983530p-76,
+ 0x1.cfa4e422fc5e0p-127, -0x1.036be27003b40p-179, -0x1.0fd33f8086800p-239,
+ -0x1.dce94beb25c20p-285, 0x1.b4d9fb3c9f2c4p-334, -0x1.922c2e7026588p-386,
+ 0x1.7fa8b5d49eeb0p-438, 0x1.faf97c5ecf41cp-490, 0x1.cfbc529497538p-543,
+ -0x1.012813b81ca8cp-594, 0x1.0ac06608df900p-649, -0x1.251503cc10f7cp-698,
+ -0x1.942f27895871cp-750, 0x1.615ee61b08660p-804, -0x1.99ea83ad7e5f0p-854,
+ 0x1.1bffb1009ae60p-909
+};
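
To make the table encoding concrete: since the lsb of entry j sits at
2^(500-(j+1)*52), the entries are non-overlapping 52-bit chunks of
2^500 * (2/pi), as used in large-argument reduction. A quick check,
not part of the patch (link against the table object and libm):

#include <math.h>
#include <stdio.h>

extern const double dbl_2ovpi_tbl[28];

int
main (void)
{
  /* Entry 0 scaled by 2^-500 already matches 2/pi except possibly in
     the last bit (the entry is a truncated, not rounded, chunk);
     entries 1..27 hold the following 52-bit chunks.  */
  printf ("%.17g\n", ldexp (dbl_2ovpi_tbl[0], -500));
  printf ("%.17g\n", 2.0 / M_PI);
  return 0;
}
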
diff --git a/sysdeps/riscv/rvd/rvvlm_expD_tbl.c b/sysdeps/riscv/rvd/rvvlm_expD_tbl.c
new file mode 100644
index 0000000000..03c8b767ba
--- /dev/null
+++ b/sysdeps/riscv/rvd/rvvlm_expD_tbl.c
@@ -0,0 +1,49 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+// This table is used by different functions of the exponential family
+#include <stdint.h>
+
+// The following contains 2^(j/64) for j = 0, 1, ..., 63
+// We need these values to more than FP64 precision. We do this
+// by exploiting fixed-point computation supported by RISC-V.
+// This table contains round-to-int(2^62 * 2^(j/64))
+const int64_t expD_tbl64_fixedpt[64] = {
+ 0x4000000000000000, 0x40b268f9de0183ba, 0x4166c34c5615d0ec,
+ 0x421d1461d66f2023, 0x42d561b3e6243d8a, 0x438fb0cb4f468808,
+ 0x444c0740496d4294, 0x450a6abaa4b77ecd, 0x45cae0f1f545eb73,
+ 0x468d6fadbf2dd4f3, 0x47521cc5a2e6a9e0, 0x4818ee218a3358ee,
+ 0x48e1e9b9d588e19b, 0x49ad159789f37496, 0x4a7a77d47f7b84b1,
+ 0x4b4a169b900c2d00, 0x4c1bf828c6dc54b8, 0x4cf022c9905bfd32,
+ 0x4dc69cdceaa72a9c, 0x4e9f6cd3967fdba8, 0x4f7a993048d088d7,
+ 0x50582887dcb8a7e1, 0x513821818624b40c, 0x521a8ad704f3404f,
+ 0x52ff6b54d8a89c75, 0x53e6c9da74b29ab5, 0x54d0ad5a753e077c,
+ 0x55bd1cdad49f699c, 0x56ac1f752150a563, 0x579dbc56b48521ba,
+ 0x5891fac0e95612c8, 0x5988e20954889245, 0x5a827999fcef3242,
+ 0x5b7ec8f19468bbc9, 0x5c7dd7a3b17dcf75, 0x5d7fad59099f22fe,
+ 0x5e8451cfac061b5f, 0x5f8bccdb3d398841, 0x6096266533384a2b,
+ 0x61a3666d124bb204, 0x62b39508aa836d6f, 0x63c6ba6455dcd8ae,
+ 0x64dcdec3371793d1, 0x65f60a7f79393e2e, 0x6712460a8fc24072,
+ 0x683199ed779592ca, 0x69540ec8f895722d, 0x6a79ad55e7f6fd10,
+ 0x6ba27e656b4eb57a, 0x6cce8ae13c57ebdb, 0x6dfddbcbed791bab,
+ 0x6f307a412f074892, 0x70666f76154a7089, 0x719fc4b95f452d29,
+ 0x72dc8373be41a454, 0x741cb5281e25ee34, 0x75606373ee921c97,
+ 0x76a7980f6cca15c2, 0x77f25ccdee6d7ae6, 0x7940bb9e2cffd89d,
+ 0x7a92be8a92436616, 0x7be86fb985689ddc, 0x7d41d96db915019d,
+ 0x7e9f06067a4360ba,
+};
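
The entries can be cross-checked against the formula in the comment:
entry 32 divided by entry 0 is sqrt(2), i.e. 2^(32/64), and the Q62
scaling is what keeps entry 63 (about 1.98 * 2^62) inside int64_t. A
small verification harness, not part of the patch (link with -lm):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

extern const int64_t expD_tbl64_fixedpt[64];

int
main (void)
{
  for (int j = 0; j < 64; j += 16)
    {
      /* Each entry should equal round(2^62 * 2^(j/64)).  */
      double ref = ldexp (exp2 (j / 64.0), 62);
      printf ("j=%2d tbl=%lld ref=%.0f\n", j,
              (long long) expD_tbl64_fixedpt[j], ref);
    }
  return 0;
}
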
diff --git a/sysdeps/riscv/rvd/rvvlm_logD_tbl.c b/sysdeps/riscv/rvd/rvvlm_logD_tbl.c
new file mode 100644
index 0000000000..f218084c73
--- /dev/null
+++ b/sysdeps/riscv/rvd/rvvlm_logD_tbl.c
@@ -0,0 +1,149 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+// This table is used by different functions of the log and exponential families
+#include <stdint.h>
+
+const int64_t logD_tbl128_fixedpt[128] = { 0x0,
+ 0x22d443c414148a1,
+ 0x3a475f892273f13,
+ 0x51ea81cd5dc13cb,
+ 0x69be70ddf74c6a8,
+ 0x81c3f7de5434ed0,
+ 0x8dd9953002a4e86,
+ 0xa62b07f3457c407,
+ 0xbeb024b67dda634,
+ 0xd769c8d5b33a728,
+ 0xe3da945b878e27d,
+ 0xfce4aee0e88b275,
+ 0x1162593186da6fc4,
+ 0x122dadc2ab3496d3,
+ 0x13c6fb650cde50a1,
+ 0x1563dc29ffacb20d,
+ 0x1633a8bf437ce10b,
+ 0x17d60496cfbb4c67,
+ 0x18a8980abfbd3266,
+ 0x1a5094b54d282840,
+ 0x1b2602497d53458d,
+ 0x1cd3c712d3110932,
+ 0x1dac22d3e441d2fe,
+ 0x1f5fd8a9063e3491,
+ 0x203b3779f4c3a8bb,
+ 0x21f509008966a211,
+ 0x22d380a6c7e2b0e4,
+ 0x23b30593aa4e106c,
+ 0x2575418697c3d7e0,
+ 0x2657fdc6e1dcd0cb,
+ 0x273bd1c2ab3edefe,
+ 0x2906cbcd2baf2d54,
+ 0x29edf7659d8b30f2,
+ 0x2ad645cd6af1c939,
+ 0x2bbfb9e3dd5c1c88,
+ 0x2d961ed0cb91d407,
+ 0x2e83159d77e31d6d,
+ 0x2f713e059e555a64,
+ 0x30609b21823fa654,
+ 0x315130157f7a64cd,
+ 0x33360e552d8d64de,
+ 0x342a5e28530367af,
+ 0x351ff2e30214bc30,
+ 0x3616cfe9e8d01fea,
+ 0x370ef8af6360dfe0,
+ 0x380870b3c5fb66f7,
+ 0x39033b85a8bfc871,
+ 0x39ff5cc235a256c5,
+ 0x3afcd815786af188,
+ 0x3bfbb13ab0dc5614,
+ 0x3cfbebfca715669e,
+ 0x3dfd8c36023f0ab7,
+ 0x3f0095d1a19a0332,
+ 0x40050ccaf800ca8c,
+ 0x410af52e69f26264,
+ 0x42125319ae3bbf06,
+ 0x431b2abc31565be7,
+ 0x442580577b936763,
+ 0x4531583f9a2be204,
+ 0x463eb6db8b4f066d,
+ 0x474da0a5ad495303,
+ 0x485e1a2c30df9ea9,
+ 0x497028118efabeb8,
+ 0x4a83cf0d01c16e3d,
+ -0x3466ec14fec0a13b,
+ -0x335004723c465e69,
+ -0x323775123e2e1169,
+ -0x323775123e2e1169,
+ -0x311d38e5c1644b49,
+ -0x30014ac62c38a865,
+ -0x2ee3a574fdf677c9,
+ -0x2dc4439b3a19bcaf,
+ -0x2ca31fc8cef74dca,
+ -0x2ca31fc8cef74dca,
+ -0x2b803473f7ad0f3f,
+ -0x2a5b7bf8992d66fc,
+ -0x2934f0979a3715fd,
+ -0x280c8c76360892eb,
+ -0x280c8c76360892eb,
+ -0x26e2499d499bd9b3,
+ -0x25b621f89b355ede,
+ -0x24880f561c0e7305,
+ -0x24880f561c0e7305,
+ -0x23580b6523e0e0a5,
+ -0x22260fb5a616eb96,
+ -0x20f215b7606012de,
+ -0x20f215b7606012de,
+ -0x1fbc16b902680a24,
+ -0x1e840be74e6a4cc8,
+ -0x1e840be74e6a4cc8,
+ -0x1d49ee4c32596fc9,
+ -0x1c0db6cdd94dee41,
+ -0x1c0db6cdd94dee41,
+ -0x1acf5e2db4ec93f0,
+ -0x198edd077e70df03,
+ -0x198edd077e70df03,
+ -0x184c2bd02f03b2fe,
+ -0x170742d4ef027f2a,
+ -0x170742d4ef027f2a,
+ -0x15c01a39fbd687a0,
+ -0x1476a9f983f74d31,
+ -0x1476a9f983f74d31,
+ -0x132ae9e278ae1a1f,
+ -0x132ae9e278ae1a1f,
+ -0x11dcd197552b7b5f,
+ -0x108c588cda79e396,
+ -0x108c588cda79e396,
+ -0xf397608bfd2d90e,
+ -0xf397608bfd2d90e,
+ -0xde4212056d5dd32,
+ -0xc8c50b72319ad57,
+ -0xc8c50b72319ad57,
+ -0xb31fb7d64898b3e,
+ -0xb31fb7d64898b3e,
+ -0x9d517ee93f8e16c,
+ -0x9d517ee93f8e16c,
+ -0x8759c4fd14fcd5a,
+ -0x7137eae42aad7bd,
+ -0x7137eae42aad7bd,
+ -0x5aeb4dd63bf61cc,
+ -0x5aeb4dd63bf61cc,
+ -0x447347544cd04bb,
+ -0x447347544cd04bb,
+ -0x2dcf2d0b85a4531,
+ -0x2dcf2d0b85a4531,
+ -0x16fe50b6ef08518,
+ -0x16fe50b6ef08518,
+ 0x0 };
diff --git a/sysdeps/riscv/rvd/rvvlm_powD_tbl.c b/sysdeps/riscv/rvd/rvvlm_powD_tbl.c
new file mode 100644
index 0000000000..85f321a1ba
--- /dev/null
+++ b/sysdeps/riscv/rvd/rvvlm_powD_tbl.c
@@ -0,0 +1,277 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+// This table is used by different functions of the log and exponential families
+#include <stdint.h>
+
+const double logtbl_4_powD_128_hi_lo[256] = { 0x0.0p+0,
+ 0x0.0p+0,
+ 0x1.16a21e20c0000p-6,
+ -0x1.f5baf436cbec7p-42,
+ 0x1.d23afc4900000p-6,
+ 0x1.39f89bcdae7bdp-42,
+ 0x1.47aa073580000p-5,
+ -0x1.1f61a96b8ce77p-42,
+ 0x1.a6f9c377e0000p-5,
+ -0x1.672b0c88d4dd6p-44,
+ 0x1.0387efbcb0000p-4,
+ -0x1.e5897de9078d1p-42,
+ 0x1.1bb32a6000000p-4,
+ 0x1.52743318a8a57p-42,
+ 0x1.4c560fe690000p-4,
+ -0x1.41dfc7d7c3321p-42,
+ 0x1.7d60496d00000p-4,
+ -0x1.12ce6312ebb82p-42,
+ 0x1.aed391ab60000p-4,
+ 0x1.9d3940238de7fp-42,
+ 0x1.c7b528b710000p-4,
+ -0x1.c760bc9b188c4p-45,
+ 0x1.f9c95dc1d0000p-4,
+ 0x1.164e932b2d51cp-44,
+ 0x1.1625931870000p-3,
+ -0x1.2c81df0fdbc29p-42,
+ 0x1.22dadc2ab0000p-3,
+ 0x1.a4b69691d7994p-42,
+ 0x1.3c6fb650d0000p-3,
+ -0x1.0d7af4dda9c36p-42,
+ 0x1.563dc29ff8000p-3,
+ 0x1.6590643906f2ap-42,
+ 0x1.633a8bf438000p-3,
+ -0x1.8f7aac147fdc1p-46,
+ 0x1.7d60496cf8000p-3,
+ 0x1.da6339da288fcp-42,
+ 0x1.8a8980abf8000p-3,
+ 0x1.e9933354dbf17p-42,
+ 0x1.a5094b54d0000p-3,
+ 0x1.41420276dd59dp-42,
+ 0x1.b2602497d8000p-3,
+ -0x1.65d3990cb67bap-42,
+ 0x1.cd3c712d30000p-3,
+ 0x1.109325dd5e814p-43,
+ 0x1.dac22d3e48000p-3,
+ -0x1.f1680dd458fb2p-42,
+ 0x1.f5fd8a9060000p-3,
+ 0x1.f1a4847f7b278p-42,
+ 0x1.01d9bbcfa8000p-2,
+ -0x1.e2ba25d9aeffdp-42,
+ 0x1.0fa848044c000p-2,
+ -0x1.95def21f8497bp-43,
+ 0x1.169c053640000p-2,
+ -0x1.d4f1b95e0ff45p-43,
+ 0x1.1d982c9d54000p-2,
+ -0x1.8f7ca2cff7b90p-42,
+ 0x1.2baa0c34c0000p-2,
+ -0x1.e1410132ae5e4p-42,
+ 0x1.32bfee3710000p-2,
+ -0x1.1979a5db68722p-42,
+ 0x1.39de8e1558000p-2,
+ 0x1.f6f7f2b4bd1c4p-42,
+ 0x1.48365e695c000p-2,
+ 0x1.796aa2981fdbcp-42,
+ 0x1.4f6fbb2cec000p-2,
+ 0x1.661e393a16b95p-44,
+ 0x1.56b22e6b58000p-2,
+ -0x1.c6d8d86531d56p-44,
+ 0x1.5dfdcf1eec000p-2,
+ -0x1.1f1bbd2926f16p-42,
+ 0x1.6cb0f6865c000p-2,
+ 0x1.1d406db502403p-43,
+ 0x1.7418acebc0000p-2,
+ -0x1.ce2935fff809ap-43,
+ 0x1.7b89f02cf4000p-2,
+ -0x1.552ce0ec3a295p-42,
+ 0x1.8304d90c10000p-2,
+ 0x1.fd32a3ab0a4b5p-42,
+ 0x1.8a8980abfc000p-2,
+ -0x1.66cccab240e90p-45,
+ 0x1.99b072a96c000p-2,
+ 0x1.ac9bca36fd02ep-44,
+ 0x1.a152f14298000p-2,
+ 0x1.b3d7b0e65d2cep-46,
+ 0x1.a8ff971810000p-2,
+ 0x1.4bc302ffa76fbp-43,
+ 0x1.b0b67f4f48000p-2,
+ -0x1.7f00af09dc1c7p-42,
+ 0x1.b877c57b1c000p-2,
+ -0x1.f20203b3186a6p-43,
+ 0x1.c043859e30000p-2,
+ -0x1.2642415d47384p-45,
+ 0x1.c819dc2d44000p-2,
+ 0x1.fe43895d8ac46p-42,
+ 0x1.cffae611ac000p-2,
+ 0x1.12b628e2d05d7p-42,
+ 0x1.d7e6c0abc4000p-2,
+ -0x1.50e785694a8c6p-43,
+ 0x1.dfdd89d588000p-2,
+ -0x1.1d4f639bb5cdfp-42,
+ 0x1.e7df5fe538000p-2,
+ 0x1.5669df6a2b592p-43,
+ 0x1.efec61b010000p-2,
+ 0x1.f855b4987c5d5p-42,
+ 0x1.f804ae8d0c000p-2,
+ 0x1.a0331af2e6feap-43,
+ 0x1.0014332be0000p-1,
+ 0x1.9518ce032f41dp-48,
+ 0x1.042bd4b9a8000p-1,
+ -0x1.b3b3864c60011p-44,
+ 0x1.08494c66b8000p-1,
+ 0x1.ddf82e1fe57c7p-42,
+ 0x1.0c6caaf0c6000p-1,
+ -0x1.4d20c519e12f4p-42,
+ 0x1.1096015dee000p-1,
+ 0x1.3676289cd3dd4p-43,
+ 0x1.14c560fe68000p-1,
+ 0x1.5f101c141e670p-42,
+ 0x1.18fadb6e2e000p-1,
+ -0x1.87cc95d0a2ee8p-42,
+ 0x1.1d368296b6000p-1,
+ -0x1.b567e7ee54aefp-42,
+ 0x1.217868b0c4000p-1,
+ -0x1.030ab442ce320p-42,
+ 0x1.25c0a0463c000p-1,
+ -0x1.50520a377c7ecp-45,
+ 0x1.2a0f3c3408000p-1,
+ -0x1.f48e1a4725559p-42,
+ -0x1.a33760a7f8000p-2,
+ 0x1.faf6283bf2868p-42,
+ -0x1.9a802391e4000p-2,
+ 0x1.cd0cb4492f1bcp-42,
+ -0x1.91bba891f0000p-2,
+ -0x1.708b4b2b5056cp-42,
+ -0x1.91bba891f0000p-2,
+ -0x1.708b4b2b5056cp-42,
+ -0x1.88e9c72e0c000p-2,
+ 0x1.bb4b69336b66ep-43,
+ -0x1.800a563160000p-2,
+ -0x1.c5432aeb609f5p-42,
+ -0x1.771d2ba7f0000p-2,
+ 0x1.3106e404cabb7p-44,
+ -0x1.6e221cd9d0000p-2,
+ -0x1.9bcaf1aa4168ap-43,
+ -0x1.6518fe4678000p-2,
+ 0x1.1646b761c48dep-44,
+ -0x1.6518fe4678000p-2,
+ 0x1.1646b761c48dep-44,
+ -0x1.5c01a39fbc000p-2,
+ -0x1.6879fa00b120ap-42,
+ -0x1.52dbdfc4c8000p-2,
+ -0x1.6b37dcf60e620p-42,
+ -0x1.49a784bcd0000p-2,
+ -0x1.b8afe492bf6ffp-42,
+ -0x1.406463b1b0000p-2,
+ -0x1.125d6cbcd1095p-44,
+ -0x1.406463b1b0000p-2,
+ -0x1.125d6cbcd1095p-44,
+ -0x1.37124cea4c000p-2,
+ -0x1.bd9b32266d92cp-43,
+ -0x1.2db10fc4d8000p-2,
+ -0x1.aaf6f137a3d8cp-42,
+ -0x1.24407ab0e0000p-2,
+ -0x1.ce60916e52e91p-44,
+ -0x1.24407ab0e0000p-2,
+ -0x1.ce60916e52e91p-44,
+ -0x1.1ac05b2920000p-2,
+ 0x1.f1f5ae718f241p-43,
+ -0x1.11307dad30000p-2,
+ -0x1.6eb9612e0b4f3p-43,
+ -0x1.0790adbb04000p-2,
+ 0x1.fed21f9cb2cc5p-43,
+ -0x1.0790adbb04000p-2,
+ 0x1.fed21f9cb2cc5p-43,
+ -0x1.fbc16b9028000p-3,
+ 0x1.7f5dc57266758p-43,
+ -0x1.e840be74e8000p-3,
+ 0x1.5b338360c2ae2p-43,
+ -0x1.e840be74e8000p-3,
+ 0x1.5b338360c2ae2p-43,
+ -0x1.d49ee4c328000p-3,
+ 0x1.3481b85a54d7fp-42,
+ -0x1.c0db6cdd98000p-3,
+ 0x1.908df8ec933b3p-42,
+ -0x1.c0db6cdd98000p-3,
+ 0x1.908df8ec933b3p-42,
+ -0x1.acf5e2db50000p-3,
+ 0x1.36c101ee13440p-43,
+ -0x1.98edd077e8000p-3,
+ 0x1.e41fa0a62e6aep-44,
+ -0x1.98edd077e8000p-3,
+ 0x1.e41fa0a62e6aep-44,
+ -0x1.84c2bd02f0000p-3,
+ -0x1.d97ee9124773bp-46,
+ -0x1.70742d4ef0000p-3,
+ -0x1.3f94e00e7d6bcp-46,
+ -0x1.70742d4ef0000p-3,
+ -0x1.3f94e00e7d6bcp-46,
+ -0x1.5c01a39fc0000p-3,
+ 0x1.4bc302ffa76fbp-42,
+ -0x1.476a9f9840000p-3,
+ 0x1.1659d8e2d7d38p-44,
+ -0x1.476a9f9840000p-3,
+ 0x1.1659d8e2d7d38p-44,
+ -0x1.32ae9e2788000p-3,
+ -0x1.70d0fa8f9603bp-42,
+ -0x1.32ae9e2788000p-3,
+ -0x1.70d0fa8f9603bp-42,
+ -0x1.1dcd197550000p-3,
+ -0x1.5bdaf522a183cp-42,
+ -0x1.08c588cda8000p-3,
+ 0x1.871a7610e40bdp-45,
+ -0x1.08c588cda8000p-3,
+ 0x1.871a7610e40bdp-45,
+ -0x1.e72ec11800000p-4,
+ 0x1.69378d0928989p-42,
+ -0x1.e72ec11800000p-4,
+ 0x1.69378d0928989p-42,
+ -0x1.bc84240ae0000p-4,
+ 0x1.51167134e9647p-42,
+ -0x1.918a16e460000p-4,
+ -0x1.9ad57391924a7p-43,
+ -0x1.918a16e460000p-4,
+ -0x1.9ad57391924a7p-43,
+ -0x1.663f6fac90000p-4,
+ -0x1.3167ccc538261p-44,
+ -0x1.663f6fac90000p-4,
+ -0x1.3167ccc538261p-44,
+ -0x1.3aa2fdd280000p-4,
+ 0x1.c7a4ff65ddbc9p-45,
+ -0x1.3aa2fdd280000p-4,
+ 0x1.c7a4ff65ddbc9p-45,
+ -0x1.0eb389fa30000p-4,
+ 0x1.819530c22d152p-42,
+ -0x1.c4dfab90a0000p-5,
+ -0x1.56bde9f1f0d3dp-42,
+ -0x1.c4dfab90a0000p-5,
+ -0x1.56bde9f1f0d3dp-42,
+ -0x1.6bad3758e0000p-5,
+ -0x1.fb0e626c0de13p-42,
+ -0x1.6bad3758e0000p-5,
+ -0x1.fb0e626c0de13p-42,
+ -0x1.11cd1d5140000p-5,
+ 0x1.97da24fd75f61p-42,
+ -0x1.11cd1d5140000p-5,
+ 0x1.97da24fd75f61p-42,
+ -0x1.6e79685c40000p-6,
+ 0x1.2dd67591d81dfp-42,
+ -0x1.6e79685c40000p-6,
+ 0x1.2dd67591d81dfp-42,
+ -0x1.6fe50b6f00000p-7,
+ 0x1.ef5d00e390a00p-44,
+ -0x1.6fe50b6f00000p-7,
+ 0x1.ef5d00e390a00p-44,
+ 0x0.0p+0,
+ 0x0.0p+0 };
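
The layout here is interleaved (hi, lo) pairs: each hi has its low
mantissa bits zeroed (note the trailing zeros in, e.g.,
0x1.16a21e20c0000p-6) and the adjacent lo supplies the next bits, so
the unevaluated sum hi + lo carries the tabulated logarithm well
beyond 53 bits. A reconstruction sketch, not part of the patch:

#include <stdio.h>

extern const double logtbl_4_powD_128_hi_lo[256];

int
main (void)
{
  int j = 1;
  double hi = logtbl_4_powD_128_hi_lo[2 * j];      /* leading bits */
  double lo = logtbl_4_powD_128_hi_lo[2 * j + 1];  /* correction */
  printf ("hi=%a lo=%a hi+lo=%.20g\n", hi, lo, hi + lo);
  return 0;
}
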
diff --git a/sysdeps/riscv/rvd/v_d_acos.c b/sysdeps/riscv/rvd/v_d_acos.c
new file mode 100644
index 0000000000..938d5c97b0
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_acos.c
@@ -0,0 +1,238 @@
+/* Double-precision vector acos function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ACOSD_VSET_CONFIG
+
+#define COMPILE_FOR_ACOS
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_LO 0x1.1a62633145c07p-53
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfadd ((small_x), \
+ __riscv_vfrsub ((small_x), (vx), PIBY2_LO, (vlen)), \
+ PIBY2_HI, (vlen))
+
+#define FUNC_AT_ONE(abs_x_1, vx, vlen) \
+ __riscv_vfadd ( \
+ (abs_x_1), \
+ __riscv_vfsgnjn ((abs_x_1), VFMV_VF (PIBY2_HI, (vlen)), (vx), (vlen)), \
+ PIBY2_HI, (vlen))
+
+#define EXCEPTION_HANDLING_ASINCOS(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ /* filter out |x| >= 1, Infs and NaNs */ \
+ VBOOL expo_ge_BIAS = __riscv_vmsgeu (expo_x, EXP_BIAS, (vlen)); \
+ /* filter out |x| < 2^(-30) */ \
+ VBOOL expo_le_BIASm31 = __riscv_vmsleu (expo_x, EXP_BIAS - 31, (vlen)); \
+ (special_args) = __riscv_vmor (expo_ge_BIAS, expo_le_BIASm31, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VFLOAT x_tmp = __riscv_vfsgnj ((vx), fp_posOne, (vlen)); \
+ VBOOL abs_x_1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfeq (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ VBOOL abs_x_gt1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfgt (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ (vy_special) = vx; \
+ /* Only replace extended real numbers x, |x| > 1; abs_x_gt1 is not \
+ * true if x is NaN */ \
+ x_tmp = __riscv_vfmerge (x_tmp, fp_sNaN, abs_x_gt1, (vlen)); \
+ /* Here we add x to itself for all "special args" including NaNs, \
+ * generating the necessary signal */ \
+ x_tmp = __riscv_vfadd ((special_args), x_tmp, x_tmp, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, (special_args), (vlen)); \
+ x_tmp = FUNC_AT_ONE (abs_x_1, (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, abs_x_1, (vlen)); \
+ x_tmp = FUNC_NEAR_ZERO (expo_le_BIASm31, vx, vlen); \
+ (vy_special) = __riscv_vmerge ((vy_special), x_tmp, \
+ expo_le_BIASm31, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, special_args, (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, acos) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ EXCEPTION_HANDLING_ASINCOS (vx_orig, special_args, vy_special, vlen); \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ /* asin(x) ~=~ r + r*s*poly(s); r=x; s=r*r when |x| <= 1/2; \
+ asin(x) = pi/2 - 2 * asin(sqrt((1-x)/2)) for x > 1/2; \
+ acos(x) = pi/2 - asin(x) for |x| <= 1. \
+ These expressions allow us to compute asin or acos with the essential \
+ approximation of asin(x) for |x| <= 1/2 */ \
+ \
+ VBOOL x_le_half = __riscv_vmfle (vx, 0x1.0p-1, vlen); \
+ VBOOL x_gt_half = __riscv_vmnot (x_le_half, vlen); \
+ VBOOL x_orig_le_half = __riscv_vmfle (vx_orig, 0x1.0p-1, vlen); \
+ VBOOL x_orig_lt_neghalf = __riscv_vmflt (vx_orig, -0x1.0p-1, vlen); \
+ VFLOAT alpha, beta; \
+ alpha = vx; \
+ beta = U_AS_F (__riscv_vxor (F_AS_U (beta), F_AS_U (beta), vlen)); \
+ alpha = __riscv_vfmerge (alpha, -0x1.0p-1, x_gt_half, vlen); \
+ beta = __riscv_vfmerge (beta, 0x1.0p-1, x_gt_half, vlen); \
+ VFLOAT s = __riscv_vfmadd (alpha, vx, beta, vlen); \
+ /* s is x*x or (1-x)/2 */ \
+ double two_to_63 = 0x1.0p63; \
+ VINT S = __riscv_vfcvt_x (__riscv_vfmul (s, two_to_63, vlen), vlen); \
+ VINT Ssq = __riscv_vsmul (S, S, 1, vlen); \
+ \
+ /* For x > 1/2, we need to compute sqrt(s) to be used later \
+ where s = (1-x)/2. Note that s > 0 as we have handled |x| = 1 as \
+ special arguments */ \
+ VFLOAT sqrt_s = __riscv_vfsqrt (x_gt_half, s, vlen); \
+ VFLOAT delta = __riscv_vfnmsub (x_gt_half, sqrt_s, sqrt_s, s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, __riscv_vfrec7 (s, vlen), vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, sqrt_s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, 0x1.0p-1, vlen); \
+ \
+ VINT P_EVEN = PSTEP_I ( \
+ 0x15555555555390dd, Ssq, \
+ PSTEP_I (0x5b6db6d09b27a82, Ssq, \
+ PSTEP_I (0x2dd13e6dd791f29, Ssq, \
+ PSTEP_I (0x1c6fc7fedf424bb, Ssq, \
+ PSTEP_I (0xd5bd98b325786c, \
+ -0x21470ca28feec71, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_ODD = PSTEP_I ( \
+ 0x99999999b7428ad, Ssq, \
+ PSTEP_I (0x3e38e587fad54b2, Ssq, \
+ PSTEP_I (0x238d7e0436a1c30, Ssq, \
+ PSTEP_I (0x18ecc06c5a390e3, Ssq, \
+ PSTEP_I (0x28063c8b4b6a072, \
+ 0x41646ebd6edd35e, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_ODD = __riscv_vsmul (P_ODD, S, 1, vlen); \
+ VINT POLY = __riscv_vsadd (P_ODD, P_EVEN, vlen); \
+ POLY = __riscv_vsmul (POLY, S, 1, vlen); \
+ \
+ VFLOAT r = vx; \
+ r = __riscv_vmerge (r, sqrt_s, x_gt_half, vlen); \
+ delta = __riscv_vfmerge (delta, fp_posZero, x_le_half, vlen); \
+ \
+ VINT m = U_AS_I (__riscv_vrsub (__riscv_vsrl (F_AS_U (r), MAN_LEN, vlen), \
+ EXP_BIAS, vlen)); \
+ \
+ m = __riscv_vmin (m, 60, vlen); /* in case r is 0.0 */ \
+ VINT q = __riscv_vadd (m, 60, vlen); \
+ q = __riscv_vmerge (q, 60, x_orig_le_half, vlen); \
+ r = __riscv_vfsgnjx (r, vx_orig, vlen); \
+ delta = __riscv_vfsgnjx (delta, vx_orig, vlen); \
+ VFLOAT scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vadd (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, scale_r, vlen), vlen); \
+ R = __riscv_vsadd ( \
+ __riscv_vfcvt_x (__riscv_vfmul (delta, scale_r, vlen), vlen), R, \
+ vlen); \
+ POLY = __riscv_vsadd (R, __riscv_vsmul (POLY, R, 1, vlen), vlen); \
+ VINT POLY_prime = __riscv_vsadd (x_gt_half, POLY, POLY, vlen); \
+ \
+ POLY = __riscv_vrsub (x_le_half, POLY, 0, vlen); \
+ \
+ POLY = __riscv_vmerge (POLY, POLY_prime, x_gt_half, vlen); \
+ \
+ VINT C; \
+ C = __riscv_vxor (C, C, vlen); \
+ C = __riscv_vmerge (C, PI_Q60, x_orig_lt_neghalf, vlen); \
+ C = __riscv_vmerge (C, PIBY2_Q60, x_le_half, vlen); \
+ POLY = __riscv_vsadd (C, POLY, vlen); \
+ \
+ VFLOAT inv_scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vrsub (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ vy = __riscv_vfmul (inv_scale_r, __riscv_vfcvt_f (POLY, vlen), vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
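
For reviewers, a scalar model of the reduction quoted in the comment
inside the macro: acos(x) = pi/2 - asin(x), and asin(x) = pi/2 -
2*asin(sqrt((1-x)/2)) for x > 1/2, so a single asin kernel on
|x| <= 1/2 suffices. Illustrative only, using libm's scalar asin:

#include <math.h>

static double
acos_via_asin (double x)
{
  double ax = fabs (x);
  if (ax <= 0.5)
    return M_PI_2 - asin (x);          /* acos(x) = pi/2 - asin(x) */
  double r = 2.0 * asin (sqrt ((1.0 - ax) / 2.0));
  /* asin(ax) = pi/2 - r, hence acos(ax) = r; reflect for x < 0.  */
  return x >= 0.0 ? r : M_PI - r;
}
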
diff --git a/sysdeps/riscv/rvd/v_d_acosh.c b/sysdeps/riscv/rvd/v_d_acosh.c
new file mode 100644
index 0000000000..1673810e1d
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_acosh.c
@@ -0,0 +1,152 @@
+/* Double-precision vector acosh function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ACOSHD_VSET_CONFIG
+
+#define COMPILE_FOR_ACOSH
+
+#include "rvvlm_invhyperD.h"
+
+// Acosh(x) is defined for x >= 1 by the formula log(x + sqrt(x*x - 1))
+// Asinh(x) is defined for all finite x by the formula log(x + sqrt(x*x + 1))
+// Acosh is always positive, and Asinh(-x) = -Asinh(x). Thus we generally
+// work with |x| and restore the sign (if necessary) at the end.
+// For the log function log(2^n z), we use the expansion in terms of atanh:
+// n log(2) + 2 atanh((z-1)/(z+1))
+// The algorithm here first scales down x by 2^(-550) when |x| >= 2^500.
+// For such large x, both acosh and asinh equal log(2x) to very high
+// precision. We safely ignore the +/- 1 when this is the case.
+//
+// A power 2^n is determined by the value of x + sqrt(x*x +/- 1) so that
+// scaling the expression by 2^(-n) transforms it to the range [0.71, 1.42].
+// Log(t) for t in this region is computed by 2 atanh((t-1)/(t+1))
+// More precisely, we use s = 2(t-1)/(t+1) and approximate the function
+// 2 atanh(s/2) by s + s^3 * polynomial(s^2).
+// The final result is n * log(2) + s + s^3 * polynomial(s^2)
+// which is computed with care.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, acosh) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* Handle Inf and NaN and input <= 1.0 */ \
+ EXCEPTION_HANDLING_ACOSH (vx, special_args, vy_special, vlen); \
+ \
+ /* Need to scale x so that x + sqrt(x*x +/- 1) doesn't overflow. \
+ We scale x down by 2^(-550) if x >= 2^500 and set the "+/- 1" to 0 */ \
+ VINT n; \
+ VFLOAT u; \
+ SCALE_X (vx, n, u, vlen); \
+ /* n is 0 or 500; and u is +/-1.0 or 0.0 */ \
+ \
+ /* sqrt(x*x + u) extra precisely */ \
+ VFLOAT A, a; \
+ XSQ_PLUS_U_ACOSH (vx, u, A, a, vlen); \
+ /* A + a is x*x + u */ \
+ \
+ VFLOAT B, b; \
+ SQRT2_X2 (A, a, B, b, vlen); \
+ /* B + b is sqrt(x*x + u) to about 7 extra bits */ \
+ \
+ VFLOAT S, s; \
+ /* x dominates B for acosh */ \
+ FAST2SUM (vx, B, S, s, vlen); \
+ s = __riscv_vfadd (s, b, vlen); \
+ \
+ /* x + sqrt(x*x + u) is accurately represented as S + s. \
+ We first scale S, s by 2^(-n) so that the scaled value \
+ falls roughly in [1/rt2, rt2] */ \
+ SCALE_4_LOG (S, s, n, vlen); \
+ \
+ /* log(x + sqrt(x*x + u)) = n * log(2) + log(y); y = S + s. \
+ We use log(y) = 2 atanh( (y-1)/(y+1) ) and approximate the latter \
+ by t + t^3 * poly(t^2), t = 2 (y-1)/(y+1) */ \
+ \
+ /* We now compute the numerator 2(y-1) and denominator y+1 and their \
+ quotient to extra precision */ \
+ VFLOAT numer, delta_numer, denom, delta_denom; \
+ TRANSFORM_2_ATANH (S, s, numer, delta_numer, denom, delta_denom, vlen); \
+ \
+ VFLOAT r_hi, r_lo, r; \
+ DIV2_N2D2 (numer, delta_numer, denom, delta_denom, r_hi, r_lo, vlen); \
+ r = __riscv_vfadd (r_hi, r_lo, vlen); \
+ \
+ VFLOAT poly; \
+ LOG_POLY (r, r_lo, poly, vlen); \
+ /* At this point r_hi + poly approximates log(X) */ \
+ \
+ /* Compose the final result: n * log(2) + r_hi + poly */ \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ A = __riscv_vfmul (n_flt, LOG2_HI, vlen); \
+ FAST2SUM (A, r_hi, S, s, vlen); \
+ s = __riscv_vfmacc (s, LOG2_LO, n_flt, vlen); \
+ s = __riscv_vfadd (s, poly, vlen); \
+ \
+ vy = __riscv_vfadd (S, s, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
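
A scalar model of the algorithm described in the comment block above:
write x + sqrt(x*x - 1) = 2^n * z with z in [1/sqrt(2), sqrt(2)),
then the log is n*log(2) + 2*atanh((z-1)/(z+1)). Illustrative only,
not part of the patch; the vector code carries extra precision that
this sketch does not:

#include <math.h>

static double
acosh_model (double x)
{
  double t = x + sqrt (x * x - 1.0);
  int n;
  double z = frexp (t, &n);     /* t = 2^n * z with z in [0.5, 1) */
  if (z < M_SQRT1_2)            /* renormalize into [1/rt2, rt2) */
    {
      z *= 2.0;
      n -= 1;
    }
  double s = (z - 1.0) / (z + 1.0);
  return n * M_LN2 + 2.0 * atanh (s);
}
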
diff --git a/sysdeps/riscv/rvd/v_d_acospi.c b/sysdeps/riscv/rvd/v_d_acospi.c
new file mode 100644
index 0000000000..b6811c6343
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_acospi.c
@@ -0,0 +1,237 @@
+/* Double-precision vector acospi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ACOSPID_VSET_CONFIG
+
+#define COMPILE_FOR_ACOSPI
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_LO 0x1.1a62633145c07p-53
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfnmsub ((small_x), (vx), ONE_OV_PI_HI, VFMV_VF (0x1.0p-1, (vlen)), \
+ (vlen))
+
+#define FUNC_AT_ONE(abs_x_1, vx, vlen) \
+ __riscv_vfadd ( \
+ (abs_x_1), \
+ __riscv_vfsgnjn ((abs_x_1), VFMV_VF (0x1.0p-1, (vlen)), (vx), (vlen)), \
+ 0x1.0p-1, (vlen))
+
+#define EXCEPTION_HANDLING_ASINCOS(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ /* filter out |x| >= 1, Infs and NaNs */ \
+ VBOOL expo_ge_BIAS = __riscv_vmsgeu (expo_x, EXP_BIAS, (vlen)); \
+ /* filter out |x| < 2^(-30) */ \
+ VBOOL expo_le_BIASm31 = __riscv_vmsleu (expo_x, EXP_BIAS - 31, (vlen)); \
+ (special_args) = __riscv_vmor (expo_ge_BIAS, expo_le_BIASm31, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VFLOAT x_tmp = __riscv_vfsgnj ((vx), fp_posOne, (vlen)); \
+ VBOOL abs_x_1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfeq (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ VBOOL abs_x_gt1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfgt (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ (vy_special) = vx; \
+ /* Only replace extended real numbers x, |x| > 1; abs_x_gt1 is not \
+ * true if x is NaN */ \
+ x_tmp = __riscv_vfmerge (x_tmp, fp_sNaN, abs_x_gt1, (vlen)); \
+ /* Here we add x to itself for all "special args" including NaNs, \
+ * generating the necessary signal */ \
+ x_tmp = __riscv_vfadd ((special_args), x_tmp, x_tmp, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, (special_args), (vlen)); \
+ x_tmp = FUNC_AT_ONE (abs_x_1, (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, abs_x_1, (vlen)); \
+ x_tmp = FUNC_NEAR_ZERO (expo_le_BIASm31, vx, vlen); \
+ (vy_special) = __riscv_vmerge ((vy_special), x_tmp, \
+ expo_le_BIASm31, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, special_args, (vlen)); \
+ } \
+ } \
+ while (0)
+
+// For asin/acos, the computation is of the form Const +/- (r + r*s*poly(s))
+// This version computes this entire expression in fixed point by converting
+// r and s into fixed point.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, acospi) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ EXCEPTION_HANDLING_ASINCOS (vx_orig, special_args, vy_special, vlen); \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ VBOOL x_le_half = __riscv_vmfle (vx, 0x1.0p-1, vlen); \
+ VBOOL x_gt_half = __riscv_vmnot (x_le_half, vlen); \
+ VBOOL x_orig_le_half = __riscv_vmfle (vx_orig, 0x1.0p-1, vlen); \
+ VBOOL x_orig_lt_neghalf = __riscv_vmflt (vx_orig, -0x1.0p-1, vlen); \
+ VFLOAT alpha, beta; \
+ alpha = vx; \
+ beta = U_AS_F (__riscv_vxor (F_AS_U (beta), F_AS_U (beta), vlen)); \
+ alpha = __riscv_vfmerge (alpha, -0x1.0p-1, x_gt_half, vlen); \
+ beta = __riscv_vfmerge (beta, 0x1.0p-1, x_gt_half, vlen); \
+ VFLOAT s = __riscv_vfmadd (alpha, vx, beta, vlen); \
+ /* s is x*x or (1-x)/2 */ \
+ double two_to_63 = 0x1.0p63; \
+ VINT S = __riscv_vfcvt_x (__riscv_vfmul (s, two_to_63, vlen), vlen); \
+ VINT Ssq = __riscv_vsmul (S, S, 1, vlen); \
+ \
+ /* For x > 1/2, we need to compute sqrt(s) to be used later \
+ where s = (1-x)/2. \
+ Note that s > 0 as we have handled |x| = 1 as \
+ special arguments */ \
+ VFLOAT sqrt_s = __riscv_vfsqrt (x_gt_half, s, vlen); \
+ VFLOAT delta = __riscv_vfnmsub (x_gt_half, sqrt_s, sqrt_s, s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, __riscv_vfrec7 (s, vlen), vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, sqrt_s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, 0x1.0p-1, vlen); \
+ \
+ VINT P_EVEN = PSTEP_I ( \
+ 0x15555555555390dd, Ssq, \
+ PSTEP_I (0x5b6db6d09b27a82, Ssq, \
+ PSTEP_I (0x2dd13e6dd791f29, Ssq, \
+ PSTEP_I (0x1c6fc7fedf424bb, Ssq, \
+ PSTEP_I (0xd5bd98b325786c, \
+ -0x21470ca28feec71, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_ODD = PSTEP_I ( \
+ 0x99999999b7428ad, Ssq, \
+ PSTEP_I (0x3e38e587fad54b2, Ssq, \
+ PSTEP_I (0x238d7e0436a1c30, Ssq, \
+ PSTEP_I (0x18ecc06c5a390e3, Ssq, \
+ PSTEP_I (0x28063c8b4b6a072, \
+ 0x41646ebd6edd35e, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_ODD = __riscv_vsmul (P_ODD, S, 1, vlen); \
+ VINT POLY = __riscv_vsadd (P_ODD, P_EVEN, vlen); \
+ POLY = __riscv_vsmul (POLY, S, 1, vlen); \
+ \
+ VFLOAT r = vx; \
+ r = __riscv_vmerge (r, sqrt_s, x_gt_half, vlen); \
+ delta = __riscv_vfmerge (delta, fp_posZero, x_le_half, vlen); \
+ \
+ VINT m = U_AS_I (__riscv_vrsub (__riscv_vsrl (F_AS_U (r), MAN_LEN, vlen), \
+ EXP_BIAS, vlen)); \
+ m = __riscv_vmin (m, 60, vlen); /* in case r is 0.0 */ \
+ VINT q = __riscv_vadd (m, 60, vlen); \
+ q = __riscv_vmerge (q, 60, x_orig_le_half, vlen); \
+ r = __riscv_vfsgnjx (r, vx_orig, vlen); \
+ delta = __riscv_vfsgnjx (delta, vx_orig, vlen); \
+ VFLOAT scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vadd (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, scale_r, vlen), vlen); \
+ R = __riscv_vsadd ( \
+ __riscv_vfcvt_x (__riscv_vfmul (delta, scale_r, vlen), vlen), R, \
+ vlen); \
+ POLY = __riscv_vsadd (R, __riscv_vsmul (POLY, R, 1, vlen), vlen); \
+ VINT POLY_prime = __riscv_vsadd (x_gt_half, POLY, POLY, vlen); \
+ \
+ POLY = __riscv_vrsub (x_le_half, POLY, 0, vlen); \
+ \
+ POLY = __riscv_vmerge (POLY, POLY_prime, x_gt_half, vlen); \
+ \
+ VINT C; \
+ C = __riscv_vxor (C, C, vlen); \
+ C = __riscv_vmerge (C, PI_Q60, x_orig_lt_neghalf, vlen); \
+ C = __riscv_vmerge (C, PIBY2_Q60, x_le_half, vlen); \
+ POLY = __riscv_vsadd (C, POLY, vlen); \
+ \
+ POLY = __riscv_vsmul (POLY, ONE_OV_PI_Q63, 1, vlen); \
+ \
+ VFLOAT inv_scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vrsub (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ vy = __riscv_vfmul (inv_scale_r, __riscv_vfcvt_f (POLY, vlen), vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
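
The final fixed-point step above multiplies the Q60 result by 1/pi in
Q63; taking the high half of the 128-bit product keeps the value in
Q60, which is what __riscv_vsmul does (with rounding). A scalar
check, not part of the patch and assuming GCC's __int128, that
acospi(0) = (pi/2) * (1/pi) comes out as exactly 1/2 in Q60:

#include <stdint.h>
#include <stdio.h>

#define PIBY2_Q60 0x1921fb54442d1847
#define ONE_OV_PI_Q63 0x28be60db9391054a

int
main (void)
{
  __int128 prod = (__int128) PIBY2_Q60 * ONE_OV_PI_Q63;
  /* High half with round-to-nearest-up, as vsmul does for SEW=64.  */
  int64_t q60 = (int64_t) ((prod + ((__int128) 1 << 62)) >> 63);
  printf ("%a\n", (double) q60 * 0x1.0p-60);   /* prints 0x1p-1 */
  return 0;
}
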
diff --git a/sysdeps/riscv/rvd/v_d_asin.c b/sysdeps/riscv/rvd/v_d_asin.c
new file mode 100644
index 0000000000..27ff7067c8
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_asin.c
@@ -0,0 +1,224 @@
+/* Double-precision vector asin function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ASIND_VSET_CONFIG
+
+#define COMPILE_FOR_ASIN
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_LO 0x1.1a62633145c07p-53
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), 0x1.0p-60, (vx), (vlen))
+
+#define FUNC_AT_ONE(abs_x_1, vx, vlen) \
+ __riscv_vfsgnj ((abs_x_1), VFMV_VF (PIBY2_HI, (vlen)), (vx), (vlen))
+
+#define EXCEPTION_HANDLING_ASINCOS(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ /* filter out |x| >= 1, Infs and NaNs */ \
+ VBOOL expo_ge_BIAS = __riscv_vmsgeu (expo_x, EXP_BIAS, (vlen)); \
+ /* filter out |x| < 2^(-30) */ \
+ VBOOL expo_le_BIASm31 = __riscv_vmsleu (expo_x, EXP_BIAS - 31, (vlen)); \
+ (special_args) = __riscv_vmor (expo_ge_BIAS, expo_le_BIASm31, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VFLOAT x_tmp = __riscv_vfsgnj ((vx), fp_posOne, (vlen)); \
+ VBOOL abs_x_1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfeq (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ VBOOL abs_x_gt1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfgt (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ (vy_special) = vx; \
+ /* Only replace extended real numbers x, |x| > 1; abs_x_gt1 is not \
+ * true if x is NaN */ \
+ x_tmp = __riscv_vfmerge (x_tmp, fp_sNaN, abs_x_gt1, (vlen)); \
+ /* Here we add x to itself for all "special args" including NaNs, \
+ * generating the necessary signal */ \
+ x_tmp = __riscv_vfadd ((special_args), x_tmp, x_tmp, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, (special_args), (vlen)); \
+ x_tmp = FUNC_AT_ONE (abs_x_1, (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, abs_x_1, (vlen)); \
+ x_tmp = FUNC_NEAR_ZERO (expo_le_BIASm31, vx, vlen); \
+ (vy_special) = __riscv_vmerge ((vy_special), x_tmp, \
+ expo_le_BIASm31, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, special_args, (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, asin) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ EXCEPTION_HANDLING_ASINCOS (vx_orig, special_args, vy_special, vlen); \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ /* asin(x) ~=~ r + r*s*poly(s); r=x; s=r*r when |x| <= 1/2; \
+ asin(x) = pi/2 - 2 * asin(sqrt((1-x)/2)) for x > 1/2; \
+ acos(x) = pi/2 - asin(x) for |x| <= 1. \
+ These expressions allow us to compute asin or acos with the essential \
+ approximation of asin(x) for |x| <= 1/2 */ \
+ \
+ VBOOL x_le_half = __riscv_vmfle (vx, 0x1.0p-1, vlen); \
+ VBOOL x_gt_half = __riscv_vmnot (x_le_half, vlen); \
+ VFLOAT alpha, beta; \
+ alpha = vx; \
+ beta = U_AS_F (__riscv_vxor (F_AS_U (beta), F_AS_U (beta), vlen)); \
+ alpha = __riscv_vfmerge (alpha, -0x1.0p-1, x_gt_half, vlen); \
+ beta = __riscv_vfmerge (beta, 0x1.0p-1, x_gt_half, vlen); \
+ VFLOAT s = __riscv_vfmadd (alpha, vx, beta, vlen); \
+ /* s is x*x or (1-x)/2 */ \
+ double two_to_63 = 0x1.0p63; \
+ VINT S = __riscv_vfcvt_x (__riscv_vfmul (s, two_to_63, vlen), vlen); \
+ VINT Ssq = __riscv_vsmul (S, S, 1, vlen); \
+ \
+ /* For x > 1/2, we need to compute sqrt(s) to be used later \
+ where s = (1-x)/2. Note that s > 0 as we have handled |x| = 1 as \
+ special arguments */ \
+ VFLOAT sqrt_s = __riscv_vfsqrt (x_gt_half, s, vlen); \
+ VFLOAT delta = __riscv_vfnmsub (x_gt_half, sqrt_s, sqrt_s, s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, __riscv_vfrec7 (s, vlen), vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, sqrt_s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, 0x1.0p-1, vlen); \
+ \
+ VINT P_EVEN = PSTEP_I ( \
+ 0x15555555555390dd, Ssq, \
+ PSTEP_I (0x5b6db6d09b27a82, Ssq, \
+ PSTEP_I (0x2dd13e6dd791f29, Ssq, \
+ PSTEP_I (0x1c6fc7fedf424bb, Ssq, \
+ PSTEP_I (0xd5bd98b325786c, \
+ -0x21470ca28feec71, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_ODD = PSTEP_I ( \
+ 0x99999999b7428ad, Ssq, \
+ PSTEP_I (0x3e38e587fad54b2, Ssq, \
+ PSTEP_I (0x238d7e0436a1c30, Ssq, \
+ PSTEP_I (0x18ecc06c5a390e3, Ssq, \
+ PSTEP_I (0x28063c8b4b6a072, \
+ 0x41646ebd6edd35e, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_ODD = __riscv_vsmul (P_ODD, S, 1, vlen); \
+ VINT POLY = __riscv_vsadd (P_ODD, P_EVEN, vlen); \
+ POLY = __riscv_vsmul (POLY, S, 1, vlen); \
+ \
+ VFLOAT r = vx; \
+ r = __riscv_vmerge (r, sqrt_s, x_gt_half, vlen); \
+ delta = __riscv_vfmerge (delta, fp_posZero, x_le_half, vlen); \
+ \
+ VINT m = U_AS_I (__riscv_vrsub (__riscv_vsrl (F_AS_U (r), MAN_LEN, vlen), \
+ EXP_BIAS, vlen)); \
+ m = __riscv_vmin (m, 60, vlen); /* in case r is 0.0 */ \
+ VINT q = __riscv_vadd (m, 60, vlen); \
+ q = __riscv_vmerge (q, 60, x_gt_half, vlen); \
+ VFLOAT scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vadd (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, scale_r, vlen), vlen); \
+ R = __riscv_vsadd ( \
+ __riscv_vfcvt_x (__riscv_vfmul (delta, scale_r, vlen), vlen), R, \
+ vlen); \
+ POLY = __riscv_vsadd (R, __riscv_vsmul (POLY, R, 1, vlen), vlen); \
+ VINT POLY_prime = __riscv_vsadd (x_gt_half, POLY, POLY, vlen); \
+ \
+ POLY_prime = __riscv_vrsub (x_gt_half, POLY_prime, PIBY2_Q60, vlen); \
+ \
+ POLY = __riscv_vmerge (POLY, POLY_prime, x_gt_half, vlen); \
+ \
+ VFLOAT inv_scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vrsub (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ vy = __riscv_vfmul (inv_scale_r, __riscv_vfcvt_f (POLY, vlen), vlen); \
+ \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
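
On the fixed-point core used here: s is converted to Q63 via
S = round(s * 2^63), and products are formed with __riscv_vsmul, the
saturating fractional multiply that returns the high half of the
product, (a*b) >> 63, with rounding. A scalar stand-in, not part of
the patch (saturation omitted, round-to-nearest-up assumed, GCC
__int128 used):

#include <stdint.h>

/* High half of a Q63 x Q63 product with round-to-nearest-up;
   saturation (relevant only for -1 * -1) omitted for brevity.  */
static inline int64_t
q63_mul (int64_t a, int64_t b)
{
  __int128 p = (__int128) a * b;
  return (int64_t) ((p + ((__int128) 1 << 62)) >> 63);
}

/* E.g. 0.25 in Q63 is INT64_C (1) << 61, and
   q63_mul (INT64_C (1) << 61, INT64_C (1) << 61) returns
   INT64_C (1) << 59, i.e. 0.0625 in Q63.  */
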
diff --git a/sysdeps/riscv/rvd/v_d_asinh.c b/sysdeps/riscv/rvd/v_d_asinh.c
new file mode 100644
index 0000000000..2611057ca1
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_asinh.c
@@ -0,0 +1,160 @@
+/* Double-precision vector asinh function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ASINHD_VSET_CONFIG
+
+#define COMPILE_FOR_ASINH
+
+#include "rvvlm_invhyperD.h"
+
+// Acosh(x) is defined for x >= 1 by the formula log(x + sqrt(x*x - 1))
+// Asinh(x) is defined for all finite x by the formula log(x + sqrt(x*x + 1))
+// Acosh is always positive, and Asinh(-x) = -Asinh(x). Thus we generally
+// work with |x| and restore the sign (if necessary) at the end.
+// For the log function log(2^n z), we use the expansion in terms of atanh:
+// n log(2) + 2 atanh((z-1)/(z+1))
+// The algorithm here first scales down x by 2^(-550) when |x| >= 2^500.
+// For such large x, both acosh and asinh equal log(2x) to very high
+// precision. We safely ignore the +/- 1 when this is the case.
+//
+// A power 2^n is determined by the value of x + sqrt(x*x +/- 1) so that
+// scaling the expression by 2^(-n) transforms it to the range [0.71, 1.42].
+// Log(t) for t in this region is computed by 2 atanh((t-1)/(t+1))
+// More precisely, we use s = 2(t-1)/(t+1) and approximate the function
+// 2 atanh(s/2) by s + s^3 * polynomial(s^2).
+// The final result is n * log(2) + s + s^3 * polynomial(s^2)
+// which is computed with care.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, asinh) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ VFLOAT vx_orig = vx; \
+ \
+ /* Handle Inf and NaN and |input| < 2^(-30) */ \
+ EXCEPTION_HANDLING_ASINH (vx, special_args, vy_special, vlen); \
+ vx = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ \
+ /* Need to scale x so that x + sqrt(x*x +/- 1) doesn't overflow. \
+ We scale x down by 2^(-550) if x >= 2^500 and set the "+/- 1" to 0 */ \
+ VINT n; \
+ VFLOAT u; \
+ SCALE_X (vx, n, u, vlen); \
+ /* n is 0 or 500; and u is +/-1.0 or 0.0 */ \
+ \
+ /* sqrt(x*x + u) extra precisely */ \
+ VFLOAT A, a; \
+ XSQ_PLUS_U_ASINH (vx, u, A, a, vlen); \
+ /* A + a is x*x + u */ \
+ \
+ VFLOAT B, b; \
+ /* For asinh, we need the sqrt to double-double precision */ \
+ VFLOAT recip = __riscv_vfrdiv (A, fp_posOne, vlen); \
+ B = __riscv_vfsqrt (A, vlen); \
+ b = __riscv_vfnmsub (B, B, A, vlen); \
+ b = __riscv_vfadd (b, a, vlen); \
+ VFLOAT B_recip = __riscv_vfmul (B, recip, vlen); \
+ b = __riscv_vfmul (b, 0x1.0p-1, vlen); \
+ b = __riscv_vfmul (b, B_recip, vlen); \
+ \
+ VFLOAT S, s; \
+ /* B dominates x for asinh */ \
+ FAST2SUM (B, vx, S, s, vlen); \
+ s = __riscv_vfadd (s, b, vlen); \
+ \
+ /* x + sqrt(x*x + u) is accurately represented as S + s. \
+ We first scale S, s by 2^(-n) so that the scaled value \
+ falls roughly in [1/rt2, rt2] */ \
+ SCALE_4_LOG (S, s, n, vlen); \
+ \
+ /* log(x + sqrt(x*x + u)) = n * log(2) + log(y); y = S + s. \
+ We use log(y) = 2 atanh( (y-1)/(y+1) ) and approximate the latter \
+ by t + t^3 * poly(t^2), t = 2 (y-1)/(y+1) */ \
+ \
+ /* We now compute the numerator 2(y-1) and denominator y+1 and their \
+ quotient to extra precision */ \
+ VFLOAT numer, delta_numer, denom, delta_denom; \
+ TRANSFORM_2_ATANH (S, s, numer, delta_numer, denom, delta_denom, vlen); \
+ \
+ VFLOAT r_hi, r_lo, r; \
+ DIV2_N2D2 (numer, delta_numer, denom, delta_denom, r_hi, r_lo, vlen); \
+ r = __riscv_vfadd (r_hi, r_lo, vlen); \
+ \
+ VFLOAT poly; \
+ LOG_POLY (r, r_lo, poly, vlen); \
+ /* At this point r_hi + poly approximates log(X) */ \
+ \
+ /* Compose the final result: n * log(2) + r_hi + poly */ \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ A = __riscv_vfmul (n_flt, LOG2_HI, vlen); \
+ FAST2SUM (A, r_hi, S, s, vlen); \
+ s = __riscv_vfmacc (s, LOG2_LO, n_flt, vlen); \
+ s = __riscv_vfadd (s, poly, vlen); \
+ \
+ vy = __riscv_vfadd (S, s, vlen); \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
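
The square root refinement above deserves a note: B = sqrt(A) is the
rounded root, and b = ((A - B*B) + a) * 0.5 * (B/A) supplies a
correction so that B + b approximates sqrt(A + a) to near
double-double accuracy. A scalar sketch, not part of the patch,
assuming an exact FMA is available:

#include <math.h>

/* B + b ~ sqrt (A + a), where A + a is a double-double input.  */
static void
sqrt_dd (double A, double a, double *B_out, double *b_out)
{
  double B = sqrt (A);
  double r = fma (-B, B, A);  /* A - B*B, exact when B = sqrt(A) rounded */
  *B_out = B;
  *b_out = 0.5 * (r + a) / B; /* the vector code uses B/A ~ 1/B here */
}
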
diff --git a/sysdeps/riscv/rvd/v_d_asinpi.c b/sysdeps/riscv/rvd/v_d_asinpi.c
new file mode 100644
index 0000000000..1c219b8ed0
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_asinpi.c
@@ -0,0 +1,221 @@
+/* Double-precision vector asinpi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ASINPID_VSET_CONFIG
+
+#define COMPILE_FOR_ASINPI
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_LO 0x1.1a62633145c07p-53
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
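+/* For |x| < 2^(-30), asinpi(x) = x/pi to double precision: the FMA below
+   computes x*ONE_OV_PI_HI + (x*ONE_OV_PI_LO), keeping the part of 1/pi
+   beyond double precision in play for the final rounding.  */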
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), ONE_OV_PI_HI, \
+ __riscv_vfmul ((small_x), (vx), ONE_OV_PI_LO, (vlen)), \
+ (vlen))
+
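+/* asinpi (+-1) is exactly +-1/2, so FUNC_AT_ONE returns copysign (1/2, x).  */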
+#define FUNC_AT_ONE(abs_x_1, vx, vlen) \
+ __riscv_vfsgnj ((abs_x_1), VFMV_VF (0x1.0p-1, (vlen)), (vx), (vlen))
+
+#define EXCEPTION_HANDLING_ASINCOS(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ /* filter out |x| >= 1, Infs and NaNs */ \
+ VBOOL expo_ge_BIAS = __riscv_vmsgeu (expo_x, EXP_BIAS, (vlen)); \
+ /* filter out |x| < 2^(-30) */ \
+ VBOOL expo_le_BIASm31 = __riscv_vmsleu (expo_x, EXP_BIAS - 31, (vlen)); \
+ (special_args) = __riscv_vmor (expo_ge_BIAS, expo_le_BIASm31, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VFLOAT x_tmp = __riscv_vfsgnj ((vx), fp_posOne, (vlen)); \
+ VBOOL abs_x_1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfeq (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ VBOOL abs_x_gt1 = __riscv_vmand ( \
+ (special_args), __riscv_vmfgt (x_tmp, fp_posOne, (vlen)), \
+ (vlen)); \
+ (vy_special) = vx; \
+ /* Only replace extended real numbers x, |x| > 1; abs_x_gt1 is not \
+ * true if x is NaN */ \
+ x_tmp = __riscv_vfmerge (x_tmp, fp_sNaN, abs_x_gt1, (vlen)); \
+ /* Here we add x to itself for all "special args" including NaNs, \
+ * generating the necessary signal */ \
+ x_tmp = __riscv_vfadd ((special_args), x_tmp, x_tmp, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, (special_args), (vlen)); \
+ x_tmp = FUNC_AT_ONE (abs_x_1, (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, abs_x_1, (vlen)); \
+ x_tmp = FUNC_NEAR_ZERO (expo_le_BIASm31, vx, vlen); \
+ (vy_special) = __riscv_vmerge ((vy_special), x_tmp, \
+ expo_le_BIASm31, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, special_args, (vlen)); \
+ } \
+ } \
+ while (0)
+
+// For asin/acos, the computation is of the form Const +/- (r + r*s*poly(s)).
+// This version computes this entire expression in fixed point by converting
+// r and s into fixed point.
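+// A note on the fixed-point convention used below: s <= 1/4 here, so
+// S = round (s * 2^63) represents s in Q63 format, and
+// __riscv_vsmul (A, B, 1, vlen) returns the high part (A*B) >> 63,
+// i.e. a Q63 x Q63 -> Q63 multiply (with rounding per the vxrm argument).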
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, asinpi) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ EXCEPTION_HANDLING_ASINCOS (vx_orig, special_args, vy_special, vlen); \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ VBOOL x_le_half = __riscv_vmfle (vx, 0x1.0p-1, vlen); \
+ VBOOL x_gt_half = __riscv_vmnot (x_le_half, vlen); \
+ VFLOAT alpha, beta; \
+ alpha = vx; \
+ beta = U_AS_F (__riscv_vxor (F_AS_U (beta), F_AS_U (beta), vlen)); \
+ alpha = __riscv_vfmerge (alpha, -0x1.0p-1, x_gt_half, vlen); \
+ beta = __riscv_vfmerge (beta, 0x1.0p-1, x_gt_half, vlen); \
+ VFLOAT s = __riscv_vfmadd (alpha, vx, beta, vlen); \
+ double two_to_63 = 0x1.0p63; \
+ VINT S = __riscv_vfcvt_x (__riscv_vfmul (s, two_to_63, vlen), vlen); \
+ VINT Ssq = __riscv_vsmul (S, S, 1, vlen); \
+ \
+ VFLOAT sqrt_s = __riscv_vfsqrt (x_gt_half, s, vlen); \
+ VFLOAT delta = __riscv_vfnmsub (x_gt_half, sqrt_s, sqrt_s, s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, __riscv_vfrec7 (s, vlen), vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, sqrt_s, vlen); \
+ delta = __riscv_vfmul (x_gt_half, delta, 0x1.0p-1, vlen); \
+ \
+ VINT P_EVEN = PSTEP_I ( \
+ 0x15555555555390dd, Ssq, \
+ PSTEP_I (0x5b6db6d09b27a82, Ssq, \
+ PSTEP_I (0x2dd13e6dd791f29, Ssq, \
+ PSTEP_I (0x1c6fc7fedf424bb, Ssq, \
+ PSTEP_I (0xd5bd98b325786c, \
+ -0x21470ca28feec71, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_ODD = PSTEP_I ( \
+ 0x99999999b7428ad, Ssq, \
+ PSTEP_I (0x3e38e587fad54b2, Ssq, \
+ PSTEP_I (0x238d7e0436a1c30, Ssq, \
+ PSTEP_I (0x18ecc06c5a390e3, Ssq, \
+ PSTEP_I (0x28063c8b4b6a072, \
+ 0x41646ebd6edd35e, Ssq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_ODD = __riscv_vsmul (P_ODD, S, 1, vlen); \
+ VINT POLY = __riscv_vsadd (P_ODD, P_EVEN, vlen); \
+ POLY = __riscv_vsmul (POLY, S, 1, vlen); \
+ \
+ VFLOAT r = vx; \
+ r = __riscv_vmerge (r, sqrt_s, x_gt_half, vlen); \
+ delta = __riscv_vfmerge (delta, fp_posZero, x_le_half, vlen); \
+ \
+ VINT m = U_AS_I (__riscv_vrsub (__riscv_vsrl (F_AS_U (r), MAN_LEN, vlen), \
+ EXP_BIAS, vlen)); \
+ m = __riscv_vmin (m, 60, vlen); /* in case r is 0.0 */ \
+ VINT q = __riscv_vadd (m, 60, vlen); \
+ q = __riscv_vmerge (q, 60, x_gt_half, vlen); \
+ VFLOAT scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vadd (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, scale_r, vlen), vlen); \
+ R = __riscv_vsadd ( \
+ __riscv_vfcvt_x (__riscv_vfmul (delta, scale_r, vlen), vlen), R, \
+ vlen); \
+ POLY = __riscv_vsadd (R, __riscv_vsmul (POLY, R, 1, vlen), vlen); \
+ VINT POLY_prime = __riscv_vsadd (x_gt_half, POLY, POLY, vlen); \
+ \
+ POLY_prime = __riscv_vrsub (x_gt_half, POLY_prime, PIBY2_Q60, vlen); \
+ \
+ POLY = __riscv_vmerge (POLY, POLY_prime, x_gt_half, vlen); \
+ \
+ POLY = __riscv_vsmul (POLY, ONE_OV_PI_Q63, 1, vlen); \
+ \
+ VFLOAT inv_scale_r = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vrsub (q, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ vy = __riscv_vfmul (inv_scale_r, __riscv_vfcvt_f (POLY, vlen), vlen); \
+ \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_atan.c b/sysdeps/riscv/rvd/v_d_atan.c
new file mode 100644
index 0000000000..146ebd269e
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_atan.c
@@ -0,0 +1,253 @@
+/* Double-precision vector atan function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+#include "v_math.h"
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ATAND_VSET_CONFIG
+
+#define COMPILE_FOR_ATAN
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define PIBY2_Q61 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
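+/* For |x| < 2^(-30), atan(x) rounds to x with a correction of order
+   -x^3/3; subtracting the tiny 2^(-60)*x term below stands in for that
+   correction and raises the inexact flag.  */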
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfnmsub ((small_x), (vx), 0x1.0p-60, (vx), (vlen))
+
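+/* For |x| >= 2^60 (Infs included), atan(x) rounds to +-pi/2; adding the
+   low part PIBY2_LO to PIBY2_HI yields the correctly rounded pi/2 and
+   raises the inexact flag.  */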
+#define FUNC_EXPO_LARGE(expo_x_large, vx, vlen) \
+ __riscv_vfsgnj (__riscv_vfadd ((expo_x_large), VFMV_VF (PIBY2_HI, (vlen)), \
+ PIBY2_LO, (vlen)), \
+ (vx), (vlen))
+
+#define EXCEPTION_HANDLING_ATAN(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ /* filter out |x| >= 2^60, Infs and NaNs */ \
+ VBOOL expo_x_large = __riscv_vmsgeu (expo_x, EXP_BIAS + 60, (vlen)); \
+ /* filter out |x| < 2^(-30) */ \
+ VBOOL x_small = __riscv_vmsleu (expo_x, EXP_BIAS - 31, (vlen)); \
+ (special_args) = __riscv_vmor (expo_x_large, x_small, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VFLOAT x_tmp = FUNC_NEAR_ZERO (x_small, (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), x_tmp, x_small, vlen); \
+ x_tmp = FUNC_EXPO_LARGE (expo_x_large, (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, expo_x_large, vlen); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// For atan, atan(x) ~=~ r + r*s*poly(s), with r = x and s = r*r for
+// |x| < 1, and atan(x) = pi/2 - atan(1/x) for |x| >= 1.
+// Thus atan(x) = (pi/2 or 0) +/- (r + r*s*poly(s)), where r is x or 1/x
+// and s is r*r.  This version computes this entire expression in fixed
+// point by converting r and s into fixed point.
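+// Fixed-point bookkeeping used below: Z holds z in Q(61+m), where
+// 2^(-m) <= z < 2^(-m+1); vsmul of Z with Z << 1 gives z*z in
+// Q(60+2m), which the shift steps renormalize to Q62 (V) and Q63 (VV)
+// before the polynomial is evaluated entirely in Q63.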
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, atan) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ EXCEPTION_HANDLING_ATAN (vx_orig, special_args, vy_special, vlen); \
+ \
+ /* atan(-x) = -atan(x) and so we compute sign(x)*atan(|x|) to preserve \
+ symmetry. For 0 <= t < 1, atan(t) is approximated by t + t^3*poly(t^2). \
+ For 1 <= t < Inf, atan(t) = pi/2 - atan(1/t). \
+ So the generic form of the core is z + z^3 poly(z^2). \
+ Because the series decays slowly and the argument can be as big \
+ as 1 in magnitude, rounding-error accumulation is significant. \
+ This version uses fixed-point computation for the entire polynomial. \
+ */ \
+ \
+ VFLOAT a = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ VBOOL a_ge_one = __riscv_vmfge (a, fp_posOne, vlen); \
+ VBOOL a_lt_one = __riscv_vmnot (a_ge_one, vlen); \
+ VFLOAT z = __riscv_vfrdiv (a_ge_one, a, fp_posOne, vlen); \
+ z = __riscv_vmerge (z, a, a_lt_one, vlen); \
+ /* We need 1/a to extra precision. */ \
+ VFLOAT delta = VFMV_VF (fp_posOne, vlen); \
+ delta = __riscv_vfnmsac (a_ge_one, delta, z, a, vlen); \
+ delta = __riscv_vfmul (a_ge_one, delta, z, vlen); \
+ delta = __riscv_vfmerge (delta, fp_posZero, a_lt_one, vlen); \
+ /* z + delta is extra precise z. */ \
+ \
+ /* Now convert z to fixed point. */ \
+ /* We scale z by 61+m where 2^(-m) <= a < 2^(-m+1) \
+ noting that m >= 0 */ \
+ VUINT expo_61pm = __riscv_vsrl (F_AS_U (z), MAN_LEN, vlen); \
+ expo_61pm = __riscv_vmaxu (expo_61pm, EXP_BIAS - 60, vlen); \
+ expo_61pm \
+ = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS + 61, vlen); /* BIAS+61+m */ \
+ VFLOAT scale_61pm = U_AS_F (__riscv_vsll (expo_61pm, MAN_LEN, vlen)); \
+ VINT Z = __riscv_vfcvt_x (__riscv_vfmul (z, scale_61pm, vlen), vlen); \
+ VINT Delta \
+ = __riscv_vfcvt_x (__riscv_vfmul (delta, scale_61pm, vlen), vlen); \
+ Delta = __riscv_vsadd (a_ge_one, Delta, Z, vlen); \
+ Z = __riscv_vmerge (Z, Delta, a_ge_one, vlen); \
+ \
+ VINT V = __riscv_vsmul (Z, __riscv_vsll (Z, 1, vlen), 1, vlen); \
+ /* V is z*z with scale 60 + 2m */ \
+ VINT VV = __riscv_vrsub (V, 0, vlen); \
+ \
+ VUINT m = __riscv_vsub (expo_61pm, EXP_BIAS + 61, vlen); \
+ VUINT two_m = __riscv_vsll (m, 1, vlen); \
+ VBOOL left_shift = __riscv_vmsltu (two_m, 3, vlen); \
+ VBOOL right_shift = __riscv_vmnot (left_shift, vlen); \
+ \
+ VINT I_tmp \
+ = __riscv_vsll (left_shift, V, __riscv_vrsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, left_shift, vlen); \
+ I_tmp = __riscv_vsll (left_shift, VV, __riscv_vrsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, left_shift, vlen); \
+ \
+ I_tmp \
+ = __riscv_vsra (right_shift, V, __riscv_vsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, right_shift, vlen); \
+ I_tmp = __riscv_vsra (right_shift, VV, __riscv_vsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, right_shift, vlen); \
+ \
+ /* V is z*z in scale 62, VV is -z*z in scale 63 */ \
+ VINT WW = __riscv_vsll (__riscv_vsmul (V, VV, 1, vlen), 1, vlen); \
+ /* WW is -z^4 in scale 63. */ \
+ \
+ VINT P_even = PSTEPN_I ( \
+ -0x56629d839b68685, WW, \
+ PSTEPN_I (-0x3d2984d0a6f836a, WW, \
+ PSTEPN_I (-0x1c5e8b5228f9fe4, WW, \
+ PSTEPN_I (-0x05deca0ae3a1a5d, -0x004efe42fda24d7, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_even = PSTEPN_I ( \
+ -0x2aaaaaaaaaaa49d3, WW, \
+ PSTEPN_I (-0x12492492378aaf69, WW, \
+ PSTEPN_I (-0xba2e88c805cbaf8, WW, \
+ PSTEPN_I (-0x888722719d1260a, WW, \
+ PSTEPN_I (-0x6b96ef57ce79cc3, WW, \
+ P_even, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_odd = PSTEPN_I ( \
+ 0x04afe3b1345b489b, WW, \
+ PSTEPN_I (0x02cec355111c7439, WW, \
+ PSTEPN_I (0x00eaa9acebf3963e, WW, \
+ PSTEPN_I (0x001b053368ecfa14, 0x00006da7bb4399dd, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = PSTEPN_I ( \
+ 0x1999999999540349, WW, \
+ PSTEPN_I (0x0e38e38bf1671f42, WW, \
+ PSTEPN_I (0x09d89b293ef5f4d9, WW, \
+ PSTEPN_I (0x0786ec3df324db61, WW, \
+ PSTEPN_I (0x060b457b3c56e750, WW, \
+ P_odd, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = __riscv_vsmul (VV, P_odd, 1, vlen); \
+ VINT P = __riscv_vsub (P_even, P_odd, vlen); \
+ \
+ P = __riscv_vsmul (VV, P, 1, vlen); /* Q_63 */ \
+ P = __riscv_vsmul (Z, P, 1, vlen); /* Q_61pm */ \
+ P = __riscv_vsub (Z, P, vlen); /* Q_61pm */ \
+ \
+ VINT P_a_ge_one = __riscv_vsra (a_ge_one, P, m, vlen); \
+ P_a_ge_one = __riscv_vrsub (P_a_ge_one, PIBY2_Q61, vlen); \
+ P = __riscv_vmerge (P, P_a_ge_one, a_ge_one, vlen); \
+ \
+ /* we need to scale P by 2^(-(61+m)) or 2^(-61) */ \
+ VUINT expo_scale = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS, \
+ vlen); /* EXP_BIAS - (61+m) */ \
+ expo_scale = __riscv_vmerge (expo_scale, EXP_BIAS - 61, a_ge_one, vlen); \
+ VFLOAT scale_result = U_AS_F (__riscv_vsll (expo_scale, MAN_LEN, vlen)); \
+ vy = __riscv_vfcvt_f (P, vlen); \
+ vy = __riscv_vfmul (vy, scale_result, vlen); \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_atan2.c b/sysdeps/riscv/rvd/v_d_atan2.c
new file mode 100644
index 0000000000..c93e3da82e
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_atan2.c
@@ -0,0 +1,407 @@
+/* Double-precision vector atan2 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+#include "v_math.h"
+
+#define API_SIGNATURE API_SIGNATURE_21
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ATAN2D_VSET_CONFIG
+
+#define COMPILE_FOR_ATAN2
+
+#define PI_075_HI 0x1.2d97c7f3321d2p+1
+#define PI_075_LO 0x1.a79394c9e8a0ap-54
+#define THREE_OV_4 0x1.8p-1
+
+#define PIBY4_HI 0x1.921fb54442d18p-1
+#define PIBY4_LO 0x1.1a62633145c07p-55
+#define ONE_OV_4 0x1.0p-2
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+#define HALF 0x1.0p-1
+
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_LO 0x1.1a62633145c07p-53
+#define ONE 0x1.0p0
+
+#define PIBY2_Q61 0x3243f6a8885a308d
+#define PI_Q61 0x6487ed5110b4611a
+#define HALF_Q61 0x1000000000000000
+#define ONE_Q61 0x2000000000000000
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define PIBY2_Q61 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
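+
+/* The _Q60, _Q61 and _Q63 suffixes denote fixed-point constants:
+   round (c * 2^60), round (c * 2^61) and round (c * 2^63) of the
+   corresponding value c.  */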
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D2 (lmul, simdlen, atan2) (VFLOAT x, VFLOAT y) \
+ { \
+ size_t vlen; \
+ VFLOAT vy, vx, vw, vw_special; \
+ VUINT vclass_y, vclass_x; \
+ UINT stencil, class_of_interest; \
+ VBOOL special_y, special_x, special_args, id_mask; \
+ UINT nb_special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ stencil = class_NaN | class_Inf | class_Zero; \
+ vlen = VSET (simdlen); \
+ vy = y; \
+ vx = x; \
+ \
+ /* Exception handling is more involved than for other functions */ \
+ VFLOAT result_tmp; \
+ \
+ vclass_y = __riscv_vfclass (vy, vlen); \
+ IDENTIFY (vclass_y, stencil, special_y, vlen); \
+ vclass_x = __riscv_vfclass (vx, vlen); \
+ IDENTIFY (vclass_x, stencil, special_x, vlen); \
+ special_args = __riscv_vmor (special_y, special_x, vlen); \
+ nb_special_args = __riscv_vcpop (special_args, vlen); \
+ \
+ if (nb_special_args > 0) \
+ { \
+ /* y or x is one of {NaN, +-Inf, +-0} */ \
+ class_of_interest = class_NaN; \
+ IDENTIFY (vclass_y, class_of_interest, special_y, vlen); \
+ IDENTIFY (vclass_x, class_of_interest, special_x, vlen); \
+ VBOOL y_notNaN = __riscv_vmnot (special_y, vlen); \
+ id_mask = __riscv_vmor (special_y, special_x, vlen); \
+ result_tmp = __riscv_vfadd (id_mask, vy, vx, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ VBOOL x_0, x_neg0, x_pos0; \
+ class_of_interest = class_posZero; \
+ IDENTIFY (vclass_x, class_of_interest, x_pos0, vlen); \
+ class_of_interest = class_negZero; \
+ IDENTIFY (vclass_x, class_of_interest, x_neg0, vlen); \
+ x_0 = __riscv_vmor (x_pos0, x_neg0, vlen); \
+ \
+ VBOOL y_0, y_not0; \
+ class_of_interest = class_Zero; \
+ IDENTIFY (vclass_y, class_of_interest, y_0, vlen); \
+ y_not0 = __riscv_vmnot (y_0, vlen); \
+ y_not0 = __riscv_vmand (y_not0, y_notNaN, vlen); \
+ id_mask = __riscv_vmand (x_0, y_not0, vlen); \
+ result_tmp = VFMV_VF (PIBY2_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (y_0, x_pos0, vlen); \
+ vw_special = __riscv_vmerge (vw_special, vy, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (y_0, x_neg0, vlen); \
+ result_tmp = VFMV_VF (PI_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ VBOOL x_posInf, x_negInf, y_Inf, y_finite; \
+ class_of_interest = class_Inf; \
+ IDENTIFY (vclass_y, class_of_interest, y_Inf, vlen); \
+ y_finite = __riscv_vmandn (y_notNaN, y_Inf, vlen); \
+ x_posInf = __riscv_vmfeq (vx, fp_posInf, vlen); \
+ x_negInf = __riscv_vmfeq (vx, fp_negInf, vlen); \
+ \
+ id_mask = __riscv_vmand (x_posInf, y_Inf, vlen); \
+ result_tmp = VFMV_VF (PIBY4_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (x_posInf, y_finite, vlen); \
+ result_tmp = VFMV_VF (fp_posZero, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (x_negInf, y_Inf, vlen); \
+ result_tmp = VFMV_VF (PI_075_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (x_negInf, y_finite, vlen); \
+ result_tmp = VFMV_VF (PI_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ class_of_interest = class_finite_pos; \
+ VBOOL x_finite_pos; \
+ IDENTIFY (vclass_x, class_of_interest, x_finite_pos, vlen); \
+ id_mask = __riscv_vmand (x_finite_pos, y_0, vlen); \
+ vw_special = __riscv_vmerge (vw_special, vy, id_mask, vlen); \
+ \
+ class_of_interest = class_finite_neg; \
+ VBOOL x_finite_neg; \
+ IDENTIFY (vclass_x, class_of_interest, x_finite_neg, vlen); \
+ id_mask = __riscv_vmand (x_finite_neg, y_0, vlen); \
+ result_tmp = VFMV_VF (PI_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmor (x_finite_pos, x_finite_neg, vlen); \
+ id_mask = __riscv_vmand (id_mask, y_Inf, vlen); \
+ result_tmp = VFMV_VF (PIBY2_HI, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ vy = __riscv_vfmerge (vy, 0x1.0p-1, special_args, vlen); \
+ vx = __riscv_vfmerge (vx, 0x1.0p0, special_args, vlen); \
+ } \
+ \
+ /* Other than the obvious exceptional cases that have been handled, \
+ // we filter out large differences in the exponents of x and y \
+ // to avoid spurious underflow being raised */ \
+ VUINT expo_y = __riscv_vand (__riscv_vsrl (F_AS_U (vy), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ VUINT expo_x = __riscv_vand (__riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ VINT exp_diff = __riscv_vsub (U_AS_I (expo_y), U_AS_I (expo_x), vlen); \
+ VBOOL exp_diff_large = __riscv_vmsge (exp_diff, 60, vlen); \
+ exp_diff_large = __riscv_vmor ( \
+ exp_diff_large, __riscv_vmsle (exp_diff, -60, vlen), vlen); \
+ \
+ nb_special_args = __riscv_vcpop (exp_diff_large, vlen); \
+ special_args = __riscv_vmor (special_args, exp_diff_large, vlen); \
+ \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL swap_yx = __riscv_vmsgtu (expo_y, expo_x, vlen); \
+ VBOOL x_neg = __riscv_vmslt (F_AS_I (vx), 0, vlen); \
+ \
+ VBOOL no_divide = __riscv_vmor (swap_yx, x_neg, vlen); \
+ no_divide = __riscv_vmand (no_divide, exp_diff_large, vlen); \
+ \
+ VBOOL divide = __riscv_vmnot (swap_yx, vlen); \
+ divide = __riscv_vmandn (divide, x_neg, vlen); \
+ divide = __riscv_vmand (divide, exp_diff_large, vlen); \
+ \
+ VFLOAT abs_y = __riscv_vfsgnj (vy, fp_posOne, vlen); \
+ VFLOAT tmp1 = __riscv_vfdiv (divide, abs_y, vx, vlen); \
+ tmp1 = __riscv_vfmerge (tmp1, 0x1.0p-60, no_divide, vlen); \
+ tmp1 = __riscv_vfsgnj (tmp1, vx, vlen); \
+ \
+ VFLOAT tmp2 = __riscv_vfsgnj (divide, tmp1, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, tmp2, divide, vlen); \
+ \
+ VBOOL use_piby2 = __riscv_vmand (swap_yx, exp_diff_large, vlen); \
+ tmp2 = __riscv_vfrsub (use_piby2, tmp1, PIBY2_HI, vlen); \
+ tmp2 = __riscv_vfsgnj (use_piby2, tmp2, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, tmp2, use_piby2, vlen); \
+ \
+ VBOOL use_pi = __riscv_vmandn (x_neg, swap_yx, vlen); \
+ use_pi = __riscv_vmand (use_pi, exp_diff_large, vlen); \
+ tmp2 = __riscv_vfadd (use_pi, tmp1, PI_HI, vlen); \
+ tmp2 = __riscv_vfsgnj (use_pi, tmp2, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, tmp2, use_pi, vlen); \
+ vy = __riscv_vfmerge (vy, fp_posZero, special_args, vlen); \
+ vx = __riscv_vfmerge (vx, 0x1.0p0, special_args, vlen); \
+ } \
+ \
+ /* atan2(y, x) = sgn(y) * atan2(|y|, x) \
+ // Let z = min(|y|, |x|) / max(|y|, |x|) \
+ // If |y| >= |x|, then atan2(|y|, x) = pi/2 - sgn(x)*atan(z) \
+ // If |y| < |x|, then atan2(|y|, x) = sgn(x)*atan(z) if x > 0; \
+ // otherwise it is pi + sgn(x)*atan(z). \
+ // And atan2pi(y, x) = atan2(y, x) / pi */ \
+ \
+ VFLOAT abs_y = __riscv_vfsgnj (vy, fp_posOne, vlen); \
+ VFLOAT abs_x = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ VBOOL swap_yx = __riscv_vmfge (abs_y, abs_x, vlen); \
+ VFLOAT numer = abs_y; \
+ VFLOAT denom = abs_x; \
+ numer = __riscv_vmerge (numer, abs_x, swap_yx, vlen); \
+ denom = __riscv_vmerge (denom, abs_y, swap_yx, vlen); \
+ numer = __riscv_vfsgnj (numer, vx, vlen); \
+ \
+ /* Here |numer| <= denom and the exponent difference is within 60 \
+ // We normalize them so that 1/denom will not overflow */ \
+ VUINT exp_normalize = __riscv_vsrl (F_AS_U (denom), 52, vlen); \
+ exp_normalize = __riscv_vmaxu (exp_normalize, 10, vlen); \
+ exp_normalize = __riscv_vminu (exp_normalize, 2036, vlen); \
+ exp_normalize = __riscv_vrsub (exp_normalize, 2046, vlen); \
+ VFLOAT scale_normalize = U_AS_F (__riscv_vsll (exp_normalize, 52, vlen)); \
+ numer = __riscv_vfmul (numer, scale_normalize, vlen); \
+ denom = __riscv_vfmul (denom, scale_normalize, vlen); \
+ \
+ VFLOAT z = __riscv_vfdiv (numer, denom, vlen); \
+ VFLOAT delta = numer; \
+ delta = __riscv_vfnmsac (delta, z, denom, vlen); \
+ delta = __riscv_vfmul (delta, __riscv_vfrec7 (numer, vlen), vlen); \
+ delta = __riscv_vfmul (delta, z, vlen); \
+ /* z + delta is extra precise z. */ \
+ \
+ /* Now convert z to fixed point. \
+ // We scale z by 61+m where 2^(-m) <= a < 2^(-m+1) \
+ // noting that m >= 0 */ \
+ VUINT expo_61pm = __riscv_vsrl (F_AS_U (z), MAN_LEN, vlen); \
+ expo_61pm = __riscv_vand (expo_61pm, 0x7FF, vlen); \
+ expo_61pm = __riscv_vmaxu (expo_61pm, EXP_BIAS - 60, vlen); \
+ expo_61pm \
+ = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS + 61, vlen); /* BIAS+61+m */ \
+ \
+ VFLOAT scale_61pm = U_AS_F (__riscv_vsll (expo_61pm, MAN_LEN, vlen)); \
+ VINT Z = __riscv_vfcvt_x (__riscv_vfmul (z, scale_61pm, vlen), vlen); \
+ VINT Delta \
+ = __riscv_vfcvt_x (__riscv_vfmul (delta, scale_61pm, vlen), vlen); \
+ Z = __riscv_vsadd (Z, Delta, vlen); \
+ \
+ VINT V = __riscv_vsmul (Z, __riscv_vsll (Z, 1, vlen), 1, vlen); \
+ /* V is z*z with scale 60 + 2m */ \
+ VINT VV = __riscv_vrsub (V, 0, vlen); \
+ \
+ VUINT m = __riscv_vsub (expo_61pm, EXP_BIAS + 61, vlen); \
+ VUINT two_m = __riscv_vsll (m, 1, vlen); \
+ VBOOL left_shift = __riscv_vmsltu (two_m, 3, vlen); \
+ VBOOL right_shift = __riscv_vmnot (left_shift, vlen); \
+ \
+ VINT I_tmp \
+ = __riscv_vsll (left_shift, V, __riscv_vrsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, left_shift, vlen); \
+ I_tmp = __riscv_vsll (left_shift, VV, __riscv_vrsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, left_shift, vlen); \
+ \
+ I_tmp \
+ = __riscv_vsra (right_shift, V, __riscv_vsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, right_shift, vlen); \
+ I_tmp = __riscv_vsra (right_shift, VV, __riscv_vsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, right_shift, vlen); \
+ \
+ /* V is z*z in scale 62, VV is -z*z in scale 63 */ \
+ VINT WW = __riscv_vsll (__riscv_vsmul (V, VV, 1, vlen), 1, vlen); \
+ /* WW is -z^4 in scale 63. */ \
+ \
+ VINT P_even = PSTEPN_I ( \
+ -0x56629d839b68685, WW, \
+ PSTEPN_I (-0x3d2984d0a6f836a, WW, \
+ PSTEPN_I (-0x1c5e8b5228f9fe4, WW, \
+ PSTEPN_I (-0x05deca0ae3a1a5d, -0x004efe42fda24d7, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_even = PSTEPN_I ( \
+ -0x2aaaaaaaaaaa49d3, WW, \
+ PSTEPN_I (-0x12492492378aaf69, WW, \
+ PSTEPN_I (-0xba2e88c805cbaf8, WW, \
+ PSTEPN_I (-0x888722719d1260a, WW, \
+ PSTEPN_I (-0x6b96ef57ce79cc3, WW, \
+ P_even, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_odd = PSTEPN_I ( \
+ 0x04afe3b1345b489b, WW, \
+ PSTEPN_I (0x02cec355111c7439, WW, \
+ PSTEPN_I (0x00eaa9acebf3963e, WW, \
+ PSTEPN_I (0x001b053368ecfa14, 0x00006da7bb4399dd, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = PSTEPN_I ( \
+ 0x1999999999540349, WW, \
+ PSTEPN_I (0x0e38e38bf1671f42, WW, \
+ PSTEPN_I (0x09d89b293ef5f4d9, WW, \
+ PSTEPN_I (0x0786ec3df324db61, WW, \
+ PSTEPN_I (0x060b457b3c56e750, WW, \
+ P_odd, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = __riscv_vsmul (VV, P_odd, 1, vlen); \
+ VINT P = __riscv_vsub (P_even, P_odd, vlen); \
+ \
+ P = __riscv_vsmul (VV, P, 1, vlen); /* Q_63 */ \
+ P = __riscv_vsmul (Z, P, 1, vlen); /* Q_61pm */ \
+ P = __riscv_vsub (Z, P, vlen); /* Q_61pm */ \
+ \
+ VBOOL xneg = __riscv_vmslt (F_AS_I (vx), 0, vlen); \
+ VBOOL xneg_or_swap = __riscv_vmor (xneg, swap_yx, vlen); \
+ VBOOL xneg_and_noswap = __riscv_vmandn (xneg, swap_yx, vlen); \
+ \
+ VINT P_tmp = __riscv_vsra (xneg_or_swap, P, m, vlen); \
+ P = __riscv_vmerge (P, P_tmp, xneg_or_swap, vlen); \
+ \
+ P_tmp = __riscv_vrsub (swap_yx, P, PIBY2_Q61, vlen); \
+ P = __riscv_vmerge (P, P_tmp, swap_yx, vlen); \
+ \
+ P_tmp = __riscv_vadd (xneg_and_noswap, P, PI_Q61, vlen); \
+ P = __riscv_vmerge (P, P_tmp, xneg_and_noswap, vlen); \
+ \
+ /* we need to scale P by 2^(-(61+m)) or 2^(-61) */ \
+ VUINT expo_scale = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS, \
+ vlen); /* EXP_BIAS - (61+m) */ \
+ expo_scale \
+ = __riscv_vmerge (expo_scale, EXP_BIAS - 61, xneg_or_swap, vlen); \
+ VFLOAT scale_result = U_AS_F (__riscv_vsll (expo_scale, MAN_LEN, vlen)); \
+ vw = __riscv_vfcvt_f (P, vlen); \
+ vw = __riscv_vfmul (vw, scale_result, vlen); \
+ vw = __riscv_vfsgnj (vw, vy, vlen); \
+ vw = __riscv_vmerge (vw, vw_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vw; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_atan2pi.c b/sysdeps/riscv/rvd/v_d_atan2pi.c
new file mode 100644
index 0000000000..825affd7b9
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_atan2pi.c
@@ -0,0 +1,396 @@
+/* Double-precision vector atan2pi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <riscv_vector.h>
+
+#include "rvvlm.h"
+#include "v_math.h"
+
+#define API_SIGNATURE API_SIGNATURE_21
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ATAN2PID_VSET_CONFIG
+
+#define COMPILE_FOR_ATAN2PI
+
+#define PI_075_HI 0x1.2d97c7f3321d2p+1
+#define PI_075_LO 0x1.a79394c9e8a0ap-54
+#define THREE_OV_4 0x1.8p-1
+
+#define PIBY4_HI 0x1.921fb54442d18p-1
+#define PIBY4_LO 0x1.1a62633145c07p-55
+#define ONE_OV_4 0x1.0p-2
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+#define HALF 0x1.0p-1
+
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_LO 0x1.1a62633145c07p-53
+#define ONE 0x1.0p0
+
+#define PIBY2_Q61 0x3243f6a8885a308d
+#define PI_Q61 0x6487ed5110b4611a
+#define HALF_Q61 0x1000000000000000
+#define ONE_Q61 0x2000000000000000
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define PIBY2_Q61 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D2 (lmul, simdlen, atan2pi) (VFLOAT x, VFLOAT y) \
+ { \
+ size_t vlen; \
+ VFLOAT vy, vx, vw, vw_special; \
+ VUINT vclass_y, vclass_x; \
+ UINT stencil, class_of_interest; \
+ VBOOL special_y, special_x, special_args, id_mask; \
+ UINT nb_special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ stencil = class_NaN | class_Inf | class_Zero; \
+ vlen = VSET (simdlen); \
+ vy = y; \
+ vx = x; \
+ \
+ VFLOAT result_tmp; \
+ \
+ vclass_y = __riscv_vfclass (vy, vlen); \
+ IDENTIFY (vclass_y, stencil, special_y, vlen); \
+ vclass_x = __riscv_vfclass (vx, vlen); \
+ IDENTIFY (vclass_x, stencil, special_x, vlen); \
+ special_args = __riscv_vmor (special_y, special_x, vlen); \
+ nb_special_args = __riscv_vcpop (special_args, vlen); \
+ \
+ if (nb_special_args > 0) \
+ { \
+ class_of_interest = class_NaN; \
+ IDENTIFY (vclass_y, class_of_interest, special_y, vlen); \
+ IDENTIFY (vclass_x, class_of_interest, special_x, vlen); \
+ VBOOL y_notNaN = __riscv_vmnot (special_y, vlen); \
+ id_mask = __riscv_vmor (special_y, special_x, vlen); \
+ result_tmp = __riscv_vfadd (id_mask, vy, vx, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ VBOOL x_0, x_neg0, x_pos0; \
+ class_of_interest = class_posZero; \
+ IDENTIFY (vclass_x, class_of_interest, x_pos0, vlen); \
+ class_of_interest = class_negZero; \
+ IDENTIFY (vclass_x, class_of_interest, x_neg0, vlen); \
+ x_0 = __riscv_vmor (x_pos0, x_neg0, vlen); \
+ \
+ VBOOL y_0, y_not0; \
+ class_of_interest = class_Zero; \
+ IDENTIFY (vclass_y, class_of_interest, y_0, vlen); \
+ y_not0 = __riscv_vmnot (y_0, vlen); \
+ y_not0 = __riscv_vmand (y_not0, y_notNaN, vlen); \
+ id_mask = __riscv_vmand (x_0, y_not0, vlen); \
+ result_tmp = VFMV_VF (HALF, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (y_0, x_pos0, vlen); \
+ vw_special = __riscv_vmerge (vw_special, vy, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (y_0, x_neg0, vlen); \
+ result_tmp = VFMV_VF (ONE, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ VBOOL x_posInf, x_negInf, y_Inf, y_finite; \
+ class_of_interest = class_Inf; \
+ IDENTIFY (vclass_y, class_of_interest, y_Inf, vlen); \
+ y_finite = __riscv_vmandn (y_notNaN, y_Inf, vlen); \
+ x_posInf = __riscv_vmfeq (vx, fp_posInf, vlen); \
+ x_negInf = __riscv_vmfeq (vx, fp_negInf, vlen); \
+ \
+ id_mask = __riscv_vmand (x_posInf, y_Inf, vlen); \
+ result_tmp = VFMV_VF (ONE_OV_4, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (x_posInf, y_finite, vlen); \
+ result_tmp = VFMV_VF (fp_posZero, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (x_negInf, y_Inf, vlen); \
+ result_tmp = VFMV_VF (THREE_OV_4, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmand (x_negInf, y_finite, vlen); \
+ result_tmp = VFMV_VF (ONE, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ class_of_interest = class_finite_pos; \
+ VBOOL x_finite_pos; \
+ IDENTIFY (vclass_x, class_of_interest, x_finite_pos, vlen); \
+ id_mask = __riscv_vmand (x_finite_pos, y_0, vlen); \
+ vw_special = __riscv_vmerge (vw_special, vy, id_mask, vlen); \
+ \
+ class_of_interest = class_finite_neg; \
+ VBOOL x_finite_neg; \
+ IDENTIFY (vclass_x, class_of_interest, x_finite_neg, vlen); \
+ id_mask = __riscv_vmand (x_finite_neg, y_0, vlen); \
+ result_tmp = VFMV_VF (ONE, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ id_mask = __riscv_vmor (x_finite_pos, x_finite_neg, vlen); \
+ id_mask = __riscv_vmand (id_mask, y_Inf, vlen); \
+ result_tmp = VFMV_VF (HALF, vlen); \
+ result_tmp = __riscv_vfsgnj (result_tmp, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, result_tmp, id_mask, vlen); \
+ \
+ vy = __riscv_vfmerge (vy, 0x1.0p-1, special_args, vlen); \
+ vx = __riscv_vfmerge (vx, 0x1.0p0, special_args, vlen); \
+ } \
+ \
+ VUINT expo_y = __riscv_vand (__riscv_vsrl (F_AS_U (vy), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ VUINT expo_x = __riscv_vand (__riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ VINT exp_diff = __riscv_vsub (U_AS_I (expo_y), U_AS_I (expo_x), vlen); \
+ VBOOL exp_diff_large = __riscv_vmsge (exp_diff, 60, vlen); \
+ exp_diff_large = __riscv_vmor ( \
+ exp_diff_large, __riscv_vmsle (exp_diff, -60, vlen), vlen); \
+ \
+ nb_special_args = __riscv_vcpop (exp_diff_large, vlen); \
+ special_args = __riscv_vmor (special_args, exp_diff_large, vlen); \
+ \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL swap_yx = __riscv_vmsgtu (expo_y, expo_x, vlen); \
+ VBOOL x_neg = __riscv_vmslt (F_AS_I (vx), 0, vlen); \
+ \
+ VBOOL no_divide = __riscv_vmor (swap_yx, x_neg, vlen); \
+ no_divide = __riscv_vmand (no_divide, exp_diff_large, vlen); \
+ \
+ VBOOL divide = __riscv_vmnot (swap_yx, vlen); \
+ divide = __riscv_vmandn (divide, x_neg, vlen); \
+ divide = __riscv_vmand (divide, exp_diff_large, vlen); \
+ \
+ VFLOAT tmp1 = __riscv_vfmul (divide, vx, 0x1.0p-55, vlen); \
+ VFLOAT tmp2 = __riscv_vfmul (divide, tmp1, PI_HI, vlen); \
+ VFLOAT tmp3 = __riscv_vfmsac (divide, tmp2, PI_HI, tmp1, vlen); \
+ tmp3 = __riscv_vfmacc (divide, tmp3, PI_LO, tmp1, vlen); \
+ VFLOAT R = __riscv_vfrdiv (divide, tmp2, fp_posOne, vlen); \
+ VFLOAT r = VFMV_VF (fp_posOne, vlen); \
+ r = __riscv_vfnmsac (divide, r, R, tmp2, vlen); \
+ r = __riscv_vfnmsac (divide, r, R, tmp3, vlen); \
+ r = __riscv_vfmul (divide, r, R, vlen); \
+ tmp1 = __riscv_vfmul (divide, vy, 0x1.0p55, vlen); \
+ tmp2 = __riscv_vfmul (divide, tmp1, r, vlen); \
+ tmp1 = __riscv_vfmadd (divide, tmp1, R, tmp2, vlen); \
+ tmp1 = __riscv_vfmul (divide, tmp1, 0x1.0p-110, vlen); \
+ tmp1 = __riscv_vfmerge (tmp1, 0x1.0p-60, no_divide, vlen); \
+ tmp1 = __riscv_vfsgnj (tmp1, vx, vlen); \
+ \
+ tmp2 = __riscv_vfsgnj (divide, tmp1, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, tmp2, divide, vlen); \
+ \
+ VBOOL use_half = __riscv_vmand (swap_yx, exp_diff_large, vlen); \
+ tmp2 = __riscv_vfrsub (use_half, tmp1, fp_posHalf, vlen); \
+ tmp2 = __riscv_vfsgnj (use_half, tmp2, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, tmp2, use_half, vlen); \
+ \
+ VBOOL use_one = __riscv_vmandn (x_neg, swap_yx, vlen); \
+ use_one = __riscv_vmand (use_one, exp_diff_large, vlen); \
+ tmp2 = __riscv_vfadd (use_one, tmp1, fp_posOne, vlen); \
+ tmp2 = __riscv_vfsgnj (use_one, tmp2, vy, vlen); \
+ vw_special = __riscv_vmerge (vw_special, tmp2, use_one, vlen); \
+ vy = __riscv_vfmerge (vy, fp_posZero, special_args, vlen); \
+ vx = __riscv_vfmerge (vx, 0x1.0p0, special_args, vlen); \
+ } \
+ \
+ VFLOAT abs_y = __riscv_vfsgnj (vy, fp_posOne, vlen); \
+ VFLOAT abs_x = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ VBOOL swap_yx = __riscv_vmfge (abs_y, abs_x, vlen); \
+ VFLOAT numer = abs_y; \
+ VFLOAT denom = abs_x; \
+ numer = __riscv_vmerge (numer, abs_x, swap_yx, vlen); \
+ denom = __riscv_vmerge (denom, abs_y, swap_yx, vlen); \
+ numer = __riscv_vfsgnj (numer, vx, vlen); \
+ \
+ VUINT exp_normalize = __riscv_vsrl (F_AS_U (denom), 52, vlen); \
+ exp_normalize = __riscv_vmaxu (exp_normalize, 10, vlen); \
+ exp_normalize = __riscv_vminu (exp_normalize, 2036, vlen); \
+ exp_normalize = __riscv_vrsub (exp_normalize, 2046, vlen); \
+ VFLOAT scale_normalize = U_AS_F (__riscv_vsll (exp_normalize, 52, vlen)); \
+ numer = __riscv_vfmul (numer, scale_normalize, vlen); \
+ denom = __riscv_vfmul (denom, scale_normalize, vlen); \
+ \
+ VFLOAT z = __riscv_vfdiv (numer, denom, vlen); \
+ VFLOAT delta = numer; \
+ delta = __riscv_vfnmsac (delta, z, denom, vlen); \
+ delta = __riscv_vfmul (delta, __riscv_vfrec7 (numer, vlen), vlen); \
+ delta = __riscv_vfmul (delta, z, vlen); \
+ VUINT expo_61pm = __riscv_vsrl (F_AS_U (z), MAN_LEN, vlen); \
+ expo_61pm = __riscv_vand (expo_61pm, 0x7FF, vlen); \
+ expo_61pm = __riscv_vmaxu (expo_61pm, EXP_BIAS - 60, vlen); \
+ expo_61pm \
+ = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS + 61, vlen); /* BIAS+61+m */ \
+ \
+ VFLOAT scale_61pm = U_AS_F (__riscv_vsll (expo_61pm, MAN_LEN, vlen)); \
+ VINT Z = __riscv_vfcvt_x (__riscv_vfmul (z, scale_61pm, vlen), vlen); \
+ VINT Delta \
+ = __riscv_vfcvt_x (__riscv_vfmul (delta, scale_61pm, vlen), vlen); \
+ Z = __riscv_vsadd (Z, Delta, vlen); \
+ \
+ VINT V = __riscv_vsmul (Z, __riscv_vsll (Z, 1, vlen), 1, vlen); \
+ VINT VV = __riscv_vrsub (V, 0, vlen); \
+ \
+ VUINT m = __riscv_vsub (expo_61pm, EXP_BIAS + 61, vlen); \
+ VUINT two_m = __riscv_vsll (m, 1, vlen); \
+ VBOOL left_shift = __riscv_vmsltu (two_m, 3, vlen); \
+ VBOOL right_shift = __riscv_vmnot (left_shift, vlen); \
+ \
+ VINT I_tmp \
+ = __riscv_vsll (left_shift, V, __riscv_vrsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, left_shift, vlen); \
+ I_tmp = __riscv_vsll (left_shift, VV, __riscv_vrsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, left_shift, vlen); \
+ \
+ I_tmp \
+ = __riscv_vsra (right_shift, V, __riscv_vsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, right_shift, vlen); \
+ I_tmp = __riscv_vsra (right_shift, VV, __riscv_vsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, right_shift, vlen); \
+ \
+ VINT WW = __riscv_vsll (__riscv_vsmul (V, VV, 1, vlen), 1, vlen); \
+ \
+ VINT P_even = PSTEPN_I ( \
+ -0x56629d839b68685, WW, \
+ PSTEPN_I (-0x3d2984d0a6f836a, WW, \
+ PSTEPN_I (-0x1c5e8b5228f9fe4, WW, \
+ PSTEPN_I (-0x05deca0ae3a1a5d, -0x004efe42fda24d7, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_even = PSTEPN_I ( \
+ -0x2aaaaaaaaaaa49d3, WW, \
+ PSTEPN_I (-0x12492492378aaf69, WW, \
+ PSTEPN_I (-0xba2e88c805cbaf8, WW, \
+ PSTEPN_I (-0x888722719d1260a, WW, \
+ PSTEPN_I (-0x6b96ef57ce79cc3, WW, \
+ P_even, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_odd = PSTEPN_I ( \
+ 0x04afe3b1345b489b, WW, \
+ PSTEPN_I (0x02cec355111c7439, WW, \
+ PSTEPN_I (0x00eaa9acebf3963e, WW, \
+ PSTEPN_I (0x001b053368ecfa14, 0x00006da7bb4399dd, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = PSTEPN_I ( \
+ 0x1999999999540349, WW, \
+ PSTEPN_I (0x0e38e38bf1671f42, WW, \
+ PSTEPN_I (0x09d89b293ef5f4d9, WW, \
+ PSTEPN_I (0x0786ec3df324db61, WW, \
+ PSTEPN_I (0x060b457b3c56e750, WW, \
+ P_odd, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = __riscv_vsmul (VV, P_odd, 1, vlen); \
+ VINT P = __riscv_vsub (P_even, P_odd, vlen); \
+ \
+ P = __riscv_vsmul (VV, P, 1, vlen); \
+ P = __riscv_vsmul (Z, P, 1, vlen); \
+ P = __riscv_vsub (Z, P, vlen); \
+ \
+ VBOOL xneg = __riscv_vmslt (F_AS_I (vx), 0, vlen); \
+ VBOOL xneg_or_swap = __riscv_vmor (xneg, swap_yx, vlen); \
+ VBOOL xneg_and_noswap = __riscv_vmandn (xneg, swap_yx, vlen); \
+ \
+ VINT P_tmp = __riscv_vsra (xneg_or_swap, P, m, vlen); \
+ P = __riscv_vmerge (P, P_tmp, xneg_or_swap, vlen); \
+ \
+ P = __riscv_vsmul (P, ONE_OV_PI_Q63, 1, vlen); \
+ \
+ P_tmp = __riscv_vrsub (swap_yx, P, HALF_Q61, vlen); \
+ P = __riscv_vmerge (P, P_tmp, swap_yx, vlen); \
+ \
+ P_tmp = __riscv_vadd (xneg_and_noswap, P, ONE_Q61, vlen); \
+ P = __riscv_vmerge (P, P_tmp, xneg_and_noswap, vlen); \
+ \
+ VUINT expo_scale = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS, \
+ vlen); /* EXP_BIAS - (61+m) */ \
+ expo_scale \
+ = __riscv_vmerge (expo_scale, EXP_BIAS - 61, xneg_or_swap, vlen); \
+ VFLOAT scale_result = U_AS_F (__riscv_vsll (expo_scale, MAN_LEN, vlen)); \
+ vw = __riscv_vfcvt_f (P, vlen); \
+ vw = __riscv_vfmul (vw, scale_result, vlen); \
+ vw = __riscv_vfsgnj (vw, vy, vlen); \
+ vw = __riscv_vmerge (vw, vw_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vw; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_atanh.c b/sysdeps/riscv/rvd/v_d_atanh.c
new file mode 100644
index 0000000000..3014ec2311
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_atanh.c
@@ -0,0 +1,182 @@
+/* Double-precision vector atanh function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ATANHD_VSET_CONFIG
+
+#include "rvvlm_invhyperD.h"
+
+// Atanh(x) is defined only for |x| <= 1. As atanh(-x) = -atanh(x), the
+// main computation works with |x|.
+// For |x| > 1 and x being a sNaN, the invalid signal has to be generated
+// together with a returned value of NaN. For a qNaN input, no signal is
+// generated. And atanh(+/-1) yields +/- Inf, but a div-by-zero signal has
+// to be generated.
+//
+// For 0 < x < 1, we use the formula atanh(x) = (1/2) log( (1+x)/(1-x) ).
+// The usual technique is to find a scale s = 2^(-n) so that
+// r = s * (1+x)/(1-x) falls roughly in the region [1/sqrt(2), sqrt(2)].
+// Thus the desired result is (1/2)(n * log(2) + log(r)).
+// Somewhat ironically, log(r) is usually approximated in terms of atanh,
+// as the Taylor series of atanh around 0 converges much faster than that
+// of log around 1: log(r) = 2 atanh( (r-1)/(r+1) ).
+// Hence, atanh(x) = (n/2) log(2) + atanh([(1+x)-(1-x)/s]/[(1+x)+(1-x)/s]).
+//
+// This implementation obtains s=2^(-n) using the approximate reciprocal
+// instruction rather than computing (1+x)/(1-x) to extra precision.
+// It then combines the two transformations into
+// atanh( [(1+x) - (1-x)/s] / [(1+x) + (1-x)/s] ) requiring only
+// one division, instead of two.
+// We further observe that instead of using multiple extra-precise
+// simulations to obtain both the numerator and denominator accurately,
+// we can use fixed-point computations.
+// As long as the original input |x| >= 0.248, a scale of 60 allows
+// both numerator and denominator to maintain high precision without
+// overflow, eliminating many double-double-like simulations. For
+// |x| < 0.248, the core polynomial evaluated at x yields the result.
+//
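+// A scalar sketch of the scale selection (illustrative only; the vector
+// code path below uses an approximate-reciprocal estimate instead):
+//   ratio = (1 + x) / (1 - x);     // >= 1 for 0 <= x < 1
+//   r = frexp (ratio, &n);         // ratio = r * 2^n, r in [1/2, 1)
+//   atanh (x) = 0.5 * (n * log (2) + log (r));
+//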
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, atanh) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ vx_orig = vx; \
+ \
+ /* Handle Inf, NaN, |input| >= 1, and |input| < 2^(-30) */ \
+ EXCEPTION_HANDLING_ATANH (vx, special_args, vy_special, vlen); \
+ vx = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ \
+ /* At this point vx is a positive number: either 0 or 2^(-30) <= x < 1. */ \
+ \
+ /* Get n so that 2^(-n) * (1+x)/(1-x) falls roughly in the range \
+ [1/rt2, rt2]. */ \
+ VUINT n; \
+ VFLOAT one_plus_x, one_minus_x; \
+ one_plus_x = __riscv_vfadd (vx, fp_posOne, vlen); \
+ one_minus_x = __riscv_vfrsub (vx, fp_posOne, vlen); \
+ /* note: one_minus_x >= 2^(-53), hence never 0 */ \
+ VFLOAT ratio = __riscv_vfmul (one_plus_x, \
+ __riscv_vfrec7 (one_minus_x, vlen), vlen); \
+ n = __riscv_vadd (__riscv_vsrl (F_AS_U (ratio), MAN_LEN - 8, vlen), 0x96, \
+ vlen); \
+ n = __riscv_vsub (__riscv_vsrl (n, 8, vlen), EXP_BIAS, vlen); \
+ \
+ VINT X = __riscv_vfcvt_x (__riscv_vfmul (vx, 0x1.0p60, vlen), vlen); \
+ VINT Numer, Denom; \
+ /* no overflow, so it does not matter if we use the saturating add or not \
+ */ \
+ VINT One_plus_X = __riscv_vadd (X, ONE_Q60, vlen); \
+ VINT One_minus_X = __riscv_vrsub (X, ONE_Q60, vlen); \
+ One_minus_X = __riscv_vsll (One_minus_X, n, vlen); \
+ Numer = __riscv_vsub (One_plus_X, One_minus_X, vlen); \
+ Denom = __riscv_vadd (One_plus_X, One_minus_X, vlen); \
+ VFLOAT numer, delta_numer, denom, delta_denom; \
+ numer = __riscv_vfcvt_f (Numer, vlen); \
+ VINT Tail = __riscv_vsub (Numer, __riscv_vfcvt_x (numer, vlen), vlen); \
+ delta_numer = __riscv_vfcvt_f (Tail, vlen); \
+ denom = __riscv_vfcvt_f (Denom, vlen); \
+ Tail = __riscv_vsub (Denom, __riscv_vfcvt_x (denom, vlen), vlen); \
+ delta_denom = __riscv_vfcvt_f (Tail, vlen); \
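+ /* Numer and Denom are now split exactly into the double-double pairs \
+ numer + delta_numer and denom + delta_denom. */ \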
+ \
+ VFLOAT r_hi, r_lo, r; \
+ DIV2_N2D2 (numer, delta_numer, denom, delta_denom, r_hi, r_lo, vlen); \
+ VBOOL x_in_range = __riscv_vmflt (vx, 0x1.0p-8, vlen); \
+ r_hi = __riscv_vmerge (r_hi, vx, x_in_range, vlen); \
+ r_lo = __riscv_vfmerge (r_lo, fp_posZero, x_in_range, vlen); \
+ n = __riscv_vmerge (n, 0, x_in_range, vlen); \
+ \
+ r = __riscv_vfadd (r_hi, r_lo, vlen); \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ VFLOAT r6 = __riscv_vfmul (rcube, rcube, vlen); \
+ \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.c71c4a9aa397dp-4, rsq, \
+ PSTEP (0x1.7467d1711e0d8p-4, rsq, \
+ PSTEP (0x1.397813e4ac2d0p-4, 0x1.30b2960ceaa62p-4, rsq, vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.55555555555aep-2, rsq, \
+ PSTEP (0x1.999999997646fp-3, 0x1.2492494ac4a16p-3, rsq, vlen), vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, r6, poly_left, vlen); \
+ poly = __riscv_vfmadd (poly, rcube, r_lo, vlen); \
+ /* At this point r_hi + poly approximates atanh(r) */ \
+ \
+ /* Compose the final answer (n/2)*log(2) + atanh(r) */ \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT A = __riscv_vfmul (n_flt, LOG2_BY2_HI, vlen); \
+ VFLOAT S, s; \
+ FAST2SUM (A, r_hi, S, s, vlen); \
+ s = __riscv_vfmacc (s, LOG2_BY2_LO, n_flt, vlen); \
+ s = __riscv_vfadd (s, poly, vlen); \
+ vy = __riscv_vfadd (S, s, vlen); \
+ \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_atanpi.c b/sysdeps/riscv/rvd/v_d_atanpi.c
new file mode 100644
index 0000000000..d0c64ab2af
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_atanpi.c
@@ -0,0 +1,238 @@
+/* Double-precision vector atanpi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include <riscv_vector.h>
+
+#include "v_math.h"
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ATANPID_VSET_CONFIG
+
+#define COMPILE_FOR_ATANPI
+
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_LO 0x1.1a62633145c07p-54
+
+#define ONE_OV_PI_HI 0x1.45f306dc9c883p-2
+#define ONE_OV_PI_LO -0x1.6b01ec5417056p-56
+
+#define PIBY2_Q60 0x1921fb54442d1847
+#define PI_Q60 0x3243f6a8885a308d
+#define PIBY2_Q61 0x3243f6a8885a308d
+#define ONE_OV_PI_Q63 0x28be60db9391054a
+
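+/* For |x| < 2^(-30), atanpi(x) = x/pi to double precision, computed below
+   with an FMA against the two-part 1/pi.  For |x| >= 2^60 (Infs included),
+   atanpi(x) rounds to +-1/2; subtracting the tiny reciprocal estimate
+   still yields +-1/2 under round-to-nearest while raising the inexact
+   flag for finite x.  */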
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), ONE_OV_PI_HI, \
+ __riscv_vfmul ((small_x), (vx), ONE_OV_PI_LO, (vlen)), \
+ (vlen))
+
+#define FUNC_EXPO_LARGE(expo_x_large, vx, vlen) \
+ __riscv_vfsub ((expo_x_large), \
+ __riscv_vfsgnj (VFMV_VF (0x1.0p-1, (vlen)), (vx), (vlen)), \
+ __riscv_vfrec7 ((expo_x_large), (vx), (vlen)), (vlen))
+
+#define EXCEPTION_HANDLING_ATAN(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ /* filter out |x| >= 2^60, Infs and NaNs */ \
+ VBOOL expo_x_large = __riscv_vmsgeu (expo_x, EXP_BIAS + 60, (vlen)); \
+ /* filter out |x| < 2^(-30) */ \
+ VBOOL x_small = __riscv_vmsleu (expo_x, EXP_BIAS - 31, (vlen)); \
+ (special_args) = __riscv_vmor (expo_x_large, x_small, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VFLOAT x_tmp = FUNC_NEAR_ZERO (x_small, (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), x_tmp, x_small, vlen); \
+ x_tmp = FUNC_EXPO_LARGE (expo_x_large, (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), x_tmp, expo_x_large, vlen); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// For atan, atan(x) ~=~ r + r*s*poly(s), with r = x and s = r*r for
+// |x| < 1, and atan(x) = pi/2 - atan(1/x) for |x| >= 1.
+// Thus atan(x) = (pi/2 or 0) +/- (r + r*s*poly(s)), where r is x or 1/x
+// and s is r*r.  This version computes this entire expression in fixed
+// point by converting r and s into fixed point.
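+// atanpi differs from atan mainly in the final steps: the fixed-point
+// result is multiplied by ONE_OV_PI_Q63 (1/pi in Q63) before being
+// converted back to floating point.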
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, atanpi) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ EXCEPTION_HANDLING_ATAN (vx_orig, special_args, vy_special, vlen); \
+ \
+ VFLOAT a = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ VBOOL a_ge_one = __riscv_vmfge (a, fp_posOne, vlen); \
+ VBOOL a_lt_one = __riscv_vmnot (a_ge_one, vlen); \
+ VFLOAT z = __riscv_vfrdiv (a_ge_one, a, fp_posOne, vlen); \
+ z = __riscv_vmerge (z, a, a_lt_one, vlen); \
+ VFLOAT delta = VFMV_VF (fp_posOne, vlen); \
+ delta = __riscv_vfnmsac (a_ge_one, delta, z, a, vlen); \
+ delta = __riscv_vfmul (a_ge_one, delta, z, vlen); \
+ delta = __riscv_vfmerge (delta, fp_posZero, a_lt_one, vlen); \
+ \
+ VUINT expo_61pm = __riscv_vsrl (F_AS_U (z), MAN_LEN, vlen); \
+ expo_61pm = __riscv_vmaxu (expo_61pm, EXP_BIAS - 60, vlen); \
+ expo_61pm \
+ = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS + 61, vlen); /* BIAS+61+m */ \
+ VFLOAT scale_61pm = U_AS_F (__riscv_vsll (expo_61pm, MAN_LEN, vlen)); \
+ VINT Z = __riscv_vfcvt_x (__riscv_vfmul (z, scale_61pm, vlen), vlen); \
+ VINT Delta \
+ = __riscv_vfcvt_x (__riscv_vfmul (delta, scale_61pm, vlen), vlen); \
+ Delta = __riscv_vsadd (a_ge_one, Delta, Z, vlen); \
+ Z = __riscv_vmerge (Z, Delta, a_ge_one, vlen); \
+ \
+ VINT V = __riscv_vsmul (Z, __riscv_vsll (Z, 1, vlen), 1, vlen); \
+ VINT VV = __riscv_vrsub (V, 0, vlen); \
+ \
+ VUINT m = __riscv_vsub (expo_61pm, EXP_BIAS + 61, vlen); \
+ VUINT two_m = __riscv_vsll (m, 1, vlen); \
+ VBOOL left_shift = __riscv_vmsltu (two_m, 3, vlen); \
+ VBOOL right_shift = __riscv_vmnot (left_shift, vlen); \
+ \
+ VINT I_tmp \
+ = __riscv_vsll (left_shift, V, __riscv_vrsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, left_shift, vlen); \
+ I_tmp = __riscv_vsll (left_shift, VV, __riscv_vrsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, left_shift, vlen); \
+ \
+ I_tmp \
+ = __riscv_vsra (right_shift, V, __riscv_vsub (two_m, 2, vlen), vlen); \
+ V = __riscv_vmerge (V, I_tmp, right_shift, vlen); \
+ I_tmp = __riscv_vsra (right_shift, VV, __riscv_vsub (two_m, 3, vlen), \
+ vlen); \
+ VV = __riscv_vmerge (VV, I_tmp, right_shift, vlen); \
+ \
+ VINT WW = __riscv_vsll (__riscv_vsmul (V, VV, 1, vlen), 1, vlen); \
+ \
+ VINT P_even = PSTEPN_I ( \
+ -0x56629d839b68685, WW, \
+ PSTEPN_I (-0x3d2984d0a6f836a, WW, \
+ PSTEPN_I (-0x1c5e8b5228f9fe4, WW, \
+ PSTEPN_I (-0x05deca0ae3a1a5d, -0x004efe42fda24d7, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_even = PSTEPN_I ( \
+ -0x2aaaaaaaaaaa49d3, WW, \
+ PSTEPN_I (-0x12492492378aaf69, WW, \
+ PSTEPN_I (-0xba2e88c805cbaf8, WW, \
+ PSTEPN_I (-0x888722719d1260a, WW, \
+ PSTEPN_I (-0x6b96ef57ce79cc3, WW, \
+ P_even, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P_odd = PSTEPN_I ( \
+ 0x04afe3b1345b489b, WW, \
+ PSTEPN_I (0x02cec355111c7439, WW, \
+ PSTEPN_I (0x00eaa9acebf3963e, WW, \
+ PSTEPN_I (0x001b053368ecfa14, 0x00006da7bb4399dd, \
+ WW, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = PSTEPN_I ( \
+ 0x1999999999540349, WW, \
+ PSTEPN_I (0x0e38e38bf1671f42, WW, \
+ PSTEPN_I (0x09d89b293ef5f4d9, WW, \
+ PSTEPN_I (0x0786ec3df324db61, WW, \
+ PSTEPN_I (0x060b457b3c56e750, WW, \
+ P_odd, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_odd = __riscv_vsmul (VV, P_odd, 1, vlen); \
+ VINT P = __riscv_vsub (P_even, P_odd, vlen); \
+ \
+ P = __riscv_vsmul (VV, P, 1, vlen); /* Q_63 */ \
+ P = __riscv_vsmul (Z, P, 1, vlen); /* Q_61pm */ \
+ P = __riscv_vsub (Z, P, vlen); /* Q_61pm */ \
+ \
+ VINT P_a_ge_one = __riscv_vsra (a_ge_one, P, m, vlen); \
+ P_a_ge_one = __riscv_vrsub (P_a_ge_one, PIBY2_Q61, vlen); \
+ P = __riscv_vmerge (P, P_a_ge_one, a_ge_one, vlen); \
+ \
+ P = __riscv_vsmul (P, ONE_OV_PI_Q63, 1, vlen); \
+ \
+ VUINT expo_scale = __riscv_vrsub (expo_61pm, 2 * EXP_BIAS, \
+ vlen); /* EXP_BIAS - (61+m) */ \
+ expo_scale = __riscv_vmerge (expo_scale, EXP_BIAS - 61, a_ge_one, vlen); \
+ VFLOAT scale_result = U_AS_F (__riscv_vsll (expo_scale, MAN_LEN, vlen)); \
+ vy = __riscv_vfcvt_f (P, vlen); \
+ vy = __riscv_vfmul (vy, scale_result, vlen); \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
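
Note for reviewers: a scalar sketch, outside the patch, of the reduction
described in the comment at the top of this kernel. It leans on libm's
atan in place of the fixed-point polynomial, and the helper name is
hypothetical, so it is illustrative only:

    #include <math.h>

    /* Scalar model of the atanpi reduction: for |x| >= 1 use
       atan(|x|) = pi/2 - atan(1/|x|), then scale by 1/pi.  The vector
       kernel does the same computation in Q61/Q63 fixed point.  */
    static double
    atanpi_model (double x)
    {
      double a = fabs (x);
      int invert = a >= 1.0;
      double z = invert ? 1.0 / a : a;      /* reduced argument in [0, 1] */
      double p = atan (z);                  /* stands in for r + r*s*poly(s) */
      if (invert)
        p = 0x1.921fb54442d18p+0 - p;       /* pi/2 - atan(1/|x|) */
      return copysign (p / 0x1.921fb54442d18p+1, x);   /* scale by 1/pi */
    }
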
diff --git a/sysdeps/riscv/rvd/v_d_cbrt.c b/sysdeps/riscv/rvd/v_d_cbrt.c
new file mode 100644
index 0000000000..97bfc90ae5
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_cbrt.c
@@ -0,0 +1,191 @@
+/* Double-precision vector cbrt function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_CBRTD_VSET_CONFIG
+
+#define EXCEPTION_HANDLING_CBRT(vx, special_args, vy_special, n_adjust, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ /* special handling NaN, +-Inf, +-0 */ \
+ IDENTIFY (vclass, 0x399, (special_args), (vlen)); \
+ VBOOL denorm; \
+ IDENTIFY (vclass, 0x24, denorm, (vlen)); \
+ VBOOL special_n_denorm = __riscv_vmor ((special_args), denorm, (vlen)); \
+ (n_adjust) = __riscv_vxor ((n_adjust), (n_adjust), (vlen)); \
+ if (__riscv_vcpop (special_n_denorm, (vlen)) > 0) \
+ { \
+ /* normalize denormal numbers */ \
+ VFLOAT vx_normal = __riscv_vfmul (denorm, vx, 0x1.0p60, (vlen)); \
+ (vx) = __riscv_vmerge ((vx), vx_normal, denorm, (vlen)); \
+ (n_adjust) = __riscv_vmerge ((n_adjust), -20, denorm, (vlen)); \
+ (vy_special) = __riscv_vfadd ((special_args), (vx), (vx), (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define ONE_OV_3 0x1.5555555555555p-2
+#define THIRD_Q62 0x1555555555555555
+#define ONE_Q62 0x4000000000000000
+#define CBRT_2_Q62 0x50a28be635ca2b89
+#define CBRT_4_Q62 0x6597fa94f5b8f20b
+
+// This version uses a short polynomial to approximate x^(-1/3) to 14+ bits.
+// It then iterates to improve the accuracy. Finally, x * (x^(-1/3))^2 gives
+// x^(1/3). (A scalar sketch follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, cbrt) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VINT n_adjust; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ \
+ /* Set results for input of NaN, +-Inf, +-0, and normalize denormals */ \
+ EXCEPTION_HANDLING_CBRT (vx_orig, special_args, vy_special, n_adjust, \
+ vlen); \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ VINT N = __riscv_vsra (F_AS_I (vx), MAN_LEN, vlen); \
+ N = __riscv_vsub (N, EXP_BIAS, vlen); \
+ vx = I_AS_F ( \
+ __riscv_vsub (F_AS_I (vx), __riscv_vsll (N, MAN_LEN, vlen), vlen)); \
+ /* vx is now in [1, 2); the original argument is 2^N * vx \
+ // cube root is 2^M * 2^(J/3) * vx^(1/3) where N = 3 * M + J, 0 <= J <= 2 \
+ */ \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.c7feaf5d6cc3bp+0, vx, \
+ PSTEP (-0x1.910e22c54a1eap+0, 0x1.3e9d3512b6a5ap+0, vx, vlen), vlen); \
+ \
+ VFLOAT xcube = __riscv_vfmul (vx, vx, vlen); \
+ xcube = __riscv_vfmul (xcube, vx, vlen); \
+ \
+ VFLOAT poly_right = PSTEP ( \
+ -0x1.3261c716ecf2dp-1, vx, \
+ PSTEP (0x1.3ffc61ff0985dp-3, -0x1.173278cb4b00fp-6, vx, vlen), vlen); \
+ \
+ VFLOAT z = __riscv_vfmadd (poly_right, xcube, poly_left, vlen); \
+ /* z ~=~ x^(-1/3) to relative error 2^(-17.3) \
+ // iteration is z <-- z + delta * z where delta = 1/3 - 1/3 * x * z^3 */ \
+ \
+ /* work on decomposing N = 3 * M + J \
+ // M = N // 3; it is a well-known trick that one can get the \
+ // integer quotient by multiplication with an "inverse", \
+ // but this works only for non-negative (or non-positive) dividends, \
+ // so we add 1023 to N to make it non-negative */ \
+ VINT L = __riscv_vadd (N, 1023, vlen); \
+ VINT M = __riscv_vsra (__riscv_vmul (L, 1366, vlen), 12, vlen); \
+ /* 0 <= L <= 2046; 1366 is ceil(2^12/3); M = L // 3 */ \
+ VINT J = __riscv_vadd (__riscv_vsll (M, 1, vlen), M, vlen); \
+ J = __riscv_vsub (L, J, vlen); /* J is N mod 3 */ \
+ M = __riscv_vsub (M, 341, vlen); /* 341 is 1023/3 */ \
+ /* At this point, N = 3 * M + J */ \
+ \
+ VINT R = VMVI_VX (ONE_Q62, vlen); \
+ VBOOL J_is_1 = __riscv_vmseq (J, 1, vlen); \
+ VBOOL J_is_2 = __riscv_vmseq (J, 2, vlen); \
+ R = __riscv_vmerge (R, CBRT_2_Q62, J_is_1, vlen); \
+ R = __riscv_vmerge (R, CBRT_4_Q62, J_is_2, vlen); \
+ \
+ /* two iterations of z <-- z + delta * z */ \
+ /* rounding error in the first iteration is immaterial */ \
+ VFLOAT a = __riscv_vfmul (z, z, vlen); \
+ VFLOAT b = __riscv_vfmul (vx, z, vlen); \
+ b = __riscv_vfmul (b, a, vlen); \
+ VFLOAT c = VFMV_VF (ONE_OV_3, vlen); \
+ VFLOAT delta = __riscv_vfnmsub (b, ONE_OV_3, c, vlen); \
+ z = __riscv_vfmacc (z, delta, z, vlen); \
+ \
+ /* the second iteration we perform in fixed point \
+ // as the rounding errors need to be controlled */ \
+ double two_to_62 = 0x1.0p62; \
+ VINT Z_Q62 = __riscv_vfcvt_x (__riscv_vfmul (z, two_to_62, vlen), vlen); \
+ VINT X_Q62 = __riscv_vfcvt_x (__riscv_vfmul (vx, two_to_62, vlen), vlen); \
+ VINT A = __riscv_vsll (__riscv_vsmul (Z_Q62, Z_Q62, 1, vlen), 1, vlen); \
+ VINT B = __riscv_vsll (__riscv_vsmul (X_Q62, Z_Q62, 1, vlen), 1, vlen); \
+ B = __riscv_vsll (__riscv_vsmul (A, B, 1, vlen), 1, vlen); \
+ B = __riscv_vsll (__riscv_vsmul (B, THIRD_Q62, 1, vlen), 1, vlen); \
+ VINT DELTA = __riscv_vrsub (B, THIRD_Q62, vlen); \
+ A = __riscv_vsll (__riscv_vsmul (DELTA, Z_Q62, 1, vlen), 1, vlen); \
+ Z_Q62 = __riscv_vadd (Z_Q62, A, vlen); \
+ \
+ /* X * Z * Z is the cube root of x in [1, 2); \
+ // then we need to multiply by 2^(1/3) or 4^(1/3) as needed, \
+ // together with a multiplication by 2^M */ \
+ Z_Q62 = __riscv_vsll (__riscv_vsmul (Z_Q62, Z_Q62, 1, vlen), 1, vlen); \
+ Z_Q62 = __riscv_vsll (__riscv_vsmul (Z_Q62, X_Q62, 1, vlen), 1, vlen); \
+ R = __riscv_vsmul (R, Z_Q62, 1, vlen); /* scale is 61 now */ \
+ \
+ M = __riscv_vadd (M, EXP_BIAS - 61, vlen); \
+ M = __riscv_vadd (M, n_adjust, vlen); \
+ VFLOAT scale = I_AS_F (__riscv_vsll (M, MAN_LEN, vlen)); \
+ vy = __riscv_vfcvt_f (R, vlen); \
+ vy = __riscv_vfmul (vy, scale, vlen); \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
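
Note for reviewers: a scalar sketch, outside the patch, of two
ingredients used above -- the x^(-1/3) refinement iteration and the
multiply-based division by 3. The seed below is libm's pow perturbed to
roughly 14 bits, standing in for the kernel's short polynomial, and the
helper names are hypothetical:

    #include <assert.h>
    #include <math.h>

    /* z <- z + z * (1/3 - x*z^3/3) converges quadratically to x^(-1/3);
       then x * z * z is x^(1/3).  For normal x > 0 only; the kernel
       handles signs and denormals separately.  */
    static double
    cbrt_model (double x)
    {
      double z = pow (x, -1.0 / 3.0) * (1.0 + 0x1.0p-14); /* ~14-bit seed */
      for (int i = 0; i < 2; i++)
        z += z * ((1.0 - x * z * z * z) / 3.0);
      return x * z * z;
    }

    /* The trick behind M = L // 3 above: 1366 = ceil(2^12/3), and the
       shifted multiply is exact over the whole range 0 <= L <= 2046.  */
    static void
    check_div3_trick (void)
    {
      for (long l = 0; l <= 2046; l++)
        assert ((l * 1366) >> 12 == l / 3);
    }
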
diff --git a/sysdeps/riscv/rvd/v_d_cdfnorm.c b/sysdeps/riscv/rvd/v_d_cdfnorm.c
new file mode 100644
index 0000000000..eddc545cfd
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_cdfnorm.c
@@ -0,0 +1,226 @@
+/* Double-precision vector cdfnorm function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_CDFNORMD_VSET_CONFIG
+
+#define COMPILE_FOR_CDFNORM
+#include "rvvlm_errorfuncsD.h"
+
+// polynomial coefficients in Q63
+#define P_0 0x6c25c9f6cfd132e7
+#define P_1 -0x5abb8f458c7895f
+#define P_2 -0x5ea2dcf3956792c
+#define P_3 0xdd22963d83fa7d8
+#define P_4 -0x107d667db8b90c84
+#define P_5 0xea0acc44786d840
+#define P_6 -0xa5e5b52ef29e23a
+#define P_7 0x5ef73d5784d9dc6
+#define P_8 -0x2acb1deb9208ae5
+#define P_9 0xdf0d75186479cf
+#define P_10 -0x25493132730985
+#define P_11 -0x7daed4327549c
+#define P_12 0x6ff2fb205b4f9
+#define P_13 -0x15242feefcc0f
+#define P_14 -0x7f14d7432d2b
+#define P_15 0x4b2791427dab
+#define P_16 0x17d0499cfa7
+#define P_17 -0xae9fb960b85
+#define P_18 0x15d4aa6975c
+#define P_19 0x17cff734612
+#define P_20 -0x505ad971f3
+#define P_21 -0x34366c3ea9
+#define P_22 0x97dfa0691
+#define P_23 0x591d3b55a
+#define NEG_A_SCALED -0x1.536p+65
+#define B_FOR_TRANS 0x1.6fap+2
+#define MIN_CLIP 0x1.0p-60
+#define MAX_CLIP 0x1.4p5
+
+// When COMPILE_FOR_ERFC,
+// the main computation is for erfc(|x|) and exploits the symmetry
+// erfc(-|x|) = 2 - erfc(|x|).
+// When COMPILE_FOR_CDFNORM,
+// the main computation is for cdfnorm(-|x|) and exploits the symmetry
+// cdfnorm(|x|) = 1 - cdfnorm(-|x|).
+// (A scalar sketch of the latter follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, cdfnorm) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_orig, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ vx_orig = vx; \
+ \
+ /* Handle Inf and NaN */ \
+ EXCEPTION_HANDLING (vx, special_args, vy_special, vlen); \
+ \
+ /* suffices to focus on |x| clipped to [2^-60, 40] */ \
+ vx = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ vx = __riscv_vfmin (vx, MAX_CLIP, vlen); \
+ vx = __riscv_vfmax (vx, MIN_CLIP, vlen); \
+ \
+ VINT R; \
+ /* Compute (x-a)/(x+b) as Q63 fixed-point */ \
+ X_TRANSFORM (vx, NEG_A_SCALED, B_FOR_TRANS, R, vlen); \
+ \
+ VINT n, A; \
+ VFLOAT vy = __riscv_vfmul (vx, 0x1.0p-1, vlen); \
+ /* Compute exp(-x*x) or exp(-x*x/2) as 2^n a \
+ // but return a as Q62 fixed-point A */ \
+ EXP_negAB (vx, vy, n, A, vlen); \
+ \
+ /* Approximate exp(x*x)*(1+2x)*erfc(x) \
+ // or exp(x*x/2)*(1+2x)*cdfnorm(-x) \
+ // using a polynomial in r = (x-a)/(x+b) \
+ // We use fixed-point computing \
+ // -1 < r < 1, thus using Q63 fixed-point for r \
+ // All coefficients are scaled the same and thus \
+ // the final value is in this scaling. \
+ // Scale is 2^62 for erfc and 2^63 for cdfnorm */ \
+ VINT P_RIGHT = PSTEP_I ( \
+ P_16, R, \
+ PSTEP_I (P_17, R, \
+ PSTEP_I (P_18, R, \
+ PSTEP_I (P_19, R, \
+ PSTEP_I (P_20, R, \
+ PSTEP_I (P_21, R, \
+ PSTEP_I (P_22, P_23, R, \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT R8 = __riscv_vsmul (R, R, 1, vlen); \
+ R8 = __riscv_vsmul (R8, R8, 1, vlen); \
+ R8 = __riscv_vsmul (R8, R8, 1, vlen); \
+ \
+ VINT P_MID = PSTEP_I ( \
+ P_8, R, \
+ PSTEP_I (P_9, R, \
+ PSTEP_I (P_10, R, \
+ PSTEP_I (P_11, R, \
+ PSTEP_I (P_12, R, \
+ PSTEP_I (P_13, R, \
+ PSTEP_I (P_14, P_15, R, \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_RIGHT = __riscv_vsmul (R8, P_RIGHT, 1, vlen); \
+ P_RIGHT = __riscv_vadd (P_RIGHT, P_MID, vlen); \
+ P_RIGHT = __riscv_vsmul (R8, P_RIGHT, 1, vlen); \
+ \
+ VINT P_LEFT = PSTEP_I ( \
+ P_0, R, \
+ PSTEP_I ( \
+ P_1, R, \
+ PSTEP_I (P_2, R, \
+ PSTEP_I (P_3, R, \
+ PSTEP_I (P_4, R, \
+ PSTEP_I (P_5, R, \
+ PSTEP_I (P_6, P_7, R, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P = __riscv_vadd (P_LEFT, P_RIGHT, vlen); \
+ \
+ VINT m, B; \
+ RECIP_SCALE (vx, B, m, vlen); \
+ \
+ /* exp(-x^2/2) is 2^n * 2^(-62) * A \
+ // 1/(1+2x) is 2^(-m) * B, m >= 62 \
+ // exp(x^2/2)(1+2x)cdfnorm(-x) is 2^(-63) * P */ \
+ P = __riscv_vsmul (P, A, 1, vlen); /* Q62 */ \
+ P = __riscv_vsmul (P, B, 1, vlen); /* Q(m-1) */ \
+ n = __riscv_vsub (n, m, vlen); \
+ n = __riscv_vadd (n, 1, vlen); /* n <= -61 */ \
+ \
+ VUINT ell = I_AS_U (__riscv_vrsub (n, -61, vlen)); \
+ ell = __riscv_vminu (ell, 63, vlen); \
+ VINT PP = __riscv_vsra (P, ell, vlen); \
+ VINT Q = VMVI_VX (1, vlen); \
+ Q = __riscv_vsll (Q, 61, vlen); \
+ Q = __riscv_vsub (Q, PP, vlen); \
+ VFLOAT vz = __riscv_vfcvt_f (Q, vlen); \
+ vz = __riscv_vfmul (vz, 0x1.0p-61, vlen); \
+ \
+ vy = __riscv_vfcvt_f (P, vlen); \
+ FAST_LDEXP (vy, n, vlen); \
+ /* vy is cdfnorm(-|x|) at this point */ \
+ \
+ VBOOL x_is_pos = __riscv_vmfgt (vx_orig, fp_posZero, vlen); \
+ vy = __riscv_vmerge (vy, vz, x_is_pos, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
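
Note for reviewers: a scalar reference, outside the patch, for the
symmetry the kernel above relies on. libm's erfc stands in for the
fixed-point evaluation and the helper name is hypothetical:

    #include <math.h>

    /* cdfnorm(x) = 0.5 * erfc(-x / sqrt(2)).  The kernel computes
       cdfnorm(-|x|) and recovers cdfnorm(|x|) as 1 - cdfnorm(-|x|).  */
    static double
    cdfnorm_model (double x)
    {
      double neg = 0.5 * erfc (fabs (x) / sqrt (2.0)); /* cdfnorm(-|x|) */
      return x > 0.0 ? 1.0 - neg : neg;
    }
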
diff --git a/sysdeps/riscv/rvd/v_d_cdfnorminv.c b/sysdeps/riscv/rvd/v_d_cdfnorminv.c
new file mode 100644
index 0000000000..6391bfa5c5
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_cdfnorminv.c
@@ -0,0 +1,292 @@
+/* Double-precision vector cdfnorminv function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_CDFNORMINVD_VSET_CONFIG
+
+#define COMPILE_FOR_CDFNORMINV
+#include "rvvlm_inverrorfuncsD.h"
+
+// cdfnorminv is defined on (0, 1); it suffices to consider (0, 1/2].
+// Two regions of approximation are used: left is [0, 0x1.2p-3) and right is
+// [0x1.2p-3, 1/2). Both are done with rational functions.
+// For the right region: t*P(t)/Q(t), with t = 1/2-x, x in [0x1.2p-3, 1/2).
+// For the left region: y*P(t)/Q(t), with y = sqrt(-log(2x)) and t = 1/y.
+
+// P_coefficients in ascending order, all in Q79.
+// p0_delta is in floating point, scale 79
+#define P_right_0 -0x6709ca23d4199a8L
+#define P_right_1 -0xfd998fbae8eb3c8L
+#define P_right_2 0x48ca86036ae6e955L
+#define P_right_3 -0x278f4a98238f8c27L
+#define P_right_4 -0x40132208941e6a5aL
+#define P_right_5 0x402e2635719a3914L
+#define P_right_6 0x31c67fdc7e5073fL
+#define P_right_7 -0x12d1e1d375fb5d31L
+#define P_right_8 0x4232daca563749dL
+#define P_right_9 0xb02a8971665c0dL
+#define P_right_10 -0x2a7ae4292a6a4fL
+#define DELTA_P0_right 0x1.6c4b0b32778d0p-3
+
+// Q_coefficients in ascending order, all in Q79.
+// q0_delta is in floating point, scale 79
+#define Q_right_0 -0x52366e5b14c0970L
+#define Q_right_1 -0xca57e95abcc599bL
+#define Q_right_2 0x3b6c91ec67f5759cL
+#define Q_right_3 -0x1c40d5daa3be22bcL
+#define Q_right_4 -0x41f11eb5d837386cL
+#define Q_right_5 0x3c6ce478fcd75c9aL
+#define Q_right_6 0xbb1cd7270cfba1dL
+#define Q_right_7 -0x1988a4116498f1afL
+#define Q_right_8 0x44dc3042f103d20L
+#define Q_right_9 0x2390e683d02edf3L
+#define Q_right_10 -0x8ec66f2a7e410cL
+#define DELTA_Q0_right -0x1.29a0161e99446p-3
+
+// P_coefficients in ascending order, all in Q67. p0_delta is in floating point
+#define P_left_0 0x216a32ed581bfL
+#define P_left_1 0x5ac486106d127fL
+#define P_left_2 0x3a9f84d231c6131L
+#define P_left_3 0xb54f6ab23cca5a3L
+#define P_left_4 0xecc53db7ed5eccbL
+#define P_left_5 0x194382b2de726d58L
+#define P_left_6 0x166fc6bd87b1b0b6L
+#define P_left_7 0xfd7bc0d477f41a9L
+#define P_left_8 0x7fc186088d7ad8cL
+#define P_left_9 0x18d6aeeb448b50aL
+#define P_left_10 -0x8fb330020a5bL
+#define DELTA_P0_left 0x1.b81f6f45914f0p-2
+
+// Q_coefficients in ascending order, all in Q67. q0_delta is in floating point
+#define Q_left_0 0x17a09aabf9ceeL
+#define Q_left_1 0x4030b9059ffcadL
+#define Q_left_2 0x29b26b0d87f7855L
+#define Q_left_3 0x87572a13d3fa2ddL
+#define Q_left_4 0xd7a728b5620ac3cL
+#define Q_left_5 0x1754392b473fd439L
+#define Q_left_6 0x1791b9a091a816c2L
+#define Q_left_7 0x167f71db9e13b075L
+#define Q_left_8 0xcb9f5f3e5e618a4L
+#define Q_left_9 0x68271fae767c68eL
+#define Q_left_10 0x13745c4fa224b25L
+#define DELTA_Q0_left 0x1.f7e7557a34ae6p-2
+
+// cdfnorminv(x) = -sqrt(2)*erfcinv(2x)
+// The approximating rational functions are based on those for erfcinv,
+// hence arguments are doubled here and there so that
+// "2x" is created. (A scalar sketch follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, cdfnorminv) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_sign, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* Handle Inf and NaN */ \
+ EXCEPTION_HANDLING_CDFNORMINV (vx, special_args, vy_special, vlen); \
+ \
+ vx_sign = __riscv_vfsub (vx, 0x1.0p-1, vlen); \
+ VFLOAT one_minus_x = __riscv_vfrsub (vx, fp_posOne, vlen); \
+ VBOOL x_gt_half = __riscv_vmfgt (vx_sign, fp_posZero, vlen); \
+ vx = __riscv_vmerge (vx, one_minus_x, x_gt_half, vlen); \
+ /* vx is now in (0, 1/2] */ \
+ VBOOL x_in_left = __riscv_vmfle (vx, 0x1.2p-3, vlen); \
+ \
+ VFLOAT w_hi, w_lo, w_hi_left, w_lo_left, y_hi, y_lo; \
+ VINT T, T_left, T_tiny; \
+ VBOOL x_is_tiny; \
+ x_is_tiny = __riscv_vmxor (x_is_tiny, x_is_tiny, vlen); \
+ \
+ if (__riscv_vcpop (x_in_left, vlen) > 0) \
+ { \
+ VFLOAT x_left = VFMV_VF (0x1.0p-4, vlen); \
+ x_left = __riscv_vmerge (x_left, vx, x_in_left, vlen); \
+ x_is_tiny = __riscv_vmflt (x_left, 0x1.0p-53, vlen); \
+ INT n_adjust = 59; \
+ x_left = __riscv_vfmul (x_left, 0x1.0p60, vlen); \
+ /* adjusting only 59 instead of 60 essentially doubles x */ \
+ NEG_LOGX_4_TRANSFORM (x_left, n_adjust, y_hi, y_lo, vlen); \
+ \
+ SQRTX_4_TRANSFORM (y_hi, y_lo, w_hi_left, w_lo_left, T_left, \
+ 0x1.0p63, 0x1.0p-63, vlen); \
+ if (__riscv_vcpop (x_is_tiny, vlen) > 0) \
+ { \
+ VFLOAT w_hi_dummy, w_lo_dummy; \
+ SQRTX_4_TRANSFORM (y_hi, y_lo, w_hi_dummy, w_lo_dummy, T_tiny, \
+ 0x1.0p64, 0x1.0p-64, vlen); \
+ } \
+ } \
+ vx = __riscv_vfadd (vx, vx, vlen); \
+ w_hi = VFMV_VF (fp_posOne, vlen); \
+ w_hi = __riscv_vfsub (w_hi, vx, vlen); \
+ w_lo = __riscv_vfrsub (w_hi, fp_posOne, vlen); \
+ w_lo = __riscv_vfsub (w_lo, vx, vlen); \
+ T = __riscv_vfcvt_x (__riscv_vfmul (w_hi, 0x1.0p63, vlen), vlen); \
+ VFLOAT delta_t = __riscv_vfmul (w_lo, 0x1.0p63, vlen); \
+ T = __riscv_vadd (T, __riscv_vfcvt_x (delta_t, vlen), vlen); \
+ T = __riscv_vmerge (T, T_left, x_in_left, vlen); \
+ \
+ w_hi = __riscv_vmerge (w_hi, w_hi_left, x_in_left, vlen); \
+ w_lo = __riscv_vmerge (w_lo, w_lo_left, x_in_left, vlen); \
+ \
+ /* For transformed branch, compute (w_hi + w_lo) * P(T)/Q(T) */ \
+ VINT P, Q; \
+ \
+ P = __riscv_vmerge (VMVI_VX (P_right_10, vlen), P_left_10, x_in_left, \
+ vlen); \
+ P = PSTEP_I_ab ( \
+ x_in_left, P_left_6, P_right_6, T, \
+ PSTEP_I_ab (x_in_left, P_left_7, P_right_7, T, \
+ PSTEP_I_ab (x_in_left, P_left_8, P_right_8, T, \
+ PSTEP_I_ab (x_in_left, P_left_9, P_right_9, \
+ T, P, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ Q = __riscv_vmerge (VMVI_VX (Q_right_10, vlen), Q_left_10, x_in_left, \
+ vlen); \
+ Q = PSTEP_I_ab ( \
+ x_in_left, Q_left_6, Q_right_6, T, \
+ PSTEP_I_ab (x_in_left, Q_left_7, Q_right_7, T, \
+ PSTEP_I_ab (x_in_left, Q_left_8, Q_right_8, T, \
+ PSTEP_I_ab (x_in_left, Q_left_9, Q_right_9, \
+ T, Q, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P = PSTEP_I_ab ( \
+ x_in_left, P_left_0, P_right_0, T, \
+ PSTEP_I_ab ( \
+ x_in_left, P_left_1, P_right_1, T, \
+ PSTEP_I_ab ( \
+ x_in_left, P_left_2, P_right_2, T, \
+ PSTEP_I_ab (x_in_left, P_left_3, P_right_3, T, \
+ PSTEP_I_ab (x_in_left, P_left_4, P_right_4, T, \
+ PSTEP_I_ab (x_in_left, P_left_5, \
+ P_right_5, T, P, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ Q = PSTEP_I_ab ( \
+ x_in_left, Q_left_0, Q_right_0, T, \
+ PSTEP_I_ab ( \
+ x_in_left, Q_left_1, Q_right_1, T, \
+ PSTEP_I_ab ( \
+ x_in_left, Q_left_2, Q_right_2, T, \
+ PSTEP_I_ab (x_in_left, Q_left_3, Q_right_3, T, \
+ PSTEP_I_ab (x_in_left, Q_left_4, Q_right_4, T, \
+ PSTEP_I_ab (x_in_left, Q_left_5, \
+ Q_right_5, T, Q, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_hi, p_lo; \
+ p_hi = __riscv_vfcvt_f (P, vlen); \
+ \
+ p_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (P, __riscv_vfcvt_x (p_hi, vlen), vlen), vlen); \
+ VFLOAT delta_p0 = VFMV_VF (DELTA_P0_right, vlen); \
+ delta_p0 = __riscv_vfmerge (delta_p0, DELTA_P0_left, x_in_left, vlen); \
+ p_lo = __riscv_vfadd (p_lo, delta_p0, vlen); \
+ \
+ VFLOAT q_hi, q_lo; \
+ q_hi = __riscv_vfcvt_f (Q, vlen); \
+ q_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (Q, __riscv_vfcvt_x (q_hi, vlen), vlen), vlen); \
+ VFLOAT delta_q0 = VFMV_VF (DELTA_Q0_right, vlen); \
+ delta_q0 = __riscv_vfmerge (delta_q0, DELTA_Q0_left, x_in_left, vlen); \
+ q_lo = __riscv_vfadd (q_lo, delta_q0, vlen); \
+ \
+ if (__riscv_vcpop (x_is_tiny, vlen) > 0) \
+ { \
+ VFLOAT p_hi_tiny, p_lo_tiny, q_hi_tiny, q_lo_tiny; \
+ ERFCINV_PQ_HILO_TINY (T_tiny, p_hi_tiny, p_lo_tiny, q_hi_tiny, \
+ q_lo_tiny, vlen); \
+ p_hi = __riscv_vmerge (p_hi, p_hi_tiny, x_is_tiny, vlen); \
+ p_lo = __riscv_vmerge (p_lo, p_lo_tiny, x_is_tiny, vlen); \
+ q_hi = __riscv_vmerge (q_hi, q_hi_tiny, x_is_tiny, vlen); \
+ q_lo = __riscv_vmerge (q_lo, q_lo_tiny, x_is_tiny, vlen); \
+ } \
+ \
+ /* (y_hi, y_lo) <-- (w_hi + w_lo) * (p_hi + p_lo) */ \
+ y_hi = __riscv_vfmul (w_hi, p_hi, vlen); \
+ y_lo = __riscv_vfmsub (w_hi, p_hi, y_hi, vlen); \
+ y_lo = __riscv_vfmacc (y_lo, w_hi, p_lo, vlen); \
+ y_lo = __riscv_vfmacc (y_lo, w_lo, p_hi, vlen); \
+ \
+ DIV_N2D2 (y_hi, y_lo, q_hi, q_lo, w_hi, vlen); \
+ \
+ vy = w_hi; \
+ \
+ vy = __riscv_vfsgnj (vy, vx_sign, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
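
Note for reviewers: a scalar sketch, outside the patch, of the two
argument transforms described above. It computes only the transformed
variable t, not the rational functions, and the helper name is
hypothetical:

    #include <math.h>

    /* Left region (x <= 0x1.2p-3): y = sqrt(-log(2x)) and t = 1/y; the
       result is then y * P(t)/Q(t).  Right region: t = 1/2 - x; the
       result is then t * P(t)/Q(t).  Valid for x in (0, 1/2].  */
    static double
    cdfnorminv_transform (double x, int *in_left)
    {
      *in_left = x <= 0x1.2p-3;
      if (*in_left)
        return 1.0 / sqrt (-log (2.0 * x));
      return 0.5 - x;
    }
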
diff --git a/sysdeps/riscv/rvd/v_d_cos.c b/sysdeps/riscv/rvd/v_d_cos.c
new file mode 100644
index 0000000000..3649d5eb6b
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_cos.c
@@ -0,0 +1,201 @@
+/* Double-precision vector cos function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_COSD_VSET_CONFIG
+
+#define COMPILE_FOR_COS
+#include "rvvlm_trigD.h"
+
+// This version reduces the argument to [-pi/4, pi/4] and computes sin(r) or
+// cos(r) by merging the appropriate coefficients into a vector register.
+// (A scalar sketch follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, cos) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_TRIG (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ VBOOL x_large \
+ = __riscv_vmsgeu (expo_x, EXP_BIAS + 24, vlen); /* |x| >= 2^(24) */ \
+ VFLOAT vx_copy = vx; \
+ vx = __riscv_vfmerge (vx, fp_posZero, x_large, vlen); \
+ \
+ VFLOAT n_flt = __riscv_vfmul (vx, PIBY2_INV, vlen); \
+ VINT n = __riscv_vfcvt_x (n_flt, vlen); \
+ n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT r_hi = __riscv_vfnmsac (vx, PIBY2_HI, n_flt, vlen); \
+ VUINT expo_r = __riscv_vsrl (F_AS_U (r_hi), MAN_LEN, vlen); \
+ expo_r = __riscv_vand (expo_r, 0x7FF, vlen); \
+ VBOOL r_small = __riscv_vmsleu (expo_r, EXP_BIAS - 16, \
+ vlen); /* |r_hi| < 2^(-15) */ \
+ UINT nb_r_small = __riscv_vcpop (r_small, vlen); \
+ VFLOAT r = __riscv_vfnmsac (r_hi, PIBY2_MID, n_flt, vlen); \
+ VFLOAT r_delta = __riscv_vfsub (r_hi, r, vlen); \
+ r_delta = __riscv_vfnmsac (r_delta, PIBY2_MID, n_flt, vlen); \
+ /* At this point, r + r_delta is an accurate reduced argument PROVIDED */ \
+ /* |r_hi| >= 2^(-15) */ \
+ if (nb_r_small > 0) \
+ { \
+ VFLOAT A = __riscv_vfmul (n_flt, PIBY2_MID, vlen); \
+ VFLOAT a = __riscv_vfmsub (n_flt, PIBY2_MID, A, vlen); \
+ /* A + a is n * piby2_mid exactly */ \
+ VFLOAT S = __riscv_vfsub (r_hi, A, vlen); \
+ VFLOAT s = __riscv_vfsub (r_hi, S, vlen); \
+ s = __riscv_vfsub (s, A, vlen); \
+ s = __riscv_vfnmsac (s, PIBY2_LO, n_flt, vlen); \
+ r = __riscv_vmerge (r, S, r_small, vlen); \
+ r_delta = __riscv_vmerge (r_delta, s, r_small, vlen); \
+ } \
+ \
+ if (__riscv_vcpop (x_large, vlen) > 0) \
+ { \
+ VFLOAT r_xlarge, r_delta_xlarge; \
+ VINT n_xlarge; \
+ LARGE_ARGUMENT_REDUCTION_Piby2 (vx_copy, vlen, x_large, n_xlarge, \
+ r_xlarge, r_delta_xlarge); \
+ r = __riscv_vmerge (r, r_xlarge, x_large, vlen); \
+ r_delta = __riscv_vmerge (r_delta, r_delta_xlarge, x_large, vlen); \
+ n = __riscv_vmerge (n, n_xlarge, x_large, vlen); \
+ } \
+ \
+ VUINT n_lsb = __riscv_vand (I_AS_U (n), 0x1, vlen); \
+ VBOOL pick_c = __riscv_vmseq (n_lsb, 0, vlen); \
+ \
+ /* Instead of always computing both sin(r) and cos(r) for |r| <= pi/4, \
+ we merge the sin and cos cases together by picking the correct \
+ polynomial coefficients. This way we save on the bulk of the poly \
+ computation except for a couple of terms. \
+ \ \
+ This standard algorithm either computes sin(r+r_delta) or \
+ cos(r+r_delta), depending on the parity of n. \
+ Note that sin(t) = t + t^3(s_poly(t^2)) \
+ and cos(t) = 1 - t^2/2 + t^4(c_poly(t^2)), \
+ where s_poly and c_poly are of the same degree. Hence \
+ it suffices to load the coefficient vector with the correct \
+ coefficients for s_poly or c_poly. We compute the needed s_poly or \
+ c_poly without wasteful operations (that is, without computing both \
+ s_poly for all r and c_poly for all r and discarding half of these \
+ results). \
+ */ \
+ \
+ /* sin(r+r_delta) ~=~ sin(r) + r_delta(1 - r^2/2) */ \
+ /* sin(r) is approximated by 7 terms, starting from x, x^3, ..., x^13 */ \
+ /* cos(r+r_delta) ~=~ cos(r) - r * r_delta */ \
+ VFLOAT rsq, rcube, r_to_6, s_corr, c_corr, r_prime, One, C; \
+ One = VFMV_VF (fp_posOne, vlen); \
+ rsq = __riscv_vfmul (r, r, vlen); \
+ rcube = __riscv_vfmul (rsq, r, vlen); \
+ r_to_6 = __riscv_vfmul (rcube, rcube, vlen); \
+ \
+ r_prime = __riscv_vfmul (r, -0x1.0p-1, vlen); \
+ C = __riscv_vfmacc (One, r_prime, r, vlen); \
+ s_corr = __riscv_vfmul (r_delta, C, vlen); \
+ \
+ c_corr = __riscv_vfsub (One, C, vlen); \
+ c_corr = __riscv_vfmacc (c_corr, r, r_prime, vlen); \
+ c_corr = __riscv_vfnmsac (c_corr, r, r_delta, vlen); \
+ \
+ VFLOAT poly_right = VFMV_VF (0x1.5d8b5ae12066ap-33, vlen); \
+ poly_right \
+ = __riscv_vfmerge (poly_right, -0x1.8f5dd75850673p-37, pick_c, vlen); \
+ poly_right = PSTEP_ab ( \
+ pick_c, -0x1.27e4f72551e3dp-22, 0x1.71de35553ddb6p-19, rsq, \
+ PSTEP_ab (pick_c, 0x1.1ee950032f74cp-29, -0x1.ae5e4b94836f8p-26, rsq, \
+ poly_right, vlen), \
+ vlen); \
+ \
+ VFLOAT poly_left = VFMV_VF (-0x1.a01a019be932ap-13, vlen); \
+ poly_left \
+ = __riscv_vfmerge (poly_left, 0x1.a01a019b77545p-16, pick_c, vlen); \
+ poly_left \
+ = PSTEP_ab (pick_c, 0x1.5555555555546p-5, -0x1.5555555555548p-3, rsq, \
+ PSTEP_ab (pick_c, -0x1.6c16c16c1450cp-10, \
+ 0x1.111111110f730p-7, rsq, poly_left, vlen), \
+ vlen); \
+ \
+ poly_right = __riscv_vfmadd (poly_right, r_to_6, poly_left, vlen); \
+ \
+ VFLOAT t = __riscv_vfmul (rsq, rsq, vlen); \
+ t = __riscv_vmerge (rcube, t, pick_c, vlen); \
+ /* t is r^3 for sin(r) and r^4 for cos(r) */ \
+ \
+ VFLOAT A = __riscv_vmerge (r, C, pick_c, vlen); \
+ VFLOAT a = __riscv_vmerge (s_corr, c_corr, pick_c, vlen); \
+ vy = __riscv_vfmadd (poly_right, t, a, vlen); \
+ vy = __riscv_vfadd (A, vy, vlen); \
+ \
+ n = __riscv_vsll (n, BIT_WIDTH - 2, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (n), vlen); \
+ n_lsb = __riscv_vsll (n_lsb, 63, vlen); \
+ vy = __riscv_vfsgnjx (vy, U_AS_F (n_lsb), vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
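
Note for reviewers: a scalar model, outside the patch, of the reduction
and sin/cos selection above for moderate |x|. The two-term pi/2
splitting below uses the standard constants, assumed equivalent to
rvvlm_trigD.h's PIBY2_HI/PIBY2_MID; libm's sin/cos stand in for the
merged polynomial, and the helper name is hypothetical:

    #include <math.h>

    static double
    cos_model (double x)    /* away from the |x| >= 2^24 large-arg path */
    {
      double n_flt = rint (x * 0x1.45f306dc9c883p-1);   /* x * (2/pi) */
      double r = fma (-n_flt, 0x1.921fb54442d18p+0, x); /* - n*PIBY2_HI */
      r = fma (-n_flt, 0x1.1a62633145c07p-54, r);       /* - n*PIBY2_MID */
      long n = (long) n_flt;
      double v = (n & 1) ? sin (r) : cos (r); /* parity of n picks poly */
      return ((n + 1) & 2) ? -v : v;          /* quadrant fixes the sign */
    }
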
diff --git a/sysdeps/riscv/rvd/v_d_cosh.c b/sysdeps/riscv/rvd/v_d_cosh.c
new file mode 100644
index 0000000000..0d4174abc9
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_cosh.c
@@ -0,0 +1,187 @@
+/* Double-precision vector cosh function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_COSHD_VSET_CONFIG
+
+#define COMPILE_FOR_COSH
+#include "rvvlm_hyperbolicsD.h"
+
+// This version reduces the argument to [-log2/2, log2/2].
+// It exploits the common expressions exp(R) and exp(-R), and uses a purely
+// floating-point method to preserve precision.
+// (A scalar sketch follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, cosh) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ expo_x = __riscv_vand (__riscv_vsrl (F_AS_U (vx_orig), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_HYPER (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ /* Both sinh and cosh have sign symmetry; it suffices to work on |x|, \
+ // since sinh(x) = sign(x) * sinh(|x|) and cosh(x) = cosh(|x|). */ \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ /* Suffices to clip |x| to 714.0, which is bigger than 1030 log(2) */ \
+ vx = __riscv_vfmin (vx, 0x1.65p9, vlen); \
+ VINT n; \
+ VFLOAT r, r_delta; \
+ ARGUMENT_REDUCTION (vx, n, r, r_delta, vlen); \
+ \
+ /* At this point exp(x) = 2^n exp(r'), where r' = r + delta_r \
+ // sinh(x) or cosh(x) is 2^(n-1) ( exp(r') -/+ 2^(-2n) exp(-r') ) \
+ // Note that n >= 0. Moreover, the factor 2^(-2n) can be replaced by \
+ // s = 2^(-m), m = min(2n, 60) \
+ // sinh(x) / cosh(x) = 2^(n-1)(exp(r') -/+ s exp(-r')) \
+ \ \
+ // exp(r') and exp(-r') will be computed purely in floating point \
+ // using extra-precision simulation when needed \
+ // Note exp(t) is approximated by \
+ // 1 + t + t^2/2 + t^3(p_even(t^2) + t*p_odd(t^2)) \
+ // and thus exp(-t) is approximated \
+ // 1 - t + t^2/2 - t^3(p_even(t^2) - t*p_odd(t^2)) \
+ // So we compute the common expressions p_even and p_odd separately. \
+ // Moreover, they can be evaluated as r*r alone, not needing r_delta \
+ // because they are at least a factor of (log(2)/2)^2/6 smaller than the \
+ // final result of interest. */ \
+ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ \
+ VFLOAT p_even \
+ = PSTEP (0x1.555555555555ap-3, rsq, \
+ PSTEP (0x1.111111110ef6ap-7, rsq, \
+ PSTEP (0x1.a01a01b32b633p-13, rsq, \
+ PSTEP (0x1.71ddef82f4beep-19, \
+ 0x1.af6eacd796f0bp-26, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_odd = PSTEP (0x1.5555555553aefp-5, rsq, \
+ PSTEP (0x1.6c16c17a09506p-10, rsq, \
+ PSTEP (0x1.a019b37a2b3dfp-16, \
+ 0x1.289788d8bdadfp-22, rsq, vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_pos = __riscv_vfmadd (p_odd, r, p_even, vlen); \
+ VFLOAT p_neg = __riscv_vfnmsub (p_odd, r, p_even, vlen); \
+ p_pos = __riscv_vfmul (p_pos, rcube, vlen); \
+ p_neg = __riscv_vfmul (p_neg, rcube, vlen); \
+ \
+ /* exp( r') is approximated by 1 + r' + (r')^2/2 + p_pos */ \
+ /* exp(-r') is approximated by 1 - r' + (r')^2/2 - p_neg */ \
+ \
+ VINT m = __riscv_vmin (__riscv_vadd (n, n, vlen), 60, vlen); \
+ VFLOAT s = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vrsub (m, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ VFLOAT poly = __riscv_vfnmsac (p_pos, s, p_neg, vlen); \
+ /* sinh / cosh = (1 -/+ s) + ([r' + (r')^2/2] +/- s [r' - (r')^2/2]) \
+ // + poly. \
+ // We need r' +/- (r')^2/2 and their sum/diff to high precision, \
+ // and 1 -/+ s to high precision */ \
+ VFLOAT r_half = __riscv_vfmul (r, 0x1.0p-1, vlen); \
+ VFLOAT B_plus = __riscv_vfmadd (r, r_half, r, vlen); \
+ VFLOAT b_plus \
+ = __riscv_vfmacc (__riscv_vfsub (r, B_plus, vlen), r, r_half, vlen); \
+ VFLOAT delta_b_plus = __riscv_vfmadd (r, r_delta, r_delta, vlen); \
+ b_plus = __riscv_vfadd (b_plus, delta_b_plus, vlen); \
+ VFLOAT B_minus = __riscv_vfnmsub (r, r_half, r, vlen); \
+ VFLOAT b_minus = __riscv_vfnmsac (__riscv_vfsub (r, B_minus, vlen), r, \
+ r_half, vlen); \
+ VFLOAT delta_b_minus = __riscv_vfnmsub (r, r_delta, r_delta, vlen); \
+ b_minus = __riscv_vfadd (b_minus, delta_b_minus, vlen); \
+ VFLOAT B = __riscv_vfnmsub (B_minus, s, B_plus, vlen); \
+ VFLOAT b = __riscv_vfnmsac (__riscv_vfsub (B_plus, B, vlen), s, B_minus, \
+ vlen); \
+ b = __riscv_vfadd (b, __riscv_vfnmsub (b_minus, s, b_plus, vlen), vlen); \
+ VBOOL n_large = __riscv_vmsge (n, 50, vlen); \
+ VFLOAT s_hi = s; \
+ VFLOAT s_lo; \
+ s_lo = U_AS_F (__riscv_vxor (F_AS_U (s_lo), F_AS_U (s_lo), vlen)); \
+ s_hi = __riscv_vfmerge (s_hi, fp_posZero, n_large, vlen); \
+ s_lo = __riscv_vmerge (s_lo, s, n_large, vlen); \
+ VFLOAT A = __riscv_vfadd (s_hi, fp_posOne, vlen); \
+ b = __riscv_vfadd (b, s_lo, vlen); \
+ VFLOAT Z_hi, Z_lo; \
+ FAST2SUM (B, poly, Z_hi, Z_lo, vlen); \
+ b = __riscv_vfadd (b, Z_lo, vlen); \
+ B = Z_hi; \
+ FAST2SUM (A, B, Z_hi, Z_lo, vlen); \
+ b = __riscv_vfadd (b, Z_lo, vlen); \
+ vy = __riscv_vfadd (Z_hi, b, vlen); \
+ \
+ /* scale vy by 2^(n-1) */ \
+ n = __riscv_vsub (n, 1, vlen); \
+ FAST_LDEXP (vy, n, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
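
Note for reviewers: a scalar model, outside the patch, of the identity
the kernel above implements. libm's exp stands in for the shared
p_even/p_odd evaluation, the log(2) splitting uses the standard hi/lo
constants, and the helper name is hypothetical:

    #include <math.h>

    /* With |x| = n*log(2) + r, |r| <= log(2)/2 and n >= 0:
       cosh(x) = 2^(n-1) * (exp(r) + 2^(-2n) * exp(-r)).  */
    static double
    cosh_model (double x)
    {
      double a = fabs (x);                              /* cosh is even */
      double n_flt = rint (a * 0x1.71547652b82fep+0);   /* a * (1/log 2) */
      double r = fma (-n_flt, 0x1.62e42fefa39efp-1, a); /* - n*log2_hi */
      r = fma (-n_flt, 0x1.abc9e3b39803fp-56, r);       /* - n*log2_lo */
      int n = (int) n_flt;
      return ldexp (exp (r) + ldexp (exp (-r), -2 * n), n - 1);
    }
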
diff --git a/sysdeps/riscv/rvd/v_d_cospi.c b/sysdeps/riscv/rvd/v_d_cospi.c
new file mode 100644
index 0000000000..31b4f0128b
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_cospi.c
@@ -0,0 +1,182 @@
+/* Double-precision vector cospi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_COSPID_VSET_CONFIG
+
+#define COMPILE_FOR_COSPI
+#include "rvvlm_trigD.h"
+
+// This version reduces the argument to [-pi/4, pi/4] and computes sin(r) or
+// cos(r) by merging the appropriate coefficients into a vector register.
+// (A scalar sketch follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, cospi) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_TRIG (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ VBOOL x_large \
+ = __riscv_vmsgeu (expo_x, EXP_BIAS + 53, vlen); /* |x| >= 2^(53) */ \
+ vx = __riscv_vfmerge (vx, fp_posZero, x_large, vlen); \
+ \
+ /* Usual argument reduction \
+ // N = rint(2x); rem := 2x - N, |rem| <= 1/2 and x = (N/2) + (rem/2); \
+ // x pi = N (pi/2) + rem * (pi/2) */ \
+ VFLOAT two_x = __riscv_vfadd (vx, vx, vlen); \
+ VINT n = __riscv_vfcvt_x (two_x, vlen); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT rem = __riscv_vfsub (two_x, n_flt, vlen); \
+ VBOOL x_is_n_piby2 = __riscv_vmseq (F_AS_U (rem), 0, vlen); \
+ /* Now rem * pi_by_2 as r + r_delta */ \
+ VFLOAT r = __riscv_vfmul (rem, PIBY2_HI, vlen); \
+ VFLOAT r_delta = __riscv_vfmsac (r, PIBY2_HI, rem, vlen); \
+ r_delta = __riscv_vfmacc (r_delta, PIBY2_MID, rem, vlen); \
+ /* At this point, r + r_delta is an accurate reduced argument */ \
+ \
+ VUINT n_lsb = __riscv_vand (I_AS_U (n), 0x1, vlen); \
+ VBOOL pick_c = __riscv_vmseq (n_lsb, 0, vlen); \
+ \
+ VBOOL exact_zero = __riscv_vmandn (x_is_n_piby2, pick_c, vlen); \
+ \
+ /* Instead of always computing both sin(r) and cos(r) for |r| <= pi/4, \
+ // we merge the sin and cos cases together by picking the correct \
+ // polynomial coefficients. This way we save on the bulk of the poly \
+ // computation except for a couple of terms. \
+ \ \
+ // This standard algorithm either computes sin(r+r_delta) or \
+ // cos(r+r_delta), depending on the parity of n. \
+ // Note that sin(t) = t + t^3(s_poly(t^2)) \
+ // and cos(t) = 1 - t^2/2 + t^4(c_poly(t^2)), \
+ // where s_poly and c_poly are of the same degree. Hence \
+ // it suffices to load the coefficient vector with the correct \
+ // coefficients for s_poly or c_poly. We compute the needed s_poly or \
+ // c_poly without wasteful operations (that is, without computing both \
+ // s_poly for all r and c_poly for all r and discarding half of these \
+ // results). \
+ \ \
+ // sin(r+r_delta) ~=~ sin(r) + r_delta(1 - r^2/2) \
+ // sin(r) is approximated by 7 terms, starting from x, x^3, ..., x^13 \
+ // cos(r+r_delta) ~=~ cos(r) - r * r_delta \
+ // */ \
+ VFLOAT rsq, rcube, r_to_6, s_corr, c_corr, r_prime, One, C; \
+ One = VFMV_VF (fp_posOne, vlen); \
+ rsq = __riscv_vfmul (r, r, vlen); \
+ rcube = __riscv_vfmul (rsq, r, vlen); \
+ r_to_6 = __riscv_vfmul (rcube, rcube, vlen); \
+ \
+ r_prime = __riscv_vfmul (r, -0x1.0p-1, vlen); \
+ C = __riscv_vfmacc (One, r_prime, r, vlen); \
+ s_corr = __riscv_vfmul (r_delta, C, vlen); \
+ \
+ c_corr = __riscv_vfsub (One, C, vlen); \
+ c_corr = __riscv_vfmacc (c_corr, r, r_prime, vlen); \
+ c_corr = __riscv_vfnmsac (c_corr, r, r_delta, vlen); \
+ \
+ VFLOAT poly_right = VFMV_VF (0x1.5d8b5ae12066ap-33, vlen); \
+ poly_right \
+ = __riscv_vfmerge (poly_right, -0x1.8f5dd75850673p-37, pick_c, vlen); \
+ poly_right = PSTEP_ab ( \
+ pick_c, -0x1.27e4f72551e3dp-22, 0x1.71de35553ddb6p-19, rsq, \
+ PSTEP_ab (pick_c, 0x1.1ee950032f74cp-29, -0x1.ae5e4b94836f8p-26, rsq, \
+ poly_right, vlen), \
+ vlen); \
+ \
+ VFLOAT poly_left = VFMV_VF (-0x1.a01a019be932ap-13, vlen); \
+ poly_left \
+ = __riscv_vfmerge (poly_left, 0x1.a01a019b77545p-16, pick_c, vlen); \
+ poly_left \
+ = PSTEP_ab (pick_c, 0x1.5555555555546p-5, -0x1.5555555555548p-3, rsq, \
+ PSTEP_ab (pick_c, -0x1.6c16c16c1450cp-10, \
+ 0x1.111111110f730p-7, rsq, poly_left, vlen), \
+ vlen); \
+ \
+ poly_right = __riscv_vfmadd (poly_right, r_to_6, poly_left, vlen); \
+ \
+ VFLOAT t = __riscv_vfmul (rsq, rsq, vlen); \
+ t = __riscv_vmerge (rcube, t, pick_c, vlen); \
+ /* t is r^3 for sin(r) and r^4 for cos(r) */ \
+ \
+ VFLOAT A = __riscv_vmerge (r, C, pick_c, vlen); \
+ VFLOAT a = __riscv_vmerge (s_corr, c_corr, pick_c, vlen); \
+ vy = __riscv_vfmadd (poly_right, t, a, vlen); \
+ vy = __riscv_vfadd (A, vy, vlen); \
+ \
+ n = __riscv_vsll (n, BIT_WIDTH - 2, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (n), vlen); \
+ n_lsb = __riscv_vsll (n_lsb, 63, vlen); \
+ vy = __riscv_vfsgnjx (vy, U_AS_F (n_lsb), vlen); \
+ \
+ vy = __riscv_vmerge (vy, VFMV_VF (fp_posZero, vlen), exact_zero, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
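
Note for reviewers: a scalar model, outside the patch, of the cospi
reduction above. libm's sin/cos stand in for the merged polynomial and
the helper name is hypothetical:

    #include <math.h>

    static double
    cospi_model (double x)   /* away from the |x| >= 2^53 path */
    {
      double two_x = x + x;
      double n_flt = rint (two_x);
      double rem = two_x - n_flt;   /* exact; x*pi = n*(pi/2) + rem*(pi/2) */
      long n = (long) n_flt;
      if (rem == 0.0 && (n & 1))
        return 0.0;                 /* cospi(k + 1/2) is exactly +0 */
      double r = rem * 0x1.921fb54442d18p+0;    /* rem * (pi/2) */
      double v = (n & 1) ? sin (r) : cos (r);
      return ((n + 1) & 2) ? -v : v;
    }
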
diff --git a/sysdeps/riscv/rvd/v_d_erf.c b/sysdeps/riscv/rvd/v_d_erf.c
new file mode 100644
index 0000000000..1a20d9d4c1
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_erf.c
@@ -0,0 +1,269 @@
+/* Double-precision vector erf function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ERFD_VSET_CONFIG
+
+#define COMPILE_FOR_ERF
+#include "rvvlm_errorfuncsD.h"
+
+// T is 2.0
+#define T 0x1.0p+1
+
+// For x in [0, T], an odd polynomial is used;
+// coefficients P_1 to P_17 are in fixed point,
+// scaled so that they have high precision
+#define P_1 0x120dd750429b6d0f // Q60
+#define P_3 -0x1812746b0379e00c // Q62
+#define P_5 0x1ce2f21a04292b5f // Q64
+#define P_7 -0x1b82ce31281b38e1 // Q66
+#define P_9 0x1565bcd0dd0bcd58 // Q68
+#define P_11 -0xe016d9f815a019d // Q70
+#define P_13 0x7e68c976c0ebcdc // Q72
+#define P_15 -0x3e9a49c76e6ee9a // Q74
+#define P_17 0x1b9e64a589f8da9 // Q76
+#define P_19 -0x1.5f70cd90f1878p-23
+#define P_21 0x1.fca2b5f17c85ap-27
+#define P_23 -0x1.514eafaeffc30p-30
+#define P_25 0x1.9b3583b6b826dp-34
+#define P_27 -0x1.c97ffcf4f4e22p-38
+#define P_29 0x1.c2f4a46d3297dp-42
+#define P_31 -0x1.6ef3c7000b58bp-46
+#define P_33 0x1.ac36453182837p-51
+#define P_35 -0x1.0482966738f0ep-56
+
+// For x in (T, 6.0], a general polynomial is used;
+// coefficients Q_0 through Q_8 are in fixed point
+#define Q_0 0xffff87b6641370f // Q60
+#define Q_1 -0x9062a79f9b29022 // Q62
+#define Q_2 -0x11dc7e40e4efb77d // Q64
+#define Q_3 -0x1dd1004e1f59ed4 // Q66
+#define Q_4 0x1980c051527d41e7 // Q68
+#define Q_5 0x902cddcb829790b // Q70
+#define Q_6 -0x33d6f572cdbfa228 // Q72
+#define Q_7 0x425f9974bef87221 // Q74
+#define Q_8 -0x5363e91dfca5d4df // Q76
+#define Q_9 0x1.b5eea4ad8cdbfp-16
+#define Q_10 -0x1.ded0a34468c8cp-18
+#define Q_11 0x1.af4968b4d634ap-20
+#define Q_12 -0x1.51de51c57f11ap-22
+#define Q_13 0x1.cbbf535e64b65p-25
+#define Q_14 -0x1.025a03d4fdf7bp-27
+#define Q_15 0x1.c735f1e16e8cdp-31
+#define Q_16 -0x1.2de00f5eeee49p-34
+#define Q_17 0x1.219bdcb68d070p-38
+#define Q_18 -0x1.7b5fc54357bcfp-43
+#define Q_19 0x1.301ac8caec6e3p-48
+#define Q_20 -0x1.c3232aa28d427p-55
+
+// The error function erf is an odd function: erf(-x) = -erf(x),
+// and thus we compute erf(|x|) and restore the sign at the end.
+// For x >= 6, erf(x) rounds to 1.0.
+// The algorithm uses two approximation methods, on [0, T] and on
+// (T, 6.0]. For the first region, we approximate with an odd
+// polynomial. For the second region, the polynomial used actually
+// approximates (erfc(x))^(1/8); the desired result is 1 - (poly(x))^8.
+// Some algorithms for erf approximate log(erfc(x)) for large x, but
+// this requires an evaluation of expm1(y) after the polynomial approximation.
+// We essentially replaced the cost of expm1 with 3 multiplications.
+// (A scalar sketch of this squaring trick follows this file's diff.)
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, erf) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ vx_orig = vx; \
+ \
+ /* Handle Inf and NaN */ \
+ EXCEPTION_HANDLING_ERF (vx, special_args, vy_special, vlen); \
+ \
+ /* At this point, vx is 0 or >= 2^(-30). Can saturate vx at 6.0 */ \
+ vx = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ vx = __riscv_vfmin (vx, 0x1.8p+2, vlen); \
+ \
+ VBOOL x_gt_T = __riscv_vmfgt (vx, T, vlen); \
+ VFLOAT r, delta_r, xsq; \
+ xsq = __riscv_vfmul (vx, vx, vlen); \
+ r = __riscv_vmerge (xsq, vx, x_gt_T, vlen); \
+ delta_r \
+ = I_AS_F (__riscv_vxor (F_AS_I (delta_r), F_AS_I (delta_r), vlen)); \
+ delta_r = __riscv_vmerge (__riscv_vfmsub (vx, vx, xsq, vlen), delta_r, \
+ x_gt_T, vlen); \
+ \
+ /* Compute polynomial in r. \
+ // For x in [0, T], r = x*x \
+ // the polynomial in r is x*(p_1 + p_3 r + p_5 r^2 ... + p_35 r^22) \
+ // For x in (T, 6], r = x \
+ // the polynomial in r is q_0 + q_1 r + q_2 r^2 + ... + q_20 r^20 \
+ // The higher-order part of the polynomial is computed in floating \
+ // point; the lower-order part (more significant) is then done in \
+ // fixed point. \
+ // Both lower parts have 17 coefficients and so can be done with the \
+ // exact same instruction sequence using the corresponding \
+ // coefficients */ \
+ \
+ VFLOAT poly = PSTEP (Q_18, r, PSTEP (Q_19, Q_20, r, vlen), vlen); \
+ \
+ VFLOAT poly_right; \
+ poly_right = I_AS_F ( \
+ __riscv_vxor (F_AS_I (poly_right), F_AS_I (poly_right), vlen)); \
+ poly_right = __riscv_vmerge (poly_right, poly, x_gt_T, vlen); \
+ \
+ poly_right = PSTEP_ab (x_gt_T, Q_17, P_35, r, poly_right, vlen); \
+ poly_right = PSTEP_ab (x_gt_T, Q_16, P_33, r, poly_right, vlen); \
+ poly_right = PSTEP_ab (x_gt_T, Q_15, P_31, r, poly_right, vlen); \
+ poly_right = PSTEP_ab (x_gt_T, Q_14, P_29, r, poly_right, vlen); \
+ poly_right = PSTEP_ab (x_gt_T, Q_13, P_27, r, poly_right, vlen); \
+ \
+ VFLOAT r4 = __riscv_vfmul (r, r, vlen); \
+ r4 = __riscv_vfmul (r4, r4, vlen); \
+ \
+ VFLOAT poly_left = VFMV_VF (P_25, vlen); \
+ poly_left = __riscv_vfmerge (poly_left, Q_12, x_gt_T, vlen); \
+ poly_left = PSTEP_ab (x_gt_T, Q_11, P_23, r, poly_left, vlen); \
+ poly_left = PSTEP_ab (x_gt_T, Q_10, P_21, r, poly_left, vlen); \
+ poly_left = PSTEP_ab (x_gt_T, Q_9, P_19, r, poly_left, vlen); \
+ \
+ poly = __riscv_vfmadd (poly_right, r4, poly_left, vlen); \
+ VINT POLY = __riscv_vfcvt_x (__riscv_vfmul (poly, 0x1.0p78, vlen), vlen); \
+ \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, 0x1.0p60, vlen), vlen); \
+ VINT D_R \
+ = __riscv_vfcvt_x (__riscv_vfmul (delta_r, 0x1.0p60, vlen), vlen); \
+ R = __riscv_vadd (R, D_R, vlen); \
+ /* POLY is in Q78, R is in Q60 */ \
+ \
+ VINT COEFF = __riscv_vmerge (VMVI_VX (P_17, vlen), Q_8, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q76 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_15, vlen), Q_7, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q74 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_13, vlen), Q_6, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q72 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_11, vlen), Q_5, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q70 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_9, vlen), Q_4, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q68 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_7, vlen), Q_3, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q66 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_5, vlen), Q_2, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q64 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_3, vlen), Q_1, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q62 */ \
+ \
+ COEFF = __riscv_vmerge (VMVI_VX (P_1, vlen), Q_0, x_gt_T, vlen); \
+ POLY = __riscv_vsll (__riscv_vsmul (R, POLY, 1, vlen), 1, vlen); \
+ POLY = __riscv_vadd (POLY, COEFF, vlen); /* Q60 */ \
+ \
+ VINT POLY_RIGHT = __riscv_vsll (POLY, 3, vlen); /* Q63 */ \
+ POLY_RIGHT = __riscv_vsmul (POLY_RIGHT, POLY_RIGHT, 1, vlen); \
+ POLY_RIGHT = __riscv_vsmul (POLY_RIGHT, POLY_RIGHT, 1, vlen); \
+ POLY_RIGHT = __riscv_vsmul (POLY_RIGHT, POLY_RIGHT, 1, vlen); \
+ /* POLY_RIGHT is POLY^8 */ \
+ \
+    /* convert x to fixed-point Q_(62-m), 2^m <= x < 2^(m+1) */              \
+ VINT e = __riscv_vsra (F_AS_I (vx), MAN_LEN, vlen); \
+ e = __riscv_vmax (e, EXP_BIAS - 40, vlen); \
+ e = __riscv_vrsub (e, 2 * EXP_BIAS + 62, vlen); \
+ VFLOAT scale = I_AS_F (__riscv_vsll (e, MAN_LEN, vlen)); \
+ /* scale is 2^(62-m), X is x in Q_(62-m) */ \
+ VINT X = __riscv_vfcvt_x (__riscv_vfmul (vx, scale, vlen), vlen); \
+ POLY = __riscv_vsmul (X, POLY, 1, vlen); \
+ /* X is Q_(62-m) POLY is now Q_(59-m) */ \
+ /* x in [0, T], POLY is result in Q 59-m */ \
+ \
+ /* x in (T, 6], result is 1 - 2^(-63) POLY_RIGHT */ \
+ /* that is, 2^(-62)(2^62 - (POLY_RIGHT>>1)) */ \
+ INT one = (1LL << 62); \
+ POLY_RIGHT = __riscv_vsra (POLY_RIGHT, 1, vlen); \
+ POLY_RIGHT = __riscv_vrsub (POLY_RIGHT, one, vlen); \
+ \
+ POLY = __riscv_vmerge (POLY, POLY_RIGHT, x_gt_T, vlen); \
+    /* POLY contains the result in fixed point;                              \
+       the scale is 59-m for x in [0, T] and 62 for x > T */                 \
+ \
+ e = __riscv_vrsub (e, 2 * EXP_BIAS + 3, vlen); \
+ /* exponent field of 2^(-59+m) */ \
+ e = __riscv_vmerge (e, EXP_BIAS - 62, x_gt_T, vlen); \
+ scale = I_AS_F (__riscv_vsll (e, MAN_LEN, vlen)); \
+ \
+ vy = __riscv_vfcvt_f (POLY, vlen); \
+ vy = __riscv_vfmul (vy, scale, vlen); \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+    /* vy now holds the erf result */                                        \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
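+/* The fixed-point steps above use vsmul, which for SEW=64 returns the
+   high part (a*b) >> 63 with rounding and saturation; thus a Qm value
+   times a Qn value yields Q(m+n-63).  A minimal scalar model of one
+   Horner step above (R in Q60, POLY in Q78, coefficient in Q76, result
+   in Q76), with vsmul's rounding and saturation omitted for clarity:
+
+     #include <stdint.h>
+     static inline int64_t qstep (int64_t coeff, int64_t r, int64_t poly)
+     {
+       __int128 prod = (__int128) r * (__int128) poly;  // Q138 product
+       int64_t hi = (int64_t) (prod >> 63);             // Q75 high part
+       return (hi << 1) + coeff;                        // Q76 result
+     }  */
+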
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_erfc.c b/sysdeps/riscv/rvd/v_d_erfc.c
new file mode 100644
index 0000000000..ca6b1196a6
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_erfc.c
@@ -0,0 +1,258 @@
+/* Double-precision vector erfc function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ERFCD_VSET_CONFIG
+
+#define COMPILE_FOR_ERFC
+#include "rvvlm_errorfuncsD.h"
+
+#if defined(COMPILE_FOR_ERFC)
+// polynomial coefficients Q62
+#define P_0 0x4f33682d757709e8
+#define P_1 -0x95970864bc25c71
+#define P_2 0xa377a56796fd6f
+#define P_3 0x5ea2d221c412d2d
+#define P_4 -0x8f0caa24847e2a3
+#define P_5 0x8ac6781d49af506
+#define P_6 -0x67476ebb9bc1f58
+#define P_7 0x3d0ed00f93b86cb
+#define P_8 -0x1c36fb9d9556ac0
+#define P_9 0x96c3f45eaad23b
+#define P_10 -0x1a6d434ab9ada1
+#define P_11 -0x4dd9356c9c3f8
+#define P_12 0x4bb31b11d0a1a
+#define P_13 -0xf2d325083d5b
+#define P_14 -0x52720383749f
+#define P_15 0x33f7f3f6cb7d
+#define P_16 0x4ed13a394f
+#define P_17 -0x770e9d9af50
+#define P_18 0x108f3f3cf59
+#define P_19 0x101b7f3c485
+#define P_20 -0x3ab6fb75ad
+#define P_21 -0x237088721c
+#define P_22 0x6ed93407e
+#define P_23 0x3dbfb2c72
+#define NEG_A_SCALED -0x1.ep+64
+#define B_FOR_TRANS 0x1.04p+2
+#define MIN_CLIP 0x1.0p-60
+#define MAX_CLIP 0x1.cp4
+#else
+// polynomial coefficients in Q63
+#define P_0 0x6c25c9f6cfd132e7
+#define P_1 -0x5abb8f458c7895f
+#define P_2 -0x5ea2dcf3956792c
+#define P_3 0xdd22963d83fa7d8
+#define P_4 -0x107d667db8b90c84
+#define P_5 0xea0acc44786d840
+#define P_6 -0xa5e5b52ef29e23a
+#define P_7 0x5ef73d5784d9dc6
+#define P_8 -0x2acb1deb9208ae5
+#define P_9 0xdf0d75186479cf
+#define P_10 -0x25493132730985
+#define P_11 -0x7daed4327549c
+#define P_12 0x6ff2fb205b4f9
+#define P_13 -0x15242feefcc0f
+#define P_14 -0x7f14d7432d2b
+#define P_15 0x4b2791427dab
+#define P_16 0x17d0499cfa7
+#define P_17 -0xae9fb960b85
+#define P_18 0x15d4aa6975c
+#define P_19 0x17cff734612
+#define P_20 -0x505ad971f3
+#define P_21 -0x34366c3ea9
+#define P_22 0x97dfa0691
+#define P_23 0x591d3b55a
+#define NEG_A_SCALED -0x1.536p+65
+#define B_FOR_TRANS 0x1.6fap+2
+#define MIN_CLIP 0x1.0p-60
+#define MAX_CLIP 0x1.4p5
+#endif
+
+// When COMPILE_FOR_ERFC
+// The main computation is for erfc(|x|) and exploits the symmetry
+// erfc(-|x|) = 2 - erfc(|x|)
+// When COMPILE_FOR_CDFNORM
+// The main computation is for cdfnorm(-|x|) and exploits the symmetry
+// cdfnorm(|x|) = 1 - cdfnorm(-|x|)
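+//
+// A scalar sketch of the symmetry-based reconstruction (erfc_pos is a
+// hypothetical helper for the nonnegative branch, for illustration only;
+// the vector code below does the equivalent merge under mask):
+//
+//   double erfc_sym (double x)
+//   {
+//     double y = erfc_pos (fabs (x));   /* erfc(|x|) */
+//     return x < 0.0 ? 2.0 - y : y;     /* erfc(-|x|) = 2 - erfc(|x|) */
+//   }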
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, erfc) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_orig, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ vx_orig = vx; \
+ \
+ /* Handle Inf and NaN */ \
+ EXCEPTION_HANDLING (vx, special_args, vy_special, vlen); \
+ \
+ /* suffices to focus on |x| clipped to [2^-60, 28] */ \
+ vx = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ vx = __riscv_vfmin (vx, MAX_CLIP, vlen); \
+ vx = __riscv_vfmax (vx, MIN_CLIP, vlen); \
+ \
+ VINT R; \
+ /* Compute (x-a)/(x+b) as Q63 fixed-point */ \
+ X_TRANSFORM (vx, NEG_A_SCALED, B_FOR_TRANS, R, vlen); \
+ \
+ VINT n, A; \
+ VFLOAT vy = vx; \
+ /* Compute exp(-x*x) or exp(-x*x/2) as 2^n a \
+ // but return a as Q62 fixed-point A */ \
+ EXP_negAB (vx, vy, n, A, vlen); \
+ \
+ /* Approximate exp(x*x)*(1+2x)*erfc(x) \
+ // or exp(x*x/2)*(1+2x)*cdfnorm(-x) \
+ // using a polynomial in r = (x-a)/(x+b) \
+ // We use fixed-point computing \
+ // -1 < r < 1, thus using Q63 fixed-point for r \
+ // All coefficients are scaled the same and thus \
+ // the final value is in this scaling. \
+ // Scale is 2^62 for erfc and 2^63 for cdfnorm */ \
+ VINT P_RIGHT = PSTEP_I ( \
+ P_16, R, \
+ PSTEP_I (P_17, R, \
+ PSTEP_I (P_18, R, \
+ PSTEP_I (P_19, R, \
+ PSTEP_I (P_20, R, \
+ PSTEP_I (P_21, R, \
+ PSTEP_I (P_22, P_23, R, \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT R8 = __riscv_vsmul (R, R, 1, vlen); \
+ R8 = __riscv_vsmul (R8, R8, 1, vlen); \
+ R8 = __riscv_vsmul (R8, R8, 1, vlen); \
+ \
+ VINT P_MID = PSTEP_I ( \
+ P_8, R, \
+ PSTEP_I (P_9, R, \
+ PSTEP_I (P_10, R, \
+ PSTEP_I (P_11, R, \
+ PSTEP_I (P_12, R, \
+ PSTEP_I (P_13, R, \
+ PSTEP_I (P_14, P_15, R, \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P_RIGHT = __riscv_vsmul (R8, P_RIGHT, 1, vlen); \
+ P_RIGHT = __riscv_vadd (P_RIGHT, P_MID, vlen); \
+ P_RIGHT = __riscv_vsmul (R8, P_RIGHT, 1, vlen); \
+ \
+ VINT P_LEFT = PSTEP_I ( \
+ P_0, R, \
+ PSTEP_I ( \
+ P_1, R, \
+ PSTEP_I (P_2, R, \
+ PSTEP_I (P_3, R, \
+ PSTEP_I (P_4, R, \
+ PSTEP_I (P_5, R, \
+ PSTEP_I (P_6, P_7, R, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT P = __riscv_vadd (P_LEFT, P_RIGHT, vlen); \
+ \
+ VINT m, B; \
+ RECIP_SCALE (vx, B, m, vlen); \
+ \
+ /* exp(-x^2) is 2^n * 2^(-62) * A \
+ // 1/(1+2x) is 2^(-m) * B, m >= 62 \
+ // exp(x^2)(1+2x)erfc(x) is 2^(-62) * P */ \
+ P = __riscv_vsmul (P, A, 1, vlen); /* Q61 */ \
+ P = __riscv_vsmul (P, B, 1, vlen); /* Q(m-2) */ \
+ n = __riscv_vsub (n, m, vlen); \
+ n = __riscv_vadd (n, 2, vlen); /* n <= -60 */ \
+ \
+ VUINT ell = I_AS_U (__riscv_vrsub (n, -60, vlen)); \
+ ell = __riscv_vminu (ell, 63, vlen); \
+ VINT PP = __riscv_vsra (P, ell, vlen); \
+ VINT Q = VMVI_VX (1, vlen); \
+ Q = __riscv_vsll (Q, 61, vlen); \
+ Q = __riscv_vsub (Q, PP, vlen); \
+ VFLOAT vz = __riscv_vfcvt_f (Q, vlen); \
+ vz = __riscv_vfmul (vz, 0x1.0p-60, vlen); \
+ \
+ vy = __riscv_vfcvt_f (P, vlen); \
+ FAST_LDEXP (vy, n, vlen); \
+ /* vy is erfc(|x|) at this point */ \
+ \
+ VBOOL x_is_neg = __riscv_vmflt (vx_orig, fp_posZero, vlen); \
+ vy = __riscv_vmerge (vy, vz, x_is_neg, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_erfcinv.c b/sysdeps/riscv/rvd/v_d_erfcinv.c
new file mode 100644
index 0000000000..f979811598
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_erfcinv.c
@@ -0,0 +1,283 @@
+/* Double-precision vector erfcinv function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ERFCINVD_VSET_CONFIG
+
+#define COMPILE_FOR_ERFCINV
+#include "rvvlm_inverrorfuncsD.h"
+
+// Erfcinv is defined on (0, 2).  It suffices to consider (0, 1].
+// Two regions of approximation: left is (0, 0x1.2p-2] and right is
+// (0x1.2p-2, 1).  Both are done with rational functions.
+// For right, t*P(t)/Q(t) with t = 1-x, x in (0x1.2p-2, 1).
+// For left, y*P(t)/Q(t) with y = sqrt(-log(x)) and t = 1/y.
+
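+// A scalar sketch of that branch structure (the vector code below
+// evaluates both branches under mask with hi/lo compensation, and maps
+// x in (1, 2) to 2-x with a sign flip; P/Q names here are illustrative
+// placeholders for the coefficient sets that follow):
+//
+//   double erfcinv_sketch (double x)     /* x in (0, 1] */
+//   {
+//     if (x <= 0x1.2p-2)                 /* left: transformed branch */
+//       {
+//         double y = sqrt (-log (x));
+//         double t = 1.0 / y;
+//         return y * P_left (t) / Q_left (t);
+//       }
+//     double t = 1.0 - x;                /* right */
+//     return t * P_right (t) / Q_right (t);
+//   }
+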
+// P_coefficients in ascending order, all in Q79. p0_delta is in floating point
+#define P_right_0 -0x48dbe9f5b3eabaa
+#define P_right_1 -0xb35279f1a626ae5
+#define P_right_2 0x33789911873d184a
+#define P_right_3 -0x1bf9138fc77c0fbf
+#define P_right_4 -0x2d4ec43bc48403d4
+#define P_right_5 0x2d61deb53842cca1
+#define P_right_6 0x23324eca6b3ff02
+#define P_right_7 -0xd4ec1d31542c4fc
+#define P_right_8 0x2ecf3c60308b0f2
+#define P_right_9 0x7c917b3378071e
+#define P_right_10 -0x1e09b597f226ca
+#define DELTA_P0_right -0x1.ec7dc41c17860p-2
+
+// Q_coefficients in ascending order, all in Q79. q0_delta is in floating point
+#define Q_right_0 -0x52366e5b14c0970
+#define Q_right_1 -0xca57e95abcc599b
+#define Q_right_2 0x3b6c91ec67f5759c
+#define Q_right_3 -0x1c40d5daa3be22bc
+#define Q_right_4 -0x41f11eb5d837386c
+#define Q_right_5 0x3c6ce478fcd75c9a
+#define Q_right_6 0xbb1cd7270cfba1d
+#define Q_right_7 -0x1988a4116498f1af
+#define Q_right_8 0x44dc3042f103d20
+#define Q_right_9 0x2390e683d02edf3
+#define Q_right_10 -0x8ec66f2a7e410c
+#define DELTA_Q0_right -0x1.29a0161e99446p-3
+
+// P_coefficients in ascending order, all in Q67. p0_delta is in floating point
+#define P_left_0 0x17a0bb69321df
+#define P_left_1 0x402eb416ae6015
+#define P_left_2 0x2973eb18028ce34
+#define P_left_3 0x8034a7ece1d5370
+#define P_left_4 0xa76c08a74dae273
+#define P_left_5 0x11dd3876b83dd078
+#define P_left_6 0xfdd7693c3b77653
+#define P_left_7 0xb33d66152b3c223
+#define P_left_8 0x5a564c28c6a41a9
+#define P_left_9 0x1190449fe630213
+#define P_left_10 -0x659c784274e1
+#define DELTA_P0_left -0x1.d622f4cbe0eeep-2
+
+// Q_coefficients in ascending order, all in Q67. q0_delta is in floating point
+#define Q_left_0 0x17a09aabf9cee
+#define Q_left_1 0x4030b9059ffcad
+#define Q_left_2 0x29b26b0d87f7855
+#define Q_left_3 0x87572a13d3fa2dd
+#define Q_left_4 0xd7a728b5620ac3c
+#define Q_left_5 0x1754392b473fd439
+#define Q_left_6 0x1791b9a091a816c2
+#define Q_left_7 0x167f71db9e13b075
+#define Q_left_8 0xcb9f5f3e5e618a4
+#define Q_left_9 0x68271fae767c68e
+#define Q_left_10 0x13745c4fa224b25
+#define DELTA_Q0_left 0x1.f7e7557a34ae6p-2
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, erfcinv) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_sign, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* Handle Inf and NaN */ \
+ EXCEPTION_HANDLING_ERFCINV (vx, special_args, vy_special, vlen); \
+ \
+ vx_sign = __riscv_vfrsub (vx, fp_posOne, vlen); \
+ VFLOAT two_minus_x = __riscv_vfadd (vx_sign, fp_posOne, vlen); \
+ VBOOL x_gt_1 = __riscv_vmflt (vx_sign, fp_posZero, vlen); \
+ vx = __riscv_vmerge (vx, two_minus_x, x_gt_1, vlen); \
+ /* vx is now in (0, 1] */ \
+ VBOOL x_in_left = __riscv_vmfle (vx, 0x1.2p-2, vlen); \
+ \
+ VFLOAT w_hi, w_lo, w_hi_left, w_lo_left, y_hi, y_lo; \
+ VINT T, T_left, T_tiny; \
+ VBOOL x_is_tiny; \
+ x_is_tiny = __riscv_vmxor (x_is_tiny, x_is_tiny, vlen); \
+ \
+ if (__riscv_vcpop (x_in_left, vlen) > 0) \
+ { \
+ VFLOAT x_left = VFMV_VF (0x1.0p-3, vlen); \
+ x_left = __riscv_vmerge (x_left, vx, x_in_left, vlen); \
+ x_is_tiny = __riscv_vmflt (x_left, 0x1.0p-52, vlen); \
+ INT n_adjust = 60; \
+ x_left = __riscv_vfmul (x_left, 0x1.0p60, vlen); \
+ NEG_LOGX_4_TRANSFORM (x_left, n_adjust, y_hi, y_lo, vlen); \
+ \
+ SQRTX_4_TRANSFORM (y_hi, y_lo, w_hi_left, w_lo_left, T_left, \
+ 0x1.0p63, 0x1.0p-63, vlen); \
+ if (__riscv_vcpop (x_is_tiny, vlen) > 0) \
+ { \
+ VFLOAT w_hi_dummy, w_lo_dummy; \
+ SQRTX_4_TRANSFORM (y_hi, y_lo, w_hi_dummy, w_lo_dummy, T_tiny, \
+ 0x1.0p64, 0x1.0p-64, vlen); \
+ } \
+ } \
+ w_hi = VFMV_VF (fp_posOne, vlen); \
+ w_hi = __riscv_vfsub (w_hi, vx, vlen); \
+ w_lo = __riscv_vfrsub (w_hi, fp_posOne, vlen); \
+ w_lo = __riscv_vfsub (w_lo, vx, vlen); \
+ T = __riscv_vfcvt_x (__riscv_vfmul (w_hi, 0x1.0p63, vlen), vlen); \
+ VFLOAT delta_t = __riscv_vfmul (w_lo, 0x1.0p63, vlen); \
+ T = __riscv_vadd (T, __riscv_vfcvt_x (delta_t, vlen), vlen); \
+ T = __riscv_vmerge (T, T_left, x_in_left, vlen); \
+ \
+ w_hi = __riscv_vmerge (w_hi, w_hi_left, x_in_left, vlen); \
+ w_lo = __riscv_vmerge (w_lo, w_lo_left, x_in_left, vlen); \
+ \
+ /* For transformed branch, compute (w_hi + w_lo) * P(T)/Q(T) */ \
+ VINT P, Q; \
+ \
+ P = __riscv_vmerge (VMVI_VX (P_right_10, vlen), P_left_10, x_in_left, \
+ vlen); \
+ P = PSTEP_I_ab ( \
+ x_in_left, P_left_6, P_right_6, T, \
+ PSTEP_I_ab (x_in_left, P_left_7, P_right_7, T, \
+ PSTEP_I_ab (x_in_left, P_left_8, P_right_8, T, \
+ PSTEP_I_ab (x_in_left, P_left_9, P_right_9, \
+ T, P, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ Q = __riscv_vmerge (VMVI_VX (Q_right_10, vlen), Q_left_10, x_in_left, \
+ vlen); \
+ Q = PSTEP_I_ab ( \
+ x_in_left, Q_left_6, Q_right_6, T, \
+ PSTEP_I_ab (x_in_left, Q_left_7, Q_right_7, T, \
+ PSTEP_I_ab (x_in_left, Q_left_8, Q_right_8, T, \
+ PSTEP_I_ab (x_in_left, Q_left_9, Q_right_9, \
+ T, Q, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P = PSTEP_I_ab ( \
+ x_in_left, P_left_0, P_right_0, T, \
+ PSTEP_I_ab ( \
+ x_in_left, P_left_1, P_right_1, T, \
+ PSTEP_I_ab ( \
+ x_in_left, P_left_2, P_right_2, T, \
+ PSTEP_I_ab (x_in_left, P_left_3, P_right_3, T, \
+ PSTEP_I_ab (x_in_left, P_left_4, P_right_4, T, \
+ PSTEP_I_ab (x_in_left, P_left_5, \
+ P_right_5, T, P, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ Q = PSTEP_I_ab ( \
+ x_in_left, Q_left_0, Q_right_0, T, \
+ PSTEP_I_ab ( \
+ x_in_left, Q_left_1, Q_right_1, T, \
+ PSTEP_I_ab ( \
+ x_in_left, Q_left_2, Q_right_2, T, \
+ PSTEP_I_ab (x_in_left, Q_left_3, Q_right_3, T, \
+ PSTEP_I_ab (x_in_left, Q_left_4, Q_right_4, T, \
+ PSTEP_I_ab (x_in_left, Q_left_5, \
+ Q_right_5, T, Q, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_hi, p_lo; \
+ p_hi = __riscv_vfcvt_f (P, vlen); \
+ \
+ p_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (P, __riscv_vfcvt_x (p_hi, vlen), vlen), vlen); \
+ VFLOAT delta_p0 = VFMV_VF (DELTA_P0_right, vlen); \
+ delta_p0 = __riscv_vfmerge (delta_p0, DELTA_P0_left, x_in_left, vlen); \
+ p_lo = __riscv_vfadd (p_lo, delta_p0, vlen); \
+ \
+ VFLOAT q_hi, q_lo; \
+ q_hi = __riscv_vfcvt_f (Q, vlen); \
+ q_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (Q, __riscv_vfcvt_x (q_hi, vlen), vlen), vlen); \
+ VFLOAT delta_q0 = VFMV_VF (DELTA_Q0_right, vlen); \
+ delta_q0 = __riscv_vfmerge (delta_q0, DELTA_Q0_left, x_in_left, vlen); \
+ q_lo = __riscv_vfadd (q_lo, delta_q0, vlen); \
+ \
+ if (__riscv_vcpop (x_is_tiny, vlen) > 0) \
+ { \
+ VFLOAT p_hi_tiny, p_lo_tiny, q_hi_tiny, q_lo_tiny; \
+ ERFCINV_PQ_HILO_TINY (T_tiny, p_hi_tiny, p_lo_tiny, q_hi_tiny, \
+ q_lo_tiny, vlen); \
+ p_hi = __riscv_vmerge (p_hi, p_hi_tiny, x_is_tiny, vlen); \
+ p_lo = __riscv_vmerge (p_lo, p_lo_tiny, x_is_tiny, vlen); \
+ q_hi = __riscv_vmerge (q_hi, q_hi_tiny, x_is_tiny, vlen); \
+ q_lo = __riscv_vmerge (q_lo, q_lo_tiny, x_is_tiny, vlen); \
+ } \
+ \
+ /* (y_hi, y_lo) <-- (w_hi + w_lo) * (p_hi + p_lo) */ \
+ y_hi = __riscv_vfmul (w_hi, p_hi, vlen); \
+ y_lo = __riscv_vfmsub (w_hi, p_hi, y_hi, vlen); \
+ y_lo = __riscv_vfmacc (y_lo, w_hi, p_lo, vlen); \
+ y_lo = __riscv_vfmacc (y_lo, w_lo, p_hi, vlen); \
+ \
+ DIV_N2D2 (y_hi, y_lo, q_hi, q_lo, w_hi, vlen); \
+ \
+ vy = w_hi; \
+ \
+ vy = __riscv_vfsgnj (vy, vx_sign, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_erfinv.c b/sysdeps/riscv/rvd/v_d_erfinv.c
new file mode 100644
index 0000000000..ca0a93fb6c
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_erfinv.c
@@ -0,0 +1,262 @@
+/* Double-precision vector erfinv function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_ERFINVD_VSET_CONFIG
+
+#define COMPILE_FOR_ERFINV
+#include "rvvlm_inverrorfuncsD.h"
+
+#if (STRIDE == UNIT_STRIDE)
+#define F_VER1 RVVLM_ERFINVD_STD
+#else
+#define F_VER1 RVVLM_ERFINVDI_STD
+#endif
+
+// Two regions of approximation: left is [0, 0x1.7p-1) and right is
+// [0x1.7p-1, 1).  Both are done with rational functions.
+// For left, x*P(x)/Q(x) with x in [0, 0x1.7p-1).
+// For right, y*P(t)/Q(t) with y = sqrt(-log(1-x)) and t = 1/y.
+
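+// A scalar sketch (the vector code below evaluates both branches under
+// mask and carries hi/lo parts; the result takes the sign of the input,
+// and P/Q names are illustrative placeholders for the coefficients):
+//
+//   double erfinv_sketch (double x)      /* 0 <= x < 1 */
+//   {
+//     if (x < 0x1.7p-1)                  /* left */
+//       return x * P_left (x) / Q_left (x);
+//     double y = sqrt (-log (1.0 - x));  /* right: tail transform */
+//     double t = 1.0 / y;
+//     return y * P_right (t) / Q_right (t);
+//   }
+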
+// P_coefficients in ascending order, all in Q79. p0_delta is in floating point
+#define P_left_0 -0x48dbe9f5b3eabaa
+#define P_left_1 -0xb35279f1a626ae5
+#define P_left_2 0x33789911873d184a
+#define P_left_3 -0x1bf9138fc77c0fbf
+#define P_left_4 -0x2d4ec43bc48403d4
+#define P_left_5 0x2d61deb53842cca1
+#define P_left_6 0x23324eca6b3ff02
+#define P_left_7 -0xd4ec1d31542c4fc
+#define P_left_8 0x2ecf3c60308b0f2
+#define P_left_9 0x7c917b3378071e
+#define P_left_10 -0x1e09b597f226ca
+#define DELTA_P0_left -0x1.ec7dc41c17860p-2
+
+// Q_coefficients in ascending order, all in Q79. q0_delta is in floating point
+#define Q_left_0 -0x52366e5b14c0970
+#define Q_left_1 -0xca57e95abcc599b
+#define Q_left_2 0x3b6c91ec67f5759c
+#define Q_left_3 -0x1c40d5daa3be22bc
+#define Q_left_4 -0x41f11eb5d837386c
+#define Q_left_5 0x3c6ce478fcd75c9a
+#define Q_left_6 0xbb1cd7270cfba1d
+#define Q_left_7 -0x1988a4116498f1af
+#define Q_left_8 0x44dc3042f103d20
+#define Q_left_9 0x2390e683d02edf3
+#define Q_left_10 -0x8ec66f2a7e410c
+#define DELTA_Q0_left -0x1.29a0161e99446p-3
+
+// P_coefficients in ascending order, all in Q67. p0_delta is in floating point
+#define P_right_0 0x17a0bb69321df
+#define P_right_1 0x402eb416ae6015
+#define P_right_2 0x2973eb18028ce34
+#define P_right_3 0x8034a7ece1d5370
+#define P_right_4 0xa76c08a74dae273
+#define P_right_5 0x11dd3876b83dd078
+#define P_right_6 0xfdd7693c3b77653
+#define P_right_7 0xb33d66152b3c223
+#define P_right_8 0x5a564c28c6a41a9
+#define P_right_9 0x1190449fe630213
+#define P_right_10 -0x659c784274e1
+#define DELTA_P0_right -0x1.d622f4cbe0eeep-2
+
+// Q_coefficients in ascending order, all in Q67. q0_delta is in floating point
+#define Q_right_0 0x17a09aabf9cee
+#define Q_right_1 0x4030b9059ffcad
+#define Q_right_2 0x29b26b0d87f7855
+#define Q_right_3 0x87572a13d3fa2dd
+#define Q_right_4 0xd7a728b5620ac3c
+#define Q_right_5 0x1754392b473fd439
+#define Q_right_6 0x1791b9a091a816c2
+#define Q_right_7 0x167f71db9e13b075
+#define Q_right_8 0xcb9f5f3e5e618a4
+#define Q_right_9 0x68271fae767c68e
+#define Q_right_10 0x13745c4fa224b25
+#define DELTA_Q0_right 0x1.f7e7557a34ae6p-2
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, erfinv) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ vx_orig = vx; \
+ \
+ /* Handle Inf and NaN */ \
+ EXCEPTION_HANDLING_ERFINV (vx, special_args, vy_special, vlen); \
+ \
+ vx = __riscv_vfsgnj (vx, fp_posOne, vlen); \
+ VBOOL x_in_right = __riscv_vmfge (vx, 0x1.7p-1, vlen); \
+ \
+ VFLOAT w_hi, w_lo, w_hi_right, w_lo_right, y_hi, y_lo; \
+ VINT T, T_right; \
+ \
+ if (__riscv_vcpop (x_in_right, vlen) > 0) \
+ { \
+ VFLOAT one_minus_x; \
+ one_minus_x = __riscv_vfrsub (vx, fp_posOne, vlen); \
+ \
+ VINT n_adjust; \
+ n_adjust = __riscv_vxor (n_adjust, n_adjust, vlen); \
+ \
+ NEG_LOGX_4_TRANSFORM (one_minus_x, n_adjust, y_hi, y_lo, vlen); \
+ \
+ SQRTX_4_TRANSFORM (y_hi, y_lo, w_hi_right, w_lo_right, T_right, \
+ 0x1.0p63, 0x1.0p-63, vlen); \
+ } \
+ T = __riscv_vfcvt_x (__riscv_vfmul (vx, 0x1.0p63, vlen), vlen); \
+ T = __riscv_vmerge (T, T_right, x_in_right, vlen); \
+ \
+ w_hi = vx; \
+ w_lo = I_AS_F (__riscv_vxor (F_AS_I (w_lo), F_AS_I (w_lo), vlen)); \
+ w_hi = __riscv_vmerge (w_hi, w_hi_right, x_in_right, vlen); \
+ w_lo = __riscv_vmerge (w_lo, w_lo_right, x_in_right, vlen); \
+ \
+ /* For transformed branch, compute (w_hi + w_lo) * P(T)/Q(T) */ \
+ VINT P, Q; \
+ \
+ P = __riscv_vmerge (VMVI_VX (P_left_10, vlen), P_right_10, x_in_right, \
+ vlen); \
+ P = PSTEP_I_ab ( \
+ x_in_right, P_right_6, P_left_6, T, \
+ PSTEP_I_ab (x_in_right, P_right_7, P_left_7, T, \
+ PSTEP_I_ab (x_in_right, P_right_8, P_left_8, T, \
+ PSTEP_I_ab (x_in_right, P_right_9, P_left_9, \
+ T, P, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ Q = __riscv_vmerge (VMVI_VX (Q_left_10, vlen), Q_right_10, x_in_right, \
+ vlen); \
+ Q = PSTEP_I_ab ( \
+ x_in_right, Q_right_6, Q_left_6, T, \
+ PSTEP_I_ab (x_in_right, Q_right_7, Q_left_7, T, \
+ PSTEP_I_ab (x_in_right, Q_right_8, Q_left_8, T, \
+ PSTEP_I_ab (x_in_right, Q_right_9, Q_left_9, \
+ T, Q, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ P = PSTEP_I_ab ( \
+ x_in_right, P_right_0, P_left_0, T, \
+ PSTEP_I_ab ( \
+ x_in_right, P_right_1, P_left_1, T, \
+ PSTEP_I_ab ( \
+ x_in_right, P_right_2, P_left_2, T, \
+ PSTEP_I_ab (x_in_right, P_right_3, P_left_3, T, \
+ PSTEP_I_ab (x_in_right, P_right_4, P_left_4, T, \
+ PSTEP_I_ab (x_in_right, P_right_5, \
+ P_left_5, T, P, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ Q = PSTEP_I_ab ( \
+ x_in_right, Q_right_0, Q_left_0, T, \
+ PSTEP_I_ab ( \
+ x_in_right, Q_right_1, Q_left_1, T, \
+ PSTEP_I_ab ( \
+ x_in_right, Q_right_2, Q_left_2, T, \
+ PSTEP_I_ab (x_in_right, Q_right_3, Q_left_3, T, \
+ PSTEP_I_ab (x_in_right, Q_right_4, Q_left_4, T, \
+ PSTEP_I_ab (x_in_right, Q_right_5, \
+ Q_left_5, T, Q, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_hi, p_lo; \
+ p_hi = __riscv_vfcvt_f (P, vlen); \
+ \
+ p_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (P, __riscv_vfcvt_x (p_hi, vlen), vlen), vlen); \
+ VFLOAT delta_p0 = VFMV_VF (DELTA_P0_left, vlen); \
+ delta_p0 = __riscv_vfmerge (delta_p0, DELTA_P0_right, x_in_right, vlen); \
+ p_lo = __riscv_vfadd (p_lo, delta_p0, vlen); \
+ \
+ VFLOAT q_hi, q_lo; \
+ q_hi = __riscv_vfcvt_f (Q, vlen); \
+ q_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (Q, __riscv_vfcvt_x (q_hi, vlen), vlen), vlen); \
+ VFLOAT delta_q0 = VFMV_VF (DELTA_Q0_left, vlen); \
+ delta_q0 = __riscv_vfmerge (delta_q0, DELTA_Q0_right, x_in_right, vlen); \
+ q_lo = __riscv_vfadd (q_lo, delta_q0, vlen); \
+ \
+ /* (y_hi, y_lo) <-- (w_hi + w_lo) * (p_hi + p_lo) */ \
+ y_hi = __riscv_vfmul (w_hi, p_hi, vlen); \
+ y_lo = __riscv_vfmsub (w_hi, p_hi, y_hi, vlen); \
+ y_lo = __riscv_vfmacc (y_lo, w_hi, p_lo, vlen); \
+ y_lo = __riscv_vfmacc (y_lo, w_lo, p_hi, vlen); \
+ \
+ DIV_N2D2 (y_hi, y_lo, q_hi, q_lo, w_hi, vlen); \
+ \
+ vy = w_hi; \
+ \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_exp.c b/sysdeps/riscv/rvd/v_d_exp.c
new file mode 100644
index 0000000000..b89a01bf9e
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_exp.c
@@ -0,0 +1,153 @@
+/* Double-precision vector exp function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_EXPD_VSET_CONFIG
+
+#define COMPILE_FOR_EXP
+
+#define EXCEPTION_HANDLING_EXP(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ /* Substitute -Inf with +0 */ \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_negInf, id_mask, (vlen)); \
+ vx = __riscv_vfmerge (vx, fp_posZero, id_mask, (vlen)); \
+ vy_special = __riscv_vfadd ((special_args), (vx), (vx), (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define P_INV_STD 0x1.71547652b82fep+0
+#define P_HI_STD 0x1.62e42fefa39efp-1
+#define P_LO_STD 0x1.abc9e3b39803fp-56
+#define P_INV_TBL 0x1.71547652b82fep+6
+#define P_HI_TBL 0x1.62e42fefa39efp-7
+#define P_LO_TBL 0x1.abc9e3b39803fp-62
+#define X_MAX 0x1.65p+9
+#define X_MIN -0x1.77p+9
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, exp) (VFLOAT x) \
+ { \
+ size_t vl; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ vl = VSET (simdlen); \
+ vx = x; \
+ /* Set results for input of NaN and Inf; substitute them with zero */ \
+ EXCEPTION_HANDLING_EXP (vx, special_args, vy_special, vl); \
+ \
+ /* Clip */ \
+ vx = FCLIP (vx, X_MIN, X_MAX, vl); \
+ \
+ /* Argument reduction */ \
+ VFLOAT flt_n = __riscv_vfmul (vx, P_INV_STD, vl); \
+ VINT n = __riscv_vfcvt_x (flt_n, vl); \
+ flt_n = __riscv_vfcvt_f (n, vl); \
+ VFLOAT r = __riscv_vfnmsac (vx, P_HI_STD, flt_n, vl); \
+ \
+ r = __riscv_vfnmsac (r, P_LO_STD, flt_n, vl); \
+ \
+    /* Polynomial computation: we have a degree-11 polynomial.               \
+       We compute the part from r^3 in three segments, increasing            \
+       parallelism; ideally the compiler will interleave the                 \
+       computations of the segments.  */                                     \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.71df804f1baa1p-19, r, \
+ PSTEP (0x1.28aa3ea739296p-22, 0x1.acf86201fd199p-26, r, vl), vl); \
+ \
+ VFLOAT poly_mid = PSTEP ( \
+ 0x1.6c16c1825c970p-10, r, \
+ PSTEP (0x1.a01a00fe6f730p-13, 0x1.a0199e1789c72p-16, r, vl), vl); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.55555555554d2p-3, r, \
+ PSTEP (0x1.5555555551307p-5, 0x1.11111111309a4p-7, r, vl), vl); \
+ \
+ VFLOAT r_sq = __riscv_vfmul (r, r, vl); \
+ VFLOAT r_cube = __riscv_vfmul (r_sq, r, vl); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, r_cube, poly_mid, vl); \
+ poly = __riscv_vfmadd (poly, r_cube, poly_left, vl); \
+ \
+ poly = PSTEP (0x1.0000000000007p-1, r, poly, vl); \
+ \
+ r = __riscv_vfmacc (r, r_sq, poly, vl); \
+ vy = __riscv_vfadd (r, 0x1.0p0, vl); \
+ \
+ /* at this point, vy is the entire degree-11 polynomial vy ~=~ exp(r) */ \
+ \
+ /* Need to compute 2^n * exp(r).*/ \
+ FAST_LDEXP (vy, n, vl); \
+ \
+ /* Incorporate results of exceptional inputs */ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vl); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
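+/* A scalar model of the reduction and reconstruction used above, assuming
+   round-to-nearest (poly stands for the segmented evaluation of the
+   coefficients in the macro; FAST_LDEXP plays the role of ldexp):
+
+     #include <math.h>
+     double exp_sketch (double x)
+     {
+       double n = nearbyint (x * P_INV_STD);  // n ~=~ x / log(2)
+       double r = x - n * P_HI_STD;           // |r| <= log(2)/2 (approx)
+       r -= n * P_LO_STD;                     // low half of log(2)
+       double p = r + r * r * poly (r);       // exp(r) - 1, degree 11 in r
+       return ldexp (1.0 + p, (int) n);       // exp(x) = 2^n * exp(r)
+     }  */
+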
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_exp10.c b/sysdeps/riscv/rvd/v_d_exp10.c
new file mode 100644
index 0000000000..f0646fb11e
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_exp10.c
@@ -0,0 +1,158 @@
+/* Double-precision vector exp10 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_EXP10D_VSET_CONFIG
+
+#define COMPILE_FOR_EXP10
+
+#define EXCEPTION_HANDLING_EXP(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ /* Substitute -Inf with +0 */ \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_negInf, id_mask, (vlen)); \
+ vx = __riscv_vfmerge (vx, fp_posZero, id_mask, (vlen)); \
+ vy_special = __riscv_vfadd ((special_args), (vx), (vx), (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define P_INV_STD 0x1.a934f0979a371p+1
+#define P_HI_STD 0x1.34413509f79ffp-2
+#define P_LO_STD -0x1.9dc1da994fd21p-59
+#define P_INV_TBL 0x1.a934f0979a371p+7
+#define P_HI_TBL 0x1.34413509f79ffp-8
+#define P_LO_TBL -0x1.9dc1da994fd21p-65
+#define LOGB_HI 0x1.26bb1bbb55516p+1
+#define LOGB_LO -0x1.f48ad494ea3e9p-53
+#define X_MAX 0x1.36p8
+#define X_MIN -0x1.46p8
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, exp10) (VFLOAT x) \
+ { \
+    size_t vlen;                                                             \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ \
+ vlen = VSET (simdlen); \
+ \
+ vx = x; \
+ /* Set results for input of NaN and Inf; substitute them with zero */ \
+ EXCEPTION_HANDLING_EXP (vx, special_args, vy_special, vlen); \
+ \
+ /* Clip */ \
+ vx = FCLIP (vx, X_MIN, X_MAX, vlen); \
+ \
+ /* Argument reduction */ \
+ VFLOAT flt_n = __riscv_vfmul (vx, P_INV_STD, vlen); \
+ VINT n = __riscv_vfcvt_x (flt_n, vlen); \
+ flt_n = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT r = __riscv_vfnmsac (vx, P_HI_STD, flt_n, vlen); \
+ \
+ VFLOAT r_lo = __riscv_vfmul (flt_n, P_LO_STD, vlen); \
+ r_lo = __riscv_vfnmsac (__riscv_vfmul (r, LOGB_LO, vlen), LOGB_HI, r_lo, \
+ vlen); \
+ r = __riscv_vfmadd (r, LOGB_HI, r_lo, vlen); \
+ \
+    /* Polynomial computation: we have a degree-11 polynomial.               \
+       We compute the part from r^3 in three segments, increasing            \
+       parallelism; ideally the compiler will interleave the                 \
+       computations of the segments.  */                                     \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.71df804f1baa1p-19, r, \
+ PSTEP (0x1.28aa3ea739296p-22, 0x1.acf86201fd199p-26, r, vlen), vlen); \
+ \
+ VFLOAT poly_mid = PSTEP ( \
+ 0x1.6c16c1825c970p-10, r, \
+ PSTEP (0x1.a01a00fe6f730p-13, 0x1.a0199e1789c72p-16, r, vlen), vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.55555555554d2p-3, r, \
+ PSTEP (0x1.5555555551307p-5, 0x1.11111111309a4p-7, r, vlen), vlen); \
+ \
+ VFLOAT r_sq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT r_cube = __riscv_vfmul (r_sq, r, vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, r_cube, poly_mid, vlen); \
+ poly = __riscv_vfmadd (poly, r_cube, poly_left, vlen); \
+ \
+ poly = PSTEP (0x1.0000000000007p-1, r, poly, vlen); \
+ \
+ r = __riscv_vfmacc (r, r_sq, poly, vlen); \
+ vy = __riscv_vfadd (r, 0x1.0p0, vlen); \
+ \
+ /* at this point, vy is the entire degree-11 polynomial */ \
+ /* vy ~=~ exp(r) */ \
+ \
+ /* Need to compute 2^n * exp(r).*/ \
+ FAST_LDEXP (vy, n, vlen); \
+ \
+ /* Incorporate results of exceptional inputs */ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
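+/* Scalar model of the reduction above: write x = n*log10(2) + r so that
+   10^x = 2^n * 10^r = 2^n * exp(r * ln(10)); the multiply by ln(10) is
+   compensated with the LOGB_HI/LOGB_LO split (illustrative only,
+   constants as defined above):
+
+     double n    = nearbyint (x * P_INV_STD);  // n ~=~ x / log10(2)
+     double r    = x - n * P_HI_STD;           // base-10 remainder
+     double r_lo = n * P_LO_STD;               // dropped low half of n*log10(2)
+     r_lo = r * LOGB_LO - LOGB_HI * r_lo;      // correction terms
+     r    = r * LOGB_HI + r_lo;                // r * ln(10), compensated
+     // then 10^x ~=~ 2^n * exp(r), reconstructed with FAST_LDEXP  */
+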
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_exp2.c b/sysdeps/riscv/rvd/v_d_exp2.c
new file mode 100644
index 0000000000..55e3e27596
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_exp2.c
@@ -0,0 +1,153 @@
+/* Double-precision vector exp2 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_EXP2D_VSET_CONFIG
+
+#define COMPILE_FOR_EXP2
+
+#define EXCEPTION_HANDLING_EXP(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ /* Substitute -Inf with +0 */ \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_negInf, id_mask, (vlen)); \
+ vx = __riscv_vfmerge (vx, fp_posZero, id_mask, (vlen)); \
+ vy_special = __riscv_vfadd ((special_args), (vx), (vx), (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define P_INV_STD 0x1.71547652b82fep+0
+#define P_HI_STD 0x1.62e42fefa39efp-1
+#define P_LO_STD 0x1.abc9e3b39803fp-56
+#define P_INV_TBL 0x1.0p6
+#define P_HI_TBL 0x1.0p-6
+#define P_LO_TBL 0x1.abc9e3b39803fp-56
+#define LOGB_HI 0x1.62e42fefa39efp-1
+#define LOGB_LO 0x1.abc9e3b39803fp-56
+#define X_MAX 0x1.018p10
+#define X_MIN -0x1.0ep10
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, exp2) (VFLOAT x) \
+ { \
+    size_t vlen;                                                             \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ \
+ vlen = VSET (simdlen); \
+ \
+ vx = x; \
+ /* Set results for input of NaN and Inf; substitute them with zero */ \
+ EXCEPTION_HANDLING_EXP (vx, special_args, vy_special, vlen); \
+ \
+ /* Clip */ \
+ vx = FCLIP (vx, X_MIN, X_MAX, vlen); \
+ \
+ /* Argument reduction */ \
+ VINT n = __riscv_vfcvt_x (vx, vlen); \
+ VFLOAT flt_n = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT r = __riscv_vfsub (vx, flt_n, vlen); \
+ r = __riscv_vfmul (r, LOGB_HI, vlen); \
+ \
+    /* Polynomial computation: we have a degree-11 polynomial.               \
+       We compute the part from r^3 in three segments, increasing            \
+       parallelism; ideally the compiler will interleave the                 \
+       computations of the segments.  */                                     \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.71df804f1baa1p-19, r, \
+ PSTEP (0x1.28aa3ea739296p-22, 0x1.acf86201fd199p-26, r, vlen), vlen); \
+ \
+ VFLOAT poly_mid = PSTEP ( \
+ 0x1.6c16c1825c970p-10, r, \
+ PSTEP (0x1.a01a00fe6f730p-13, 0x1.a0199e1789c72p-16, r, vlen), vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.55555555554d2p-3, r, \
+ PSTEP (0x1.5555555551307p-5, 0x1.11111111309a4p-7, r, vlen), vlen); \
+ \
+ VFLOAT r_sq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT r_cube = __riscv_vfmul (r_sq, r, vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, r_cube, poly_mid, vlen); \
+ poly = __riscv_vfmadd (poly, r_cube, poly_left, vlen); \
+ \
+ poly = PSTEP (0x1.0000000000007p-1, r, poly, vlen); \
+ \
+ r = __riscv_vfmacc (r, r_sq, poly, vlen); \
+ vy = __riscv_vfadd (r, 0x1.0p0, vlen); \
+ \
+ /* at this point, vy is the entire degree-11 polynomial */ \
+ /* vy ~=~ exp(r) */ \
+ \
+ /* Need to compute 2^n * exp(r). */ \
+ FAST_LDEXP (vy, n, vlen); \
+ \
+ /* Incorporate results of exceptional inputs */ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_expint1.c b/sysdeps/riscv/rvd/v_d_expint1.c
new file mode 100644
index 0000000000..c0e7b26b45
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_expint1.c
@@ -0,0 +1,479 @@
+/* Double-precision vector expint1 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_EXPINT1D_VSET_CONFIG
+
+// expint1 exceptions are:
+// sNaN, -ve values: return qNaN and invalid exception signal
+// +-0: return +Inf and divide-by-zero exception signal
+// +Inf: return +Inf, no exception
+// positive denorm: normalize for later processing
+#define EXCEPTION_HANDLING(vx, special_args, vy_special, n_adjust, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, 0x3BF, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ (n_adjust) = __riscv_vxor ((n_adjust), (n_adjust), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL id_mask; \
+ /* substitute negative arguments with sNaN */ \
+ IDENTIFY (vclass, class_negative, id_mask, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, id_mask, vlen); \
+ /* substitute -0 argument with +0 */ \
+ IDENTIFY (vclass, class_negZero, id_mask, vlen); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, id_mask, vlen); \
+ /* eliminate positive denorm from special arguments */ \
+ IDENTIFY (vclass, 0x39F, (special_args), (vlen)); \
+ /* for narrowed set of special arguments, compute vx+vfrec7(vx) */ \
+ (vy_special) = __riscv_vfrec7 ((special_args), (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vfadd ((special_args), (vy_special), (vx), (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ /* scale up input for positive denormals */ \
+ IDENTIFY (vclass, class_posDenorm, id_mask, (vlen)); \
+ (n_adjust) = __riscv_vmerge ((n_adjust), 64, id_mask, (vlen)); \
+ VFLOAT vx_normalized \
+ = __riscv_vfmul (id_mask, (vx), 0x1.0p64, (vlen)); \
+ (vx) = __riscv_vmerge ((vx), vx_normalized, id_mask, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
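+/* A scalar model of the special-case handling above (illustrative; the
+   vector code realizes the same mapping branchlessly via vx + vfrec7(vx),
+   which also raises the required exception flags):
+
+     #include <math.h>
+     double expint1_special (double x, int *n_adjust)
+     {
+       *n_adjust = 0;
+       if (x < 0.0)  return 0.0 / 0.0;            // qNaN, raises invalid
+       if (x == 0.0) return 1.0 / 0.0;            // +Inf, raises div-by-zero
+       if (isinf (x) || isnan (x)) return x + x;  // +Inf / quieted NaN
+       if (x < 0x1.0p-1022)                       // positive denormal:
+         { *n_adjust = 64; return x * 0x1.0p64; } // normalize, note 2^64 scale
+       return x;                                  // not special
+     }  */
+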
+#define NEG_LOG2_HI -0x1.62e42fefa4000p-1
+#define NEG_LOG2_LO 0x1.8432a1b0e2634p-43
+
+#define EXPINT1_NEG_LOG(vx_in, n_adjust, y_hi, y_lo, vlen) \
+ do \
+ { \
+ /* in_arg at this point are positive, finite and not subnormal */ \
+ /* Decompose in_arg into 2^n * X, where 0.75 <= X < 1.5 */ \
+ /* log(2^n X) = n * log(2) + log(X) */ \
+ /* log(X) = 2 atanh((X-1)/(X+1)) */ \
+ \
+ /* Argument reduction: represent in_arg as 2^n X */ \
+ /* where 1/rt(2) <= X < rt(2) approximately */ \
+ /* Then compute 2(X-1)/(X+1) as r + delta_r. */ \
+      /* natural log, log(X) = 2 atanh(w/2) = w + p1 w^3 + p2 w^5 + ...; */  \
+ /* w = r+delta_r */ \
+ VFLOAT vx = (vx_in); \
+ VINT n = U_AS_I (__riscv_vadd ( \
+ __riscv_vsrl (F_AS_U ((vx)), MAN_LEN - 8, (vlen)), 0x96, (vlen))); \
+ n = __riscv_vsra (n, 8, (vlen)); \
+ n = __riscv_vsub (n, EXP_BIAS, (vlen)); \
+ (vx) = I_AS_F (__riscv_vsub ( \
+ F_AS_I ((vx)), __riscv_vsll (n, MAN_LEN, (vlen)), (vlen))); \
+ n = __riscv_vsub (n, (n_adjust), (vlen)); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, (vlen)); \
+ \
+ VFLOAT numer = __riscv_vfsub ((vx), fp_posOne, (vlen)); \
+ numer = __riscv_vfadd (numer, numer, (vlen)); \
+ VFLOAT denom, delta_d; \
+ denom = __riscv_vfadd ((vx), fp_posOne, (vlen)); \
+ delta_d = __riscv_vfrsub (denom, fp_posOne, (vlen)); \
+ delta_d = __riscv_vfadd (delta_d, (vx), (vlen)); \
+ VFLOAT r, delta_r; \
+ DIV_N1D2 (numer, denom, delta_d, r, delta_r, (vlen)); \
+ \
+ VFLOAT rsq = __riscv_vfmul (r, r, (vlen)); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, (vlen)); \
+ VFLOAT r6 = __riscv_vfmul (rcube, rcube, (vlen)); \
+ \
+ VFLOAT poly_right \
+ = PSTEP (0x1.c71c51c73bb7fp-12, rsq, \
+ PSTEP (0x1.74664bed42062p-14, rsq, \
+ PSTEP (0x1.39a071f83b771p-16, \
+ 0x1.2f123764244dfp-18, rsq, (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.5555555555594p-4, rsq, \
+ PSTEP (0x1.999999997f6b6p-7, 0x1.2492494248a48p-9, rsq, (vlen)), \
+ (vlen)); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, r6, poly_left, (vlen)); \
+ delta_r = __riscv_vfmsac (delta_r, NEG_LOG2_LO, n_flt, (vlen)); \
+ poly = __riscv_vfnmsub (poly, rcube, delta_r, (vlen)); \
+ /* At this point -r + poly approximates -log(X) */ \
+ \
+ /* Reconstruction: -log(in_arg) is -n log(2) - log(X) computed as */ \
+ /* n*(-log_2_hi - log_2_lo) - r - poly */ \
+ /* n*log_2_hi is exact as log_2_hi has enough trailing zeros */ \
+ VFLOAT A = __riscv_vfmul (n_flt, NEG_LOG2_HI, (vlen)); \
+      /* A either dominates r, or A is exactly 0 */                          \
+ r = __riscv_vfsgnjx (r, fp_negOne, (vlen)); \
+ FAST2SUM (A, r, (y_hi), (y_lo), (vlen)); \
+ (y_lo) = __riscv_vfadd ((y_lo), poly, (vlen)); \
+ } \
+ while (0)
+
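+/* A scalar model of EXPINT1_NEG_LOG (the macro keeps hi/lo pairs and the
+   delta_r correction that are dropped here; poly stands for the even
+   series 1/12 + w^2/80 + ... of 2*atanh(w/2)):
+
+     #include <math.h>
+     double neg_log_sketch (double x)    // x positive, finite, normal
+     {
+       int n;
+       double X = frexp (x, &n);         // x = X * 2^n, X in [0.5, 1)
+       if (X < M_SQRT1_2) { X *= 2.0; n -= 1; }   // X in [1/sqrt(2), sqrt(2))
+       double w = 2.0 * (X - 1.0) / (X + 1.0);
+       double logX = w + w * w * w * poly (w * w);  // 2*atanh(w/2)
+       return -(n * M_LN2 + logX);
+     }  */
+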
+// EXPINT1 on [0,1] is approximated by -log(x) + poly(x)
+// poly(x) = p0 + p1 x + ... + p12 x^12
+// p0, ..., p4 fixed point in Q62
+#define P_near0_0 -0x24f119f8df6c31e9
+#define P_near0_1 0x3fffffffffffffdc
+#define P_near0_2 -0x0ffffffffffff993
+#define P_near0_3 0x038e38e38e3862ef
+#define P_near0_4 -0x00aaaaaaaaa516c7
+// p5, ..., p13 as floating point
+#define P_near0_5 0x1.b4e81b4c194fap-10
+#define P_near0_6 -0x1.e573ac379c696p-13
+#define P_near0_7 0x1.db8b66c555673p-16
+#define P_near0_8 -0x1.a01962ee439b6p-19
+#define P_near0_9 0x1.48bd8c51ff717p-22
+#define P_near0_10 -0x1.d8dbdf85f5051p-26
+#define P_near0_11 0x1.355966c463c2ap-29
+#define P_near0_12 -0x1.5f2978a4477ccp-33
+#define P_near0_13 0x1.0c38e425fee47p-37
+
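+// As a scalar check of the series being approximated: for x in (0, 1],
+//   expint1(x) = -gamma - log(x) + x - x^2/4 + x^3/18 - ...
+// and indeed the fixed-point coefficients above satisfy
+//   P_near0_0 / 2^62 ~=~ -0.5772156649 = -gamma,
+//   P_near0_1 / 2^62 ~=~  1,
+//   P_near0_2 / 2^62 ~=~ -1/4,
+//   P_near0_3 / 2^62 ~=~  1/18.
+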
+#define EXPINT_POLY_01(vx, y_hi, y_lo, vlen) \
+ do \
+ { \
+ /* Compute leading poly in fixed point */ \
+ VINT P_FIX, X; \
+ X = __riscv_vfcvt_x (__riscv_vfmul ((vx), 0x1.0p63, (vlen)), (vlen)); \
+ P_FIX = PSTEP_I ( \
+ P_near0_0, X, \
+ PSTEP_I (P_near0_1, X, \
+ PSTEP_I (P_near0_2, X, \
+ PSTEP_I (P_near0_3, P_near0_4, X, (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ VFLOAT p_left_hi, p_left_lo; \
+ p_left_hi = __riscv_vfcvt_f (P_FIX, (vlen)); \
+ P_FIX = __riscv_vsub (P_FIX, __riscv_vfcvt_x (p_left_hi, (vlen)), \
+ (vlen)); \
+ p_left_lo = __riscv_vfcvt_f (P_FIX, (vlen)); \
+ p_left_lo = __riscv_vfmul (p_left_lo, 0x1.0p-62, (vlen)); \
+ p_left_hi = __riscv_vfmul (p_left_hi, 0x1.0p-62, (vlen)); \
+ \
+ VFLOAT poly_mid, poly_right; \
+ poly_right = PSTEP ( \
+ P_near0_9, (vx), \
+ PSTEP (P_near0_10, (vx), \
+ PSTEP (P_near0_11, (vx), \
+ PSTEP (P_near0_12, P_near0_13, (vx), (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ \
+ poly_mid = PSTEP (P_near0_5, (vx), \
+ PSTEP (P_near0_6, (vx), \
+ PSTEP (P_near0_7, P_near0_8, (vx), (vlen)), \
+ (vlen)), \
+ ((vlen))); \
+ VFLOAT x4, x5; \
+ x4 = __riscv_vfmul ((vx), (vx), (vlen)); \
+ x4 = __riscv_vfmul (x4, x4, (vlen)); \
+ x5 = __riscv_vfmul ((vx), x4, (vlen)); \
+ poly_mid = __riscv_vfmacc (poly_mid, x4, poly_right, (vlen)); \
+ p_left_lo = __riscv_vfmacc (p_left_lo, x5, poly_mid, (vlen)); \
+ KNUTH2SUM (p_left_hi, p_left_lo, (y_hi), (y_lo), (vlen)); \
+ } \
+ while (0)
+
+// Rational function deg-11 for x >= 1
+// expint1(x) ~= exp(-x) * y * P(y)/Q(y), y=1/x
+#define P_0 0x01cd7ed8aff2c99a // Q fmt 89
+#define P_1 0x569fe822aee57cb5 // Q fmt 89
+#define P_2 0x066ef748e71155e7 // Q fmt 81
+#define P_3 0x3eef1f3e5518e60c // Q fmt 81
+#define P_4 0x0565f4de088a3f6f // Q fmt 75
+#define P_5 0x110eb5a49f6eb3f7 // Q fmt 75
+#define P_6 0x1eb3d58b6063612a // Q fmt 75
+#define P_7 0x1e310833e20e2c95 // Q fmt 75
+#define P_8 0x0ef22d4e45dc890f // Q fmt 75
+#define P_9 0x0332908f3ee32e50 // Q fmt 75
+#define P_10 0x00363991b63aebbe // Q fmt 75
+#define P_11 0x00003fcc5b1fe05f // Q fmt 75
+#define delta_P_0 -0x1.991be2150638dp-91
+
+#define Q_0 0x01cd7ed8aff2c99a // Q fmt 89
+#define Q_1 0x586d66fb5ed84240 // Q fmt 89
+#define Q_2 0x06c3c9b231105002 // Q fmt 81
+#define Q_3 0x450cdf1ba384745a // Q fmt 81
+#define Q_4 0x0325d39f7df69cef // Q fmt 74
+#define Q_5 0x0adb448d52de9c87 // Q fmt 74
+#define Q_6 0x162a2f5f98fba589 // Q fmt 74
+#define Q_7 0x1a29d73cbd365659 // Q fmt 74
+#define Q_8 0x10f713cf2428b2ff // Q fmt 74
+#define Q_9 0x0582b960921c6dee // Q fmt 74
+#define Q_10 0x00c13a848700a0a5 // Q fmt 74
+#define Q_11 0x0007802b6e574e3e // Q fmt 74
+#define delta_Q_0 0x1.91d57a67cdde2p-91
+
+// Compute p(y)/q(y), y = 1/x
+// Computation is done in fixed point; (x_hi, x_lo) holds y = 1/x in
+// floating point and is converted to fixed point using the given scale
+// On return rat_hi and rat_lo are floating-point values
+#define EXPINT1_RAT_GE1(x_hi, x_lo, scale, rat_hi, rat_lo, vlen) \
+ do \
+ { \
+ VINT P75, P81, P89, Q74, Q81, Q89; \
+ VINT _X; \
+ FLT2FIX ((x_hi), (x_lo), (scale), _X, (vlen)); \
+ P75 = PSTEP_I_SLL (P_10, P_11, 1, _X, (vlen)); \
+ P75 = PSTEP_I_SLL (P_9, _X, 1, P75, (vlen)); \
+ P75 = PSTEP_I_SLL (P_8, _X, 1, P75, (vlen)); \
+ P75 = PSTEP_I_SLL (P_7, _X, 1, P75, (vlen)); \
+ P75 = PSTEP_I_SLL (P_6, _X, 1, P75, (vlen)); \
+ P75 = PSTEP_I_SLL (P_5, _X, 1, P75, (vlen)); \
+ P75 = PSTEP_I_SLL (P_4, _X, 1, P75, (vlen)); \
+ VFLOAT _xsq_hi, _xsq_lo; \
+ SQR_X2 ((x_hi), (x_lo), _xsq_hi, _xsq_lo, (vlen)); \
+ VFLOAT _p_right_hi, _p_right_lo; \
+ FIX2FLT (P75, 0x1.0p-75, _p_right_hi, _p_right_lo, (vlen)); \
+ VFLOAT _p_tmp1_hi, _p_tmp1_lo; \
+ PROD_X2Y2 (_xsq_hi, _xsq_lo, _p_right_hi, _p_right_lo, _p_tmp1_hi, \
+ _p_tmp1_lo, (vlen)); \
+ \
+ P81 = PSTEP_I_SLL (P_2, P_3, 1, _X, (vlen)); \
+ VFLOAT _p_mid_hi, _p_mid_lo; \
+ FIX2FLT (P81, 0x1.0p-81, _p_mid_hi, _p_mid_lo, (vlen)); \
+ VFLOAT _p_tmp2_hi, _p_tmp2_lo; \
+ POS2SUM (_p_tmp1_hi, _p_mid_hi, _p_tmp2_hi, _p_tmp2_lo, (vlen)); \
+ _p_tmp2_lo = __riscv_vfadd (_p_tmp2_lo, _p_tmp1_lo, (vlen)); \
+ _p_tmp2_lo = __riscv_vfadd (_p_tmp2_lo, _p_mid_lo, (vlen)); \
+ \
+ PROD_X2Y2 (_xsq_hi, _xsq_lo, _p_tmp2_hi, _p_tmp2_lo, _p_tmp1_hi, \
+ _p_tmp1_lo, (vlen)); \
+ VFLOAT _p_left_hi, _p_left_lo; \
+ P89 = PSTEP_I_SLL (P_0, P_1, 1, _X, (vlen)); \
+ FIX2FLT (P89, 0x1.0p-89, _p_left_hi, _p_left_lo, (vlen)); \
+ POS2SUM (_p_left_hi, _p_tmp1_hi, _p_tmp2_hi, _p_tmp2_lo, (vlen)); \
+ _p_tmp2_lo = __riscv_vfadd (_p_tmp2_lo, _p_left_lo, (vlen)); \
+ _p_tmp2_lo = __riscv_vfadd (_p_tmp2_lo, _p_tmp1_lo, (vlen)); \
+ _p_tmp2_lo = __riscv_vfadd (_p_tmp2_lo, delta_P_0, (vlen)); \
+ VFLOAT _p_hi, _p_lo; \
+ FAST2SUM (_p_tmp2_hi, _p_tmp2_lo, _p_hi, _p_lo, (vlen)); \
+ /* (_p_hi, _p_lo) is an accurate version of p(x) */ \
+ \
+ Q74 = PSTEP_I_SLL (Q_10, Q_11, 1, _X, (vlen)); \
+ Q74 = PSTEP_I_SLL (Q_9, _X, 1, Q74, (vlen)); \
+ Q74 = PSTEP_I_SLL (Q_8, _X, 1, Q74, (vlen)); \
+ Q74 = PSTEP_I_SLL (Q_7, _X, 1, Q74, (vlen)); \
+ Q74 = PSTEP_I_SLL (Q_6, _X, 1, Q74, (vlen)); \
+ Q74 = PSTEP_I_SLL (Q_5, _X, 1, Q74, (vlen)); \
+ Q74 = PSTEP_I_SLL (Q_4, _X, 1, Q74, (vlen)); \
+ \
+ VFLOAT _q_right_hi, _q_right_lo; \
+ FIX2FLT (Q74, 0x1.0p-74, _q_right_hi, _q_right_lo, (vlen)); \
+ VFLOAT _q_tmp1_hi, _q_tmp1_lo; \
+ PROD_X2Y2 (_xsq_hi, _xsq_lo, _q_right_hi, _q_right_lo, _q_tmp1_hi, \
+ _q_tmp1_lo, (vlen)); \
+ \
+ Q81 = PSTEP_I_SLL (Q_2, Q_3, 1, _X, (vlen)); \
+ VFLOAT _q_mid_hi, _q_mid_lo; \
+ FIX2FLT (Q81, 0x1.0p-81, _q_mid_hi, _q_mid_lo, (vlen)); \
+ VFLOAT _q_tmp2_hi, _q_tmp2_lo; \
+ POS2SUM (_q_tmp1_hi, _q_mid_hi, _q_tmp2_hi, _q_tmp2_lo, (vlen)); \
+ _q_tmp2_lo = __riscv_vfadd (_q_tmp2_lo, _q_tmp1_lo, (vlen)); \
+ _q_tmp2_lo = __riscv_vfadd (_q_tmp2_lo, _q_mid_lo, (vlen)); \
+ \
+ PROD_X2Y2 (_xsq_hi, _xsq_lo, _q_tmp2_hi, _q_tmp2_lo, _q_tmp1_hi, \
+ _q_tmp1_lo, (vlen)); \
+ VFLOAT _q_left_hi, _q_left_lo; \
+ Q89 = PSTEP_I_SLL (Q_0, Q_1, 1, _X, (vlen)); \
+ FIX2FLT (Q89, 0x1.0p-89, _q_left_hi, _q_left_lo, (vlen)); \
+ POS2SUM (_q_left_hi, _q_tmp1_hi, _q_tmp2_hi, _q_tmp2_lo, (vlen)); \
+ _q_tmp2_lo = __riscv_vfadd (_q_tmp2_lo, _q_left_lo, (vlen)); \
+ _q_tmp2_lo = __riscv_vfadd (_q_tmp2_lo, _q_tmp1_lo, (vlen)); \
+ _q_tmp2_lo = __riscv_vfadd (_q_tmp2_lo, delta_Q_0, (vlen)); \
+ VFLOAT _q_hi, _q_lo; \
+ FAST2SUM (_q_tmp2_hi, _q_tmp2_lo, _q_hi, _q_lo, (vlen)); \
+ /* deliver the final rat_hi, rat_lo */ \
+ DIV2_N2D2 (_p_hi, _p_lo, _q_hi, _q_lo, (rat_hi), (rat_lo), (vlen)); \
+ } \
+ while (0)
+
+// exp(x) for x in [-log2/2, log2/2], degree 12
+// the coefficients are scaled up by 2^62
+#define P_one 0x4000000000000000
+#define P_half 0x2000000000000000
+#define P_exp_0 0x1.0000000000000p+62
+#define P_exp_1 0x1.0000000000000p+62
+#define P_exp_2 0x1.0000000000000p+61
+
+#define P_exp_3 0x1.555555555555ap+59
+#define P_exp_4 0x1.5555555555533p+57
+
+#define P_exp_5 0x1.111111110ef1dp+55
+#define P_exp_6 0x1.6c16c16c23cabp+52
+#define P_exp_7 0x1.a01a01b2eeafdp+49
+#define P_exp_8 0x1.a01a016c97838p+46
+
+#define P_exp_9 0x1.71ddf0af3f3a4p+43
+#define P_exp_10 0x1.27e542d471a01p+40
+#define P_exp_11 0x1.af6bfc694314ap+36
+#define P_exp_12 0x1.1ef1a5cf633bap+33
+
+#define LOG2_HI 0x1.62e42fefa39efp-1
+#define LOG2_LO 0x1.abc9e3b39803fp-56
+#define NEG_LOG2_INV -0x1.71547652b82fep+0
+
+// compute exp(-x) as 2^n(y_hi + y_lo)
+#define EXP_NEGX(vx, n, y_hi, y_lo, vlen) \
+ do \
+ { \
+ VFLOAT _n_flt = __riscv_vfmul ((vx), NEG_LOG2_INV, (vlen)); \
+ (n) = __riscv_vfcvt_x (_n_flt, (vlen)); \
+ _n_flt = __riscv_vfcvt_f ((n), (vlen)); \
+ VFLOAT _r_hi = __riscv_vfnmadd (_n_flt, LOG2_HI, (vx), (vlen)); \
+ VFLOAT _r_lo = __riscv_vfmul (_n_flt, LOG2_LO, (vlen)); \
+ /* _r_hi - _r_lo is _r */ \
+ VFLOAT _r = __riscv_vfsub (_r_hi, _r_lo, (vlen)); \
+ _r_lo = __riscv_vfsgnjx (_r_lo, fp_negOne, (vlen)); \
+ VFLOAT _p_right, _p_mid; \
+ _p_right \
+ = PSTEP (P_exp_9, _r, \
+ PSTEP (P_exp_10, _r, \
+ PSTEP (P_exp_11, P_exp_12, _r, (vlen)), (vlen)), \
+ (vlen)); \
+ _p_mid = PSTEP ( \
+ P_exp_5, _r, \
+ PSTEP (P_exp_6, _r, PSTEP (P_exp_7, P_exp_8, _r, (vlen)), (vlen)), \
+ (vlen)); \
+ VFLOAT _rsq, _r4; \
+ _rsq = __riscv_vfmul (_r, _r, (vlen)); \
+ _r4 = __riscv_vfmul (_rsq, _rsq, (vlen)); \
+ _p_mid = __riscv_vfmacc (_p_mid, _r4, _p_right, (vlen)); \
+ VFLOAT _p_left = PSTEP (P_exp_3, P_exp_4, _r, (vlen)); \
+ _p_left = __riscv_vfmacc (_p_left, _rsq, _p_mid, (vlen)); \
+ VINT _P = __riscv_vfcvt_x (_p_left, (vlen)); \
+ VINT _R; \
+ FLT2FIX (_r_hi, _r_lo, 0x1.0p63, _R, (vlen)); \
+ _P = PSTEP_I ( \
+ P_one, _R, \
+ PSTEP_I (P_one, _R, PSTEP_I (P_half, _R, _P, (vlen)), (vlen)), \
+ (vlen)); \
+ FIX2FLT (_P, 0x1.0p-62, (y_hi), (y_lo), (vlen)); \
+ } \
+ while (0)
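
For reference, the reduction inside EXP_NEGX amounts to the following scalar scheme: with n = nearbyint(-x/log 2), exp(-x) = 2^n * exp(r) where r = -x - n*log 2, and log 2 is split into LOG2_HI + LOG2_LO so the fma-based subtraction stays accurate. A minimal double-precision sketch (libm exp stands in for the fixed-point polynomial; illustration only):

    #include <math.h>

    static double
    exp_negx_sketch (double x)
    {
      double n = nearbyint (x * -0x1.71547652b82fep+0);  /* NEG_LOG2_INV */
      double r = fma (-n, 0x1.62e42fefa39efp-1, -x);     /* -x - n*LOG2_HI */
      r = fma (-n, 0x1.abc9e3b39803fp-56, r);            /* ... - n*LOG2_LO */
      return ldexp (exp (r), (int) n);
    }
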
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, expint1) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ VINT n_adjust; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* Handle Inf, NaN, +-0, -ve, and positive denormals */ \
+ EXCEPTION_HANDLING (vx, special_args, vy_special, n_adjust, vlen); \
+ \
+ /* Compute for 0 < x < 1 if such arguments exist */ \
+ VBOOL args_lt_1 = __riscv_vmflt (vx, fp_posOne, vlen); \
+ VBOOL args_ge_1 = __riscv_vmnot (args_lt_1, vlen); \
+ VFLOAT vy_xlt1; \
+ if (__riscv_vcpop (args_lt_1, vlen) > 0) \
+ { \
+ VFLOAT vx_lt_1 = __riscv_vfmerge (vx, 0x1.0p-1, args_ge_1, vlen); \
+ VFLOAT neg_logx_hi, neg_logx_lo; \
+ EXPINT1_NEG_LOG (vx_lt_1, n_adjust, neg_logx_hi, neg_logx_lo, vlen); \
+ VFLOAT poly_hi, poly_lo; \
+ EXPINT_POLY_01 (vx_lt_1, poly_hi, poly_lo, vlen); \
+ \
+ VFLOAT AA, aa; \
+ KNUTH2SUM (neg_logx_hi, poly_hi, AA, aa, vlen); \
+ aa = __riscv_vfadd (aa, neg_logx_lo, vlen); \
+ aa = __riscv_vfadd (aa, poly_lo, vlen); \
+ vy_xlt1 = __riscv_vfadd (AA, aa, vlen); \
+ } \
+ VFLOAT vy_xge1; \
+ if (__riscv_vcpop (args_ge_1, vlen) > 0) \
+ { \
+ VFLOAT vx_ge_1 = __riscv_vfmerge (vx, 0x1.0p1, args_lt_1, vlen); \
+ /* suffices to clip at 750.0 */ \
+ vx_ge_1 = __riscv_vfmin (vx_ge_1, 0x1.77p+9, vlen); \
+ VFLOAT recip_x_hi, recip_x_lo; \
+ recip_x_hi = __riscv_vfrdiv (vx_ge_1, fp_posOne, vlen); \
+ recip_x_lo = VFMV_VF (fp_posOne, vlen); \
+ recip_x_lo = __riscv_vfnmsac (recip_x_lo, vx_ge_1, recip_x_hi, vlen); \
+ recip_x_lo = __riscv_vfmul (recip_x_hi, recip_x_lo, vlen); \
+ VFLOAT rat_hi, rat_lo; \
+ EXPINT1_RAT_GE1 (recip_x_hi, recip_x_lo, 0x1.0p62, rat_hi, rat_lo, \
+ vlen); \
+      /* (rat_hi, rat_lo) approximates expint1(x)*exp(x)*x, so we need to \
+         multiply (rat_hi, rat_lo) by (recip_x_hi, recip_x_lo) and by \
+         exp(-x).  */ \
+ VFLOAT rat_by_x_hi, rat_by_x_lo; \
+ PROD_X2Y2 (recip_x_hi, recip_x_lo, rat_hi, rat_lo, rat_by_x_hi, \
+ rat_by_x_lo, vlen); \
+ VFLOAT exp_negx_hi, exp_negx_lo; \
+ VINT n; \
+ EXP_NEGX (vx_ge_1, n, exp_negx_hi, exp_negx_lo, vlen); \
+ VFLOAT result_hi, result_lo; \
+ PROD_X2Y2 (rat_by_x_hi, rat_by_x_lo, exp_negx_hi, exp_negx_lo, \
+ result_hi, result_lo, vlen); \
+ vy_xge1 = __riscv_vfadd (result_hi, result_lo, vlen); \
+ FAST_LDEXP (vy_xge1, n, vlen); \
+ } \
+ vy = __riscv_vmerge (vy_xlt1, vy_xge1, args_ge_1, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
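
Putting the two branches together: for 0 < x < 1 the expint1 result is assembled as -log(x) + p(x), while for x >= 1 it is exp(-x)/x * R(1/x), with R the EXPINT1_RAT_GE1 rational (which, per the comment above, approximates expint1(x)*exp(x)*x). A scalar sketch of the overall shape, with p and R as stand-ins for the fixed-point approximations:

    #include <math.h>

    static double
    expint1_sketch (double x, double (*p) (double), double (*R) (double))
    {
      if (x < 1.0)
        return -log (x) + p (x);          /* series branch */
      return exp (-x) / x * R (1.0 / x);  /* rational-times-exp branch */
    }
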
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_expm1.c b/sysdeps/riscv/rvd/v_d_expm1.c
new file mode 100644
index 0000000000..c3bc3ab48c
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_expm1.c
@@ -0,0 +1,197 @@
+/* Double-precision vector expm1 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_EXPM1D_VSET_CONFIG
+
+#define EXCEPTION_HANDLING_EXPM1(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ /* Substitute -Inf with -1 */ \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_negInf, id_mask, (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_negOne, id_mask, (vlen)); \
+ vy_special \
+ = __riscv_vfmul ((special_args), (vx), fp_posOne, (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define P_INV_STD 0x1.71547652b82fep+0
+#define P_HI_STD 0x1.62e42fefa39efp-1
+#define P_LO_STD 0x1.abc9e3b39803fp-56
+#define P_INV_TBL 0x1.71547652b82fep+6
+#define P_HI_TBL 0x1.62e42fefa39efp-7
+#define P_LO_TBL 0x1.abc9e3b39803fp-62
+#define X_MAX 0x1.65p+9
+#define X_MIN -0x1.5p+5
+
+// We use the EPsim version of expD to compute expm1
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, expm1) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ EXCEPTION_HANDLING_EXPM1 (vx, special_args, vy_special, vlen); \
+ \
+ /* Clip */ \
+ vx = FCLIP (vx, X_MIN, X_MAX, vlen); \
+ \
+ /* Argument reduction */ \
+ VFLOAT flt_n = __riscv_vfmul (vx, P_INV_STD, vlen); \
+ VINT n = __riscv_vfcvt_x (flt_n, vlen); \
+ flt_n = __riscv_vfcvt_f (n, vlen); \
+ \
+ VFLOAT r_tmp = __riscv_vfnmsac (vx, P_HI_STD, flt_n, vlen); \
+ VFLOAT r = __riscv_vfnmsub (flt_n, P_LO_STD, r_tmp, vlen); \
+ VFLOAT r_lo = __riscv_vfsub (r_tmp, r, vlen); \
+ r_lo = __riscv_vfnmsac (r_lo, P_LO_STD, flt_n, vlen); \
+    /* r is the reduced argument in working precision, but r + r_lo is \
+       extra precise.  exp(r + r_lo) is 1 + (r + r_lo) + (r + r_lo)^2/2 \
+       + r^3 * polynomial(r), i.e. \
+       1 + r + r^2/2 + (r_lo + r * r_lo) + r^3 * polynomial(r) */ \
+ r_lo = __riscv_vfmacc (r_lo, r, r_lo, vlen); \
+ /* 1 + r + r^2/2 + r_lo + r^3 * polynomial(r) */ \
+ \
+ /* Compute P_head + P_tail as r + r^2/2 accurately */ \
+ VFLOAT r_prime \
+ = __riscv_vfmul (r, 0x1.0p-1, vlen); /* this coeff is 1/2 */ \
+ VFLOAT P_head = __riscv_vfmadd (r, r_prime, r, vlen); \
+ VFLOAT P_tail = __riscv_vfsub (r, P_head, vlen); \
+ P_tail = __riscv_vfmacc (P_tail, r, r_prime, vlen); \
+ \
+ /* Polynomial computation, we have a degree 11 polynomial */ \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.71ddf7aef0679p-19, r, \
+ PSTEP (0x1.27e4e210af311p-22, r, \
+ PSTEP (0x1.af5ff637cd647p-26, 0x1.1f6562eae5ba9p-29, r, vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT poly_mid = PSTEP ( \
+ 0x1.6c16c16c166f3p-10, r, \
+ PSTEP (0x1.a01a01b0207e3p-13, 0x1.a01a01a4af90ap-16, r, vlen), vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.5555555555559p-3, r, \
+ PSTEP (0x1.5555555555556p-5, 0x1.111111110f62ap-7, r, vlen), vlen); \
+ \
+ VFLOAT r_sq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT r_cube = __riscv_vfmul (r_sq, r, vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, r_cube, poly_mid, vlen); \
+ poly = __riscv_vfmadd (poly, r_cube, poly_left, vlen); \
+ /* At this point, exp(r) is 1 + P_head + P_tail + r_lo + r^3 * poly */ \
+ /* expm1(x) = 2^n ( 1 - 2^(-n) + P_head + P_tail + r_lo + r^3 * poly ) */ \
+ \
+    /* Compute 1 - 2^(-n) accurately. \
+    // Note that n >= -61 because the input argument was clipped, since \
+    // expm1(x) = -1 as long as x <= -54 log(2). \
+    // For the purpose of 1 - 2^(-n), n can be clipped to n <= 64 as well. \
+    // Then 1 - 2^(-n) = A + a, where A := fl(1 - 2^(-n)) is the rounded \
+    // value, and a = 1.0 for n <= -54; a = -2^(-n) if n >= 54; a = 0 \
+    // otherwise. \
+    // While it is true we can use a KNUTH2SUM to compute A and a using 6 \
+    // floating-point instructions, we can obtain A and a with just one \
+    // floating-point instruction and other simple integer instructions. \
+    // This should be more performant on most hardware implementations, as \
+    // integer instructions have lower latency in general and may use \
+    // hardware resources different from those for floating point. */ \
+ VFLOAT One = VFMV_VF (fp_posOne, vlen); \
+ VINT n_clip = __riscv_vmin (n, 64, vlen); \
+ /* n_clip <= 64; note that n_clip >= -61 */ \
+ VBOOL n_le53 = __riscv_vmsle (n_clip, 53, vlen); \
+ VBOOL n_le_neg54 = __riscv_vmsle (n_clip, -54, vlen); \
+ VINT I_tail = __riscv_vrsub (n_clip, -(EXP_BIAS + 2), vlen); \
+ /* The 12 lsb of I_tail is (sign,expo) of -2^(-n_clip) */ \
+ VFLOAT u = I_AS_F ( \
+ __riscv_vsll (I_tail, MAN_LEN, vlen)); /* u = -2^(-n_clip) */ \
+ I_tail = __riscv_vmerge (I_tail, 0, n_le53, vlen); \
+ I_tail = __riscv_vmerge (I_tail, EXP_BIAS, n_le_neg54, vlen); \
+ VFLOAT a = I_AS_F (__riscv_vsll (I_tail, MAN_LEN, vlen)); \
+ /* a is 1.0, 0, -2^(-n_clip) */ \
+ /* for n <= -54, -53 <= n <= 53, 54 <= n, respectively */ \
+ VFLOAT A = __riscv_vfadd (One, u, vlen); \
+ \
+ /* Compute A + a + P_head + P_tail + r_lo + r^3 * poly */ \
+ P_tail = __riscv_vfadd (P_tail, a, vlen); \
+ P_tail = __riscv_vfadd (P_tail, r_lo, vlen); \
+ poly = __riscv_vfmadd (poly, r_cube, P_tail, vlen); \
+ P_head = __riscv_vfadd (P_head, poly, vlen); \
+ \
+ vy = __riscv_vfadd (P_head, A, vlen); \
+    /* vy is now exp(r) - 2^(-n) accurately. */ \
+ \
+    /* Scale by 2^n: expm1(x) = 2^n * (exp(r) - 2^(-n)). */ \
+ FAST_LDEXP (vy, n, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
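
The integer construction of A + a = 1 - 2^(-n) described in the comment above can be illustrated in scalar code: -2^(-n) is built by writing its (sign, biased exponent) pair, which equals the low 12 bits of -(EXP_BIAS + 2) - n, into the top of an IEEE-754 double. A sketch assuming the standard double-precision layout (helper name illustrative):

    #include <stdint.h>
    #include <string.h>

    static void
    one_minus_pow2n (int n, double *A, double *a)
    {
      if (n > 64)
        n = 64;                            /* n >= -61 by construction */
      uint64_t bits = ((uint64_t) (-1025 - n)) << 52;  /* -(EXP_BIAS+2) - n */
      double u;                            /* u = -2^(-n) */
      memcpy (&u, &bits, sizeof u);
      *A = 1.0 + u;                        /* rounded value of 1 - 2^(-n) */
      *a = (n <= -54) ? 1.0 : (n >= 54 ? u : 0.0);  /* exact correction */
    }
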
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_lgamma.c b/sysdeps/riscv/rvd/v_d_lgamma.c
new file mode 100644
index 0000000000..3937bac7af
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_lgamma.c
@@ -0,0 +1,647 @@
+/* Double-precision vector lgamma function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_LGAMMAD_VSET_CONFIG
+
+#include "rvvlm_gammafuncsD.h"
+
+// Gamma(x) ~ (x-1)(x-2) * P(t)/Q(t), t = x - (3/2 - 1/8), x in [1-1/4, 2+1/4]
+// Coefficients of P in ascending order:
+#define P_left_0 0x717ad571ef5dc61 // Q_69
+#define P_left_1 0x13b685202b6fd6a4 // Q_69
+#define P_left_2 0x16296a1296488970 // Q_69
+#define P_left_3 0x19ca4fa8bc759bde // Q_70
+#define P_left_4 0x1079dccb79c3089c // Q_71
+#define P_left_5 0x16847f57936dc8fb // Q_74
+#define P_left_6 0x1d546ec89ba5d14e // Q_78
+#define P_left_7 0x1ad0cdf5663cfacf // Q_83
+#define P_left_8 0x1f1f571e9999b6c7 // Q_92
+
+#define Q_left_0 0x3877618277fdb07 // Q_67
+#define Q_left_1 0x15e97ae7797617d9 // Q_68
+#define Q_left_2 0x1c1a630c4311e499 // Q_68
+#define Q_left_3 0x133731ff844280b6 // Q_68
+#define Q_left_4 0x1e248b8ebb59488b // Q_70
+#define Q_left_5 0x1b1f236d04a448b5 // Q_72
+#define Q_left_6 0x1a6052955f6252ac // Q_75
+#define Q_left_7 0x17e477b664d95b52 // Q_79
+#define Q_left_8 0x1b97bd09b9b48410 // Q_85
+
+//---Approximate log(x) by w + w^3 poly(w^2)
+// w = 2(x-1)/(x+1), x roughly in [1/sqrt(2), sqrt(2)]
+#define P_log_0 0x5555555555555090 // Q_66
+#define P_log_1 0x666666666686863a // Q_69
+#define P_log_2 0x49249248fc99ba4b // Q_71
+#define P_log_3 0x71c71ca402e164fa // Q_74
+#define P_log_4 0x5d1733e3ae94dde0 // Q_76
+#define P_log_5 0x4ec8b69784234032 // Q_78
+#define P_log_6 0x43cc44a056dc3c93 // Q_80
+#define P_log_7 0x4432439bb76e7d74 // Q_82
+
+#define LOG2_HI 0x1.62e42fefa4000p-1
+#define LOG2_LO -0x1.8432a1b0e2634p-43
+
+// Correction to the log_stirling formula
+// logGamma(x) - log_stirling(x) ~ P(t)/Q(t), t = 1/x
+// x in [2.25, x_max], t in (0, 1/2.25)
+// Coefficients of P in ascending order:
+#define P_LS_corr_0 0x13eb19ce38760e4 // Q_82
+#define P_LS_corr_1 0x54ebdd91a33a236 // Q_82
+#define P_LS_corr_2 0xf5302c2f3171924 // Q_82
+#define P_LS_corr_3 0x17e6ca6c67d42c45 // Q_82
+#define P_LS_corr_4 0x18e683b7eb793968 // Q_82
+#define P_LS_corr_5 0xe6a7d68df697b37 // Q_82
+#define P_LS_corr_6 0x48f07444527e095 // Q_82
+#define P_LS_corr_7 0x5ac3ca10d36d7d // Q_82
+#define P_LS_corr_8 -0x115718edb07d53 // Q_82
+
+#define Q_LS_corr_0 0x2f8b7a297052f62 // Q_82
+#define Q_LS_corr_1 0xc13fa37f8190cf5 // Q_82
+#define Q_LS_corr_2 0x222d203fd991122d // Q_82
+#define Q_LS_corr_3 0x32462b6d38e0bfd3 // Q_82
+#define Q_LS_corr_4 0x31844bff55d6561a // Q_82
+#define Q_LS_corr_5 0x18c83406788ab40e // Q_82
+#define Q_LS_corr_6 0x643329f595fac69 // Q_82
+#define Q_LS_corr_7 -0x21b0b1bff373cd // Q_82
+#define Q_LS_corr_8 -0xc9c05b696db07 // Q_82
+
+//---Approximate log(sin(pi x)/(pi x)) as x^2 poly(x^2)
+#define P_logsin_0 -0x34a34cc4a60fa863 // Q_61
+#define P_logsin_1 -0x1151322ac7d51d2e // Q_61
+#define P_logsin_2 -0xada0658820c4e34 // Q_61
+#define P_logsin_3 -0x80859b50a7b1918 // Q_61
+#define P_logsin_4 -0x66807a019daf246 // Q_61
+#define P_logsin_5 -0x555a97e7d8482c8 // Q_61
+#define P_logsin_6 -0x4927ceefdc18f62 // Q_61
+#define P_logsin_7 -0x3fe5862d4e702a2 // Q_61
+#define P_logsin_8 -0x39da522c5099734 // Q_61
+#define P_logsin_9 -0x2cbb6825e3efaad // Q_61
+#define P_logsin_10 -0x4df815d2f21e674 // Q_61
+#define P_logsin_11 0x41cf7e791cb446c // Q_61
+#define P_logsin_12 -0x126ea0159b1a7052 // Q_61
+#define P_logsin_13 0x155103f2634da2c6 // Q_61
+#define P_logsin_14 -0x13e497482ec6dff4 // Q_61
+
+//---Approximate exp(R) by 1 + R + R^2*poly(R)
+#define P_exp_0 0x400000000000004e // Q_63
+#define P_exp_1 0x1555555555555b6e // Q_63
+#define P_exp_2 0x555555555553378 // Q_63
+#define P_exp_3 0x1111111110ec10d // Q_63
+#define P_exp_4 0x2d82d82d87a9b5 // Q_63
+#define P_exp_5 0x6806806ce6d6f // Q_63
+#define P_exp_6 0xd00d00841fcf // Q_63
+#define P_exp_7 0x171ddefda54b // Q_63
+#define P_exp_8 0x24fcc01d627 // Q_63
+#define P_exp_9 0x35ed8bbd24 // Q_63
+#define P_exp_10 0x477745b6c // Q_63
+
+//---Approximate Stirling correction by P(t)/Q(t)
+// Gamma(x) = (x/e)^(x-1/2) * P(t)/Q(t), t = 1/x, x in [2, 180]
+#define P_corr_0 0x599ecf7a9368327 // Q_78
+#define P_corr_1 0x120a4be8e3d8673d // Q_78
+#define P_corr_2 0x2ab73aec63e90213 // Q_78
+#define P_corr_3 0x32f903e18454e088 // Q_78
+#define P_corr_4 0x29f463d533d0a4b5 // Q_78
+#define P_corr_5 0x1212989fdf61f6c1 // Q_78
+#define P_corr_6 0x48706d4f75a0491 // Q_78
+#define P_corr_7 0x5591439d2d51a6 // Q_78
+
+#define Q_corr_0 0x75e5053ce715a76 // Q_79
+#define Q_corr_1 0x171e2068d3ef7453 // Q_79
+#define Q_corr_2 0x363d736690f2373f // Q_79
+#define Q_corr_3 0x3e793a1cc19bbc32 // Q_79
+#define Q_corr_4 0x31dc2fbf92ec978c // Q_79
+#define Q_corr_5 0x138c2244d1c1e0b1 // Q_79
+#define Q_corr_6 0x450a7392d81c20f // Q_79
+#define Q_corr_7 0x1ed9c605221435 // Q_79
+
+//---Approximate sin(pi x)/pi as x + x^3 poly(x^2)
+#define P_sin_0 -0x694699894c1f4ae7 // Q_62
+#define P_sin_1 0x33f396805788034f // Q_62
+#define P_sin_2 -0xc3547239048c220 // Q_62
+#define P_sin_3 0x1ac6805cc1cecf4 // Q_62
+#define P_sin_4 -0x26702d2fd5a3e6 // Q_62
+#define P_sin_5 0x26e8d360232c6 // Q_62
+#define P_sin_6 -0x1d3e4d9787ba // Q_62
+#define P_sin_7 0x107298fc107 // Q_62
+
+// lgamma(qNaN/sNaN) is qNaN, invalid if input is sNaN
+// lgamma(+-Inf) is +Inf
+// lgamma(+-0) is +Inf and divide by zero
+// lgamma(x) = -log(|x|) when |x| < 2^(-60)
+#define EXCEPTION_HANDLING_LGAMMA(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vand ( \
+ __riscv_vsrl (F_AS_U ((vx)), MAN_LEN, (vlen)), 0x7FF, (vlen)); \
+ VBOOL x_small = __riscv_vmsltu (expo_x, EXP_BIAS - 60, (vlen)); \
+ VBOOL x_InfNaN = __riscv_vmseq (expo_x, 0x7FF, (vlen)); \
+ (special_args) = __riscv_vmor (x_small, x_InfNaN, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL x_Inf; \
+ IDENTIFY (vclass, class_Inf, x_Inf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posInf, x_Inf, (vlen)); \
+ VBOOL x_Zero; \
+ IDENTIFY (vclass, 0x18, x_Zero, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, x_Zero, (vlen)); \
+ VFLOAT y_tmp = (vx); \
+ VFLOAT y_0 = __riscv_vfrec7 (x_Zero, (vx), (vlen)); \
+ y_tmp = __riscv_vmerge (y_tmp, y_0, x_Zero, (vlen)); \
+ (vy_special) = __riscv_vfadd ((special_args), (vx), y_tmp, (vlen)); \
+ x_small = __riscv_vmandn ( \
+ x_small, __riscv_vmfeq ((vx), fp_posZero, (vlen)), (vlen)); \
+ if (__riscv_vcpop (x_small, (vlen)) > 0) \
+ { \
+ VFLOAT x_tmp = VFMV_VF (fp_posOne, (vlen)); \
+ x_tmp = __riscv_vmerge (x_tmp, (vx), x_small, (vlen)); \
+ x_tmp = __riscv_vfsgnj (x_tmp, fp_posOne, (vlen)); \
+ VFLOAT zero = VFMV_VF (fp_posZero, (vlen)); \
+ VFLOAT y_hi, y_lo; \
+ LGAMMA_LOG (x_tmp, zero, zero, y_hi, y_lo, (vlen)); \
+ y_hi = __riscv_vfadd (y_hi, y_lo, (vlen)); \
+ y_hi = __riscv_vfsgnj (y_hi, fp_posOne, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), y_hi, x_small, (vlen)); \
+ } \
+ (vx) = __riscv_vfmerge ((vx), 0x1.0p2, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// This macro computes loggamma(x) for 0 < x <= 2.25
+// It uses a rational approximation for x in [0.75, 2.25]
+// For x < 0.75, it uses the relation gamma(x) = gamma(x+1)/x
+#define LGAMMA_LE_225(x, logx_hi, logx_lo, y_hi, y_lo, vlen) \
+ do \
+ { \
+ VBOOL x_lt_75 = __riscv_vmflt ((x), 0x1.8p-1, (vlen)); \
+ VFLOAT c = VFMV_VF (0x1.6p63, (vlen)); \
+ c = __riscv_vfmerge (c, 0x1.8p61, x_lt_75, (vlen)); \
+ VFLOAT rt1 = VFMV_VF (0x1.0p0, (vlen)); \
+ VFLOAT rt2 = VFMV_VF (0x1.0p1, (vlen)); \
+ rt1 = __riscv_vfmerge (rt1, fp_posZero, x_lt_75, (vlen)); \
+ rt2 = __riscv_vfmerge (rt2, fp_posOne, x_lt_75, (vlen)); \
+ VFLOAT t_hi = __riscv_vfmsub ((x), 0x1.0p63, c, (vlen)); \
+ VFLOAT t_lo = __riscv_vfadd (t_hi, c, (vlen)); \
+ t_lo = __riscv_vfmsub ((x), 0x1.0p63, t_lo, (vlen)); \
+ VFLOAT fact1 = __riscv_vfsub ((x), rt1, (vlen)); \
+ VINT T = __riscv_vfcvt_x (t_hi, (vlen)); \
+ T = __riscv_vadd (T, __riscv_vfcvt_x (t_lo, (vlen)), (vlen)); \
+ VFLOAT fact2_hi = __riscv_vfsub ((x), rt2, (vlen)); \
+ VFLOAT fact2_lo = __riscv_vfadd (fact2_hi, rt2, (vlen)); \
+ fact2_lo = __riscv_vfsub ((x), fact2_lo, (vlen)); \
+ VFLOAT fact_hi, fact_lo; \
+ PROD_X1Y2 (fact1, fact2_hi, fact2_lo, fact_hi, fact_lo, (vlen)); \
+ VINT P = PSTEP_I_SRA (P_left_7, P_left_8, 9, T, (vlen)); \
+ P = PSTEP_I_SRA (P_left_6, T, 5, P, (vlen)); \
+ P = PSTEP_I_SRA (P_left_5, T, 4, P, (vlen)); \
+ P = PSTEP_I_SRA (P_left_4, T, 3, P, (vlen)); \
+ P = PSTEP_I_SRA (P_left_3, T, 1, P, (vlen)); \
+ P = PSTEP_I_SRA (P_left_2, T, 1, P, (vlen)); \
+ P = PSTEP_I (P_left_1, T, P, (vlen)); \
+ P = PSTEP_I (P_left_0, T, P, (vlen)); \
+ VINT Q = PSTEP_I_SRA (Q_left_7, Q_left_8, 6, T, (vlen)); \
+ Q = PSTEP_I_SRA (Q_left_6, T, 4, Q, (vlen)); \
+ Q = PSTEP_I_SRA (Q_left_5, T, 3, Q, (vlen)); \
+ Q = PSTEP_I_SRA (Q_left_4, T, 2, Q, (vlen)); \
+ Q = PSTEP_I_SRA (Q_left_3, T, 2, Q, (vlen)); \
+ Q = PSTEP_I (Q_left_1, T, PSTEP_I (Q_left_2, T, Q, (vlen)), (vlen)); \
+ Q = PSTEP_I_SRA (Q_left_0, T, 1, Q, (vlen)); \
+ /* P is in Q69 and Q is in Q67 */ \
+ VFLOAT p_hi = __riscv_vfcvt_f (P, (vlen)); \
+ VFLOAT p_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (P, __riscv_vfcvt_x (p_hi, (vlen)), (vlen)), (vlen)); \
+ VFLOAT q_hi = __riscv_vfcvt_f (Q, (vlen)); \
+ VFLOAT q_lo = __riscv_vfcvt_f ( \
+ __riscv_vsub (Q, __riscv_vfcvt_x (q_hi, (vlen)), (vlen)), (vlen)); \
+ VFLOAT z_hi, z_lo; \
+ DIV2_N2D2 (p_hi, p_lo, q_hi, q_lo, z_hi, z_lo, (vlen)); \
+ z_hi = __riscv_vfmul (z_hi, 0x1.0p-2, (vlen)); \
+ z_lo = __riscv_vfmul (z_lo, 0x1.0p-2, (vlen)); \
+ PROD_X2Y2 (z_hi, z_lo, fact_hi, fact_lo, (y_hi), (y_lo), (vlen)); \
+ /* if original input is in (0, 3/4), need to add -log(x) */ \
+ VFLOAT A, a; \
+ A = I_AS_F (__riscv_vxor (F_AS_I (A), F_AS_I (A), (vlen))); \
+ a = I_AS_F (__riscv_vxor (F_AS_I (a), F_AS_I (a), (vlen))); \
+ A = __riscv_vmerge (A, (logx_hi), x_lt_75, (vlen)); \
+ a = __riscv_vmerge (a, (logx_lo), x_lt_75, (vlen)); \
+ /* y_hi + y_lo - (A + a), A is either 0 or dominates */ \
+ z_hi = __riscv_vfsub ((y_hi), A, (vlen)); \
+ z_lo = __riscv_vfadd (z_hi, A, (vlen)); \
+ z_lo = __riscv_vfsub ((y_hi), z_lo, (vlen)); \
+ (y_lo) = __riscv_vfadd ((y_lo), z_lo, (vlen)); \
+ (y_lo) = __riscv_vfsub ((y_lo), a, (vlen)); \
+ (y_hi) = z_hi; \
+ } \
+ while (0)
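
In scalar terms, LGAMMA_LE_225 covers two regimes: on [3/4, 9/4] a rational approximation with the zeros of loggamma at x = 1 and x = 2 factored out, and on (0, 3/4) the recurrence gamma(x) = gamma(x+1)/x, which is what the trailing -log(x) addition implements. A sketch with rational_pq standing in for the fixed-point P/Q evaluation (libm lgamma used purely for illustration):

    #include <math.h>

    static double
    lgamma_le_225_sketch (double x, double (*rational_pq) (double))
    {
      if (x < 0.75)
        return lgamma (x + 1.0) - log (x);   /* loggamma(x+1) - log(x) */
      double t = x - (1.5 - 0.125);          /* t = x - (3/2 - 1/8) */
      return (x - 1.0) * (x - 2.0) * rational_pq (t);
    }
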
+
+//---Compute log(x/e) or log(x) to 2^(-65) absolute accuracy
+// log(x) - c, c is 1 or 0; x > 0
+#define LGAMMA_LOG(x_hi, x_lo, c, y_hi, y_lo, vlen) \
+ do \
+ { \
+ /* need x_hi, x_lo as input */ \
+ VFLOAT x_in_hi = (x_hi); \
+ VFLOAT x_in_lo = (x_lo); \
+ VINT n_adjust; \
+ n_adjust = __riscv_vxor (n_adjust, n_adjust, (vlen)); \
+ VBOOL x_tiny = __riscv_vmflt (x_in_hi, 0x1.0p-1020, (vlen)); \
+ if (__riscv_vcpop (x_tiny, (vlen)) > 0) \
+ { \
+ VFLOAT x_adjust \
+ = __riscv_vfmul (x_tiny, x_in_hi, 0x1.0p60, (vlen)); \
+ x_in_hi = __riscv_vmerge (x_in_hi, x_adjust, x_tiny, (vlen)); \
+ x_adjust = __riscv_vfmul (x_tiny, x_in_lo, 0x1.0p60, (vlen)); \
+ x_in_lo = __riscv_vmerge (x_in_lo, x_adjust, x_tiny, (vlen)); \
+ n_adjust = __riscv_vmerge (n_adjust, 60, x_tiny, (vlen)); \
+ } \
+ VINT n = __riscv_vadd ( \
+ __riscv_vsra (F_AS_I (x_in_hi), MAN_LEN - 8, (vlen)), 0x96, vlen); \
+ n = __riscv_vsub (__riscv_vsra (n, 8, vlen), EXP_BIAS, vlen); \
+ VFLOAT scale = I_AS_F (__riscv_vsll ( \
+ __riscv_vrsub (n, EXP_BIAS, (vlen)), MAN_LEN, (vlen))); \
+ x_in_hi = __riscv_vfmul (x_in_hi, scale, (vlen)); \
+ x_in_lo = __riscv_vfmul (x_in_lo, scale, (vlen)); \
+ n = __riscv_vsub (n, n_adjust, (vlen)); \
+ /* x is scaled, and log(x) is 2 atanh(w/2); w = 2(x-1)/(x+1) */ \
+ \
+ VFLOAT numer, denom, denom_delta; \
+ numer = __riscv_vfsub (x_in_hi, fp_posOne, (vlen)); /* exact */ \
+ denom = __riscv_vfadd (x_in_hi, fp_posOne, (vlen)); \
+ denom_delta = __riscv_vfadd (__riscv_vfrsub (denom, fp_posOne, (vlen)), \
+ x_in_hi, (vlen)); \
+ denom_delta = __riscv_vfadd (denom_delta, x_in_lo, (vlen)); \
+ VFLOAT w_hi, w_lo; \
+ ACC_DIV2_N2D2 (numer, x_in_lo, denom, denom_delta, w_hi, w_lo, vlen); \
+ /* w_hi + w_lo is at this point (x-1)/(x+1) */ \
+ /* Next get 2(x-1)/(x+1) in Q64 fixed point */ \
+ VINT W \
+ = __riscv_vfcvt_x (__riscv_vfmul (w_hi, 0x1.0p65, (vlen)), (vlen)); \
+ W = __riscv_vadd ( \
+ W, \
+ __riscv_vfcvt_x (__riscv_vfmul (w_lo, 0x1.0p65, (vlen)), (vlen)), \
+ (vlen)); \
+ /* W is in Q64 because W is 2(x-1)/(x+1) */ \
+ \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, (vlen)); \
+ VINT W2 = __riscv_vsmul (W, W, 1, (vlen)); /* Q65 */ \
+ \
+ VINT P_right, P_left, W8; \
+ P_right = PSTEP_I_SRA (P_log_6, P_log_7, 4, W2, (vlen)); \
+ P_right = PSTEP_I_SRA (P_log_5, W2, 4, P_right, (vlen)); \
+ P_right = PSTEP_I_SRA (P_log_4, W2, 4, P_right, (vlen)); \
+ /* P_right in Q76 */ \
+ P_left = PSTEP_I_SRA (P_log_2, P_log_3, 5, W2, (vlen)); \
+ P_left = PSTEP_I_SRA (P_log_1, W2, 4, P_left, (vlen)); \
+ P_left = PSTEP_I_SRA (P_log_0, W2, 5, P_left, (vlen)); \
+ /* P_left in Q66 */ \
+ W8 = __riscv_vsmul (W2, W2, 1, (vlen)); /* Q67 */ \
+ W8 = __riscv_vsmul (W8, W8, 1, (vlen)); /* Q71 */ \
+ P_right = __riscv_vsmul (P_right, W8, 1, (vlen)); /* Q84 */ \
+ P_right = __riscv_vsra (P_right, 18, (vlen)); /* Q66 */ \
+ P_left = __riscv_vadd (P_left, P_right, (vlen)); /* Q66 */ \
+ \
+ VINT W3 = __riscv_vsmul (W2, W, 1, (vlen)); /* Q66 */ \
+ P_left = __riscv_vsmul (P_left, W3, 1, (vlen)); /* Q69 */ \
+ VFLOAT poly_hi = __riscv_vfcvt_f (P_left, (vlen)); \
+ P_left \
+ = __riscv_vsub (P_left, __riscv_vfcvt_x (poly_hi, (vlen)), (vlen)); \
+ VFLOAT poly_lo = __riscv_vfcvt_f (P_left, (vlen)); \
+ poly_hi = __riscv_vfmul (poly_hi, 0x1.0p-69, (vlen)); \
+ poly_lo = __riscv_vfmul (poly_lo, 0x1.0p-69, (vlen)); \
+ \
+ /* n*log(2) - c + w + poly is the desired result */ \
+ VFLOAT A, B; \
+ A = __riscv_vfmul (n_flt, LOG2_HI, (vlen)); /* exact */ \
+ A = __riscv_vfsub (A, (c), (vlen)); /* exact due to A's range */ \
+ w_hi = __riscv_vfadd (w_hi, w_hi, (vlen)); \
+ w_lo = __riscv_vfadd (w_lo, w_lo, (vlen)); \
+ FAST2SUM (A, w_hi, B, (y_lo), (vlen)); \
+ w_lo = __riscv_vfadd ((y_lo), w_lo, (vlen)); \
+ w_lo = __riscv_vfmacc (w_lo, LOG2_LO, n_flt, (vlen)); \
+ poly_lo = __riscv_vfadd (poly_lo, w_lo, (vlen)); \
+ FAST2SUM (B, poly_hi, (y_hi), (y_lo), (vlen)); \
+ (y_lo) = __riscv_vfadd ((y_lo), poly_lo, (vlen)); \
+ } \
+ while (0)
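
The scheme in LGAMMA_LOG is the classic atanh form of the logarithm: after scaling x into [1/sqrt(2), sqrt(2)), log(x) = n*log(2) + 2*atanh(w/2) with w = 2(x-1)/(x+1), and the atanh is evaluated as w + w^3*poly(w^2) in Q64 fixed point. A double-precision sketch with libm atanh standing in for the polynomial:

    #include <math.h>

    static double
    log_via_atanh (double x)  /* assumes x positive, finite, normal */
    {
      int n;
      double m = frexp (x, &n);          /* x = m * 2^n, m in [0.5, 1) */
      if (m < M_SQRT1_2)
        {
          m *= 2.0;
          n -= 1;                        /* m now in [1/sqrt(2), sqrt(2)) */
        }
      double w = 2.0 * (m - 1.0) / (m + 1.0);
      return n * 0x1.62e42fefa39efp-1 + 2.0 * atanh (w / 2.0);
    }
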
+
+// Use the Stirling approximation with correction when x >= 9/4.
+// On input, (logx_hi, logx_lo) is log(x) - 1.
+// The result is returned in (y_hi, y_lo).
+#define LGAMMA_LOG_STIRLING(x, logx_hi, logx_lo, y_hi, y_lo, expo_adj, vlen) \
+ do \
+ { \
+ VFLOAT x_in = (x); \
+ VBOOL adjust_x = __riscv_vmfge (x_in, 0x1.0p+200, (vlen)); \
+ (expo_adj) = __riscv_vmerge ((expo_adj), 100, adjust_x, (vlen)); \
+ VINT m = __riscv_vrsub ((expo_adj), EXP_BIAS, (vlen)); \
+ VFLOAT scale = I_AS_F (__riscv_vsll (m, MAN_LEN, (vlen))); \
+ x_in = __riscv_vfmul (x_in, scale, (vlen)); \
+ VFLOAT w_hi, w_lo; \
+ w_hi = __riscv_vfsub (x_in, 0x1.0p-1, (vlen)); \
+ w_lo = __riscv_vfsub (x_in, w_hi, (vlen)); \
+ w_lo = __riscv_vfsub (w_lo, 0x1.0p-1, (vlen)); \
+ PROD_X2Y2 (w_hi, w_lo, (logx_hi), (logx_lo), (y_hi), (y_lo), (vlen)); \
+ } \
+ while (0)
+
+// Compute log(x*r) + log(|sin(pi r)/(pi r)|) where x = N + r, |r| <= 1/2.
+// This is for handling gamma at negative arguments, where
+// we have a denominator of x sin(pi x)/pi.
+// Taking the log of |sin(pi x)/pi|, which equals log|sin(pi r)/pi|,
+// is more easily done as log(|r|) + log|sin(pi r)/(pi r)|,
+// since the latter can be approximated by r^2 poly(r^2).
+// The term log(|r|) is combined with log(|x|) as log(|r * x|).
+// This macro also flags special arguments when x is of integral value.
+// The macro assumes x > 0; it suffices to clip x to 2^52 as x is
+// of integral value at and beyond 2^52.
+#define LGAMMA_LOGSIN(x, y_hi, y_lo, vy_special, special_args, vlen) \
+ do \
+ { \
+ VFLOAT x_in = __riscv_vfmin ((x), 0x1.0p+52, (vlen)); \
+ VFLOAT n_flt; \
+ VINT n = __riscv_vfcvt_x (x_in, (vlen)); \
+ n_flt = __riscv_vfcvt_f (n, (vlen)); \
+ VFLOAT r = __riscv_vfsub (x_in, n_flt, (vlen)); \
+ VBOOL pole = __riscv_vmfeq (r, fp_posZero, (vlen)); \
+ if (__riscv_vcpop (pole, (vlen)) > 0) \
+ { \
+ r = __riscv_vfmerge (r, 0x1.0p-1, pole, (vlen)); \
+ (special_args) = __riscv_vmor ((special_args), pole, (vlen)); \
+ (vy_special) \
+ = __riscv_vfmerge ((vy_special), fp_posInf, pole, (vlen)); \
+ } \
+ VFLOAT rsq = __riscv_vfmul (r, r, (vlen)); \
+ VFLOAT rsq_lo = __riscv_vfmsub (r, r, rsq, (vlen)); \
+ VINT Rsq \
+ = __riscv_vfcvt_x (__riscv_vfmul (rsq, 0x1.0p63, (vlen)), (vlen)); \
+ Rsq = __riscv_vadd ( \
+ Rsq, \
+ __riscv_vfcvt_x (__riscv_vfmul (rsq_lo, 0x1.0p63, (vlen)), (vlen)), \
+ (vlen)); \
+ VINT P_right = PSTEP_I ( \
+ P_logsin_8, Rsq, \
+ PSTEP_I ( \
+ P_logsin_9, Rsq, \
+ PSTEP_I (P_logsin_10, Rsq, \
+ PSTEP_I (P_logsin_11, Rsq, \
+ PSTEP_I (P_logsin_12, Rsq, \
+ PSTEP_I (P_logsin_13, P_logsin_14, \
+ Rsq, (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ \
+ VINT P_left = PSTEP_I ( \
+ P_logsin_0, Rsq, \
+ PSTEP_I (P_logsin_1, Rsq, \
+ PSTEP_I (P_logsin_2, Rsq, \
+ PSTEP_I (P_logsin_3, Rsq, \
+ PSTEP_I (P_logsin_4, Rsq, \
+ PSTEP_I (P_logsin_5, Rsq, \
+ PSTEP_I (P_logsin_6, \
+ P_logsin_7, \
+ Rsq, (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ VINT R16 = __riscv_vsmul (Rsq, Rsq, 1, (vlen)); \
+ R16 = __riscv_vsmul (R16, R16, 1, (vlen)); \
+ R16 = __riscv_vsmul (R16, R16, 1, (vlen)); \
+ P_right = __riscv_vsmul (R16, P_right, 1, (vlen)); \
+ P_left = __riscv_vadd (P_left, P_right, (vlen)); \
+ VFLOAT z_hi = __riscv_vfcvt_f (P_left, (vlen)); \
+ P_right = __riscv_vfcvt_x (z_hi, (vlen)); \
+ VFLOAT z_lo \
+ = __riscv_vfcvt_f (__riscv_vsub (P_left, P_right, (vlen)), (vlen)); \
+ z_hi = __riscv_vfmul (z_hi, 0x1.0p-61, (vlen)); \
+ z_lo = __riscv_vfmul (z_lo, 0x1.0p-61, (vlen)); \
+ VFLOAT ls_hi, ls_lo; \
+ PROD_X2Y2 (z_hi, z_lo, rsq, rsq_lo, ls_hi, ls_lo, (vlen)); \
+ /* At this point we have log|sin(pi r)/(pi r)| */ \
+ \
+ /* we now compute log(|x r|); 2^(-60) <= x <= 2^52 by design */ \
+ VFLOAT xr_hi, xr_lo; \
+ r = __riscv_vfsgnj (r, fp_posOne, (vlen)); \
+ PROD_X1Y1 (r, x_in, xr_hi, xr_lo, (vlen)); \
+ VFLOAT logx_hi, logx_lo, c; \
+ c = I_AS_F (__riscv_vxor (F_AS_I (c), F_AS_I (c), (vlen))); \
+ LGAMMA_LOG (xr_hi, xr_lo, c, logx_hi, logx_lo, (vlen)); \
+ VFLOAT S_hi, S_lo; \
+ KNUTH2SUM (ls_hi, logx_hi, S_hi, S_lo, (vlen)); \
+ logx_lo = __riscv_vfadd (logx_lo, ls_lo, (vlen)); \
+ (y_lo) = __riscv_vfadd (S_lo, logx_lo, (vlen)); \
+ (y_hi) = S_hi; \
+ } \
+ while (0)
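
For negative arguments the reflection formula gives |Gamma(x)| = pi / (|x| * |sin(pi x)| * Gamma(|x|)), so lgamma(x) = -(lgamma(|x|) + log|x * sin(pi x)/pi|), and LGAMMA_LOGSIN supplies the second term via r = x - nearbyint(x). A scalar sketch (libm calls stand in for the fixed-point polynomials; assumes x < 0 and non-integral):

    #include <math.h>

    static double
    lgamma_negative_sketch (double x)
    {
      double ax = fabs (x);
      double r = ax - nearbyint (ax);                  /* |r| <= 1/2 */
      double logsin = log (fabs (sin (M_PI * r) / (M_PI * r)));
      return -(lgamma (ax) + logsin + log (fabs (r) * ax));
    }
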
+
+// LogGamma based on the Stirling formula is
+// LogGamma(x) ~ (x-1/2)*(log(x)-1) + poly(1/x)
+// This poly(1/x) is in essence a correction term.
+// This form is used when x >= 9/4. We use Q63 to represent 1/x.
+#define LOG_STIRLING_CORRECTION(x, y_hi, y_lo, vlen) \
+ do \
+ { \
+ VFLOAT x_in = __riscv_vfmin ((x), 0x1.0p80, (vlen)); \
+ VFLOAT z_hi = __riscv_vfrdiv (x_in, fp_posOne, (vlen)); \
+ VFLOAT z_lo = VFMV_VF (fp_posOne, (vlen)); \
+ z_lo = __riscv_vfnmsub (x_in, z_hi, z_lo, (vlen)); \
+ z_lo = __riscv_vfmul (z_hi, z_lo, (vlen)); \
+ z_hi = __riscv_vfmul (z_hi, 0x1.0p63, (vlen)); \
+ z_lo = __riscv_vfmul (z_lo, 0x1.0p63, (vlen)); \
+ VINT R = __riscv_vfcvt_x (z_hi, (vlen)); \
+ R = __riscv_vadd (R, __riscv_vfcvt_x (z_lo, (vlen)), (vlen)); \
+ VINT P_SC, Q_SC; \
+ /* R is 1/x in Q63 */ \
+ P_SC = PSTEP_I ( \
+ P_LS_corr_4, R, \
+ PSTEP_I (P_LS_corr_5, R, \
+ PSTEP_I (P_LS_corr_6, R, \
+ PSTEP_I (P_LS_corr_7, P_LS_corr_8, R, (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ P_SC = PSTEP_I ( \
+ P_LS_corr_0, R, \
+ PSTEP_I (P_LS_corr_1, R, \
+ PSTEP_I (P_LS_corr_2, R, \
+ PSTEP_I (P_LS_corr_3, R, P_SC, (vlen)), (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ Q_SC = PSTEP_I ( \
+ Q_LS_corr_4, R, \
+ PSTEP_I (Q_LS_corr_5, R, \
+ PSTEP_I (Q_LS_corr_6, R, \
+ PSTEP_I (Q_LS_corr_7, Q_LS_corr_8, R, (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ Q_SC = PSTEP_I ( \
+ Q_LS_corr_0, R, \
+ PSTEP_I (Q_LS_corr_1, R, \
+ PSTEP_I (Q_LS_corr_2, R, \
+ PSTEP_I (Q_LS_corr_3, R, Q_SC, (vlen)), (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ VFLOAT p_hi, p_lo, q_hi, q_lo; \
+ VINT P_tmp, Q_tmp; \
+ p_hi = __riscv_vfcvt_f (P_SC, (vlen)); \
+ P_tmp = __riscv_vfcvt_x (p_hi, (vlen)); \
+ p_lo = __riscv_vfcvt_f (__riscv_vsub (P_SC, P_tmp, (vlen)), (vlen)); \
+ q_hi = __riscv_vfcvt_f (Q_SC, (vlen)); \
+ Q_tmp = __riscv_vfcvt_x (q_hi, (vlen)); \
+ q_lo = __riscv_vfcvt_f (__riscv_vsub (Q_SC, Q_tmp, (vlen)), (vlen)); \
+ ACC_DIV2_N2D2 (p_hi, p_lo, q_hi, q_lo, (y_hi), (y_lo), (vlen)); \
+ } \
+ while (0)
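
Combined with LGAMMA_LOG_STIRLING above, the large-argument path computes lgamma(x) ~ (x - 1/2)*(log(x) - 1) + C(1/x), where C is the P_LS_corr/Q_LS_corr rational. A sketch that replaces the minimax rational with the leading asymptotic terms of the same correction, C(t) ~ log(2*pi)/2 - 1/2 + t/12 - t^3/360 (illustrative accuracy only):

    #include <math.h>

    static double
    lgamma_stirling_sketch (double x)  /* x >= 9/4 */
    {
      double t = 1.0 / x;
      double corr = 0.5 * log (2.0 * M_PI) - 0.5
                    + t / 12.0 - (t * t * t) / 360.0;
      return (x - 0.5) * (log (x) - 1.0) + corr;
    }
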
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, lgamma) (VFLOAT x) \
+ { \
+ size_t vlen = VSET (simdlen); \
+ VFLOAT vx, vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ VFLOAT zero = VFMV_VF (fp_posZero, vlen); \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vx = x; \
+ \
+ /* Handle Inf and NaN and Zero */ \
+ EXCEPTION_HANDLING_LGAMMA (vx, special_args, vy_special, vlen); \
+ vx_orig = vx; \
+ \
+ /* Work with the absolute value. \
+ // Modify loggamma(|x|) appropriately later on if x < 0.*/ \
+ vx = __riscv_vfabs (vx, vlen); \
+ vx_orig = __riscv_vfsgnj (vx, vx_orig, vlen); \
+ \
+ VBOOL x_lt_225 = __riscv_vmflt (vx, 0x1.2p+1, vlen); \
+ VFLOAT c = VFMV_VF (fp_posOne, vlen); \
+ c = __riscv_vfmerge (c, fp_posZero, x_lt_225, vlen); \
+ \
+ VFLOAT logx_hi, logx_lo; \
+ LGAMMA_LOG (vx, zero, c, logx_hi, logx_lo, vlen); \
+ \
+ VFLOAT y_left_hi, y_left_lo; \
+ if (__riscv_vcpop (x_lt_225, vlen) > 0) \
+ { \
+ /* Consider 0 < x < 2.25 to be rare cases */ \
+ VFLOAT vx_tmp; \
+ vx_tmp = VFMV_VF (0x1.0p0, vlen); \
+ vx_tmp = __riscv_vmerge (vx_tmp, vx, x_lt_225, vlen); \
+ LGAMMA_LE_225 (vx_tmp, logx_hi, logx_lo, y_left_hi, y_left_lo, vlen); \
+ } \
+ \
+ VFLOAT stir_hi, stir_lo; \
+ VFLOAT stir_corr_hi, stir_corr_lo; \
+ VINT expo_adj; \
+ expo_adj = __riscv_vxor (expo_adj, expo_adj, vlen); \
+ LGAMMA_LOG_STIRLING (vx, logx_hi, logx_lo, stir_hi, stir_lo, expo_adj, \
+ vlen); \
+ LOG_STIRLING_CORRECTION (vx, stir_corr_hi, stir_corr_lo, vlen); \
+ \
+ VFLOAT loggamma_hi, loggamma_lo; \
+ KNUTH2SUM (stir_hi, stir_corr_hi, loggamma_hi, loggamma_lo, vlen); \
+ loggamma_lo = __riscv_vfadd (loggamma_lo, stir_corr_lo, vlen); \
+ loggamma_lo = __riscv_vfadd (loggamma_lo, stir_lo, vlen); \
+ \
+ loggamma_hi = __riscv_vmerge (loggamma_hi, y_left_hi, x_lt_225, vlen); \
+ loggamma_lo = __riscv_vmerge (loggamma_lo, y_left_lo, x_lt_225, vlen); \
+ \
+ VBOOL x_lt_0 = __riscv_vmflt (vx_orig, fp_posZero, vlen); \
+ \
+ if (__riscv_vcpop (x_lt_0, vlen) > 0) \
+ { \
+ /* for negative x, the desired result is \
+ // log(1/gamma(|x|)) + log(1/(|x sin(pi x)/pi|)) \
+ // loggamma(|x|) is in loggamma_{hi,lo} \
+ // we use the macro to get log(|x sin(pi x)/ pi|) */ \
+ VFLOAT vx_for_neg = VFMV_VF (0x1.0p-1, vlen); \
+ vx_for_neg = __riscv_vmerge (vx_for_neg, vx, x_lt_0, vlen); \
+ VFLOAT logsin_hi, logsin_lo; \
+ LGAMMA_LOGSIN (vx_for_neg, logsin_hi, logsin_lo, vy_special, \
+ special_args, vlen); \
+ \
+ VFLOAT A, a; \
+ KNUTH2SUM (loggamma_hi, logsin_hi, A, a, vlen); \
+ a = __riscv_vfadd (a, logsin_lo, vlen); \
+ a = __riscv_vfadd (a, loggamma_lo, vlen); \
+ A = __riscv_vfsgnjx (A, fp_negOne, vlen); \
+ a = __riscv_vfsgnjx (a, fp_negOne, vlen); \
+ loggamma_hi = __riscv_vmerge (loggamma_hi, A, x_lt_0, vlen); \
+ loggamma_lo = __riscv_vmerge (loggamma_lo, a, x_lt_0, vlen); \
+ } \
+ loggamma_hi = __riscv_vfadd (loggamma_hi, loggamma_lo, vlen); \
+ expo_adj = __riscv_vadd (expo_adj, EXP_BIAS, vlen); \
+ vy = __riscv_vfmul ( \
+ loggamma_hi, I_AS_F (__riscv_vsll (expo_adj, MAN_LEN, vlen)), vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_log.c b/sysdeps/riscv/rvd/v_d_log.c
new file mode 100644
index 0000000000..54c602c72a
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_log.c
@@ -0,0 +1,188 @@
+/* Double-precision vector log function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_LOGD_VSET_CONFIG
+
+#define COMPILE_FOR_LOG
+
+#define EXCEPTION_HANDLING_LOG(vx, special_args, vy_special, n_adjust, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ /* special handling except positive normal number */ \
+ IDENTIFY (vclass, 0x3BF, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ n_adjust = VMVI_VX (0, (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL id_mask; \
+ /* substitute negative arguments with sNaN */ \
+ IDENTIFY (vclass, class_negative, id_mask, (vlen)); \
+ vx = __riscv_vfmerge (vx, fp_sNaN, id_mask, vlen); \
+ /* substitute +0 argument with -0 */ \
+ IDENTIFY (vclass, class_posZero, id_mask, vlen); \
+ vx = __riscv_vfmerge (vx, fp_negZero, id_mask, vlen); \
+ /* eliminate positive denorm input from special_args */ \
+ IDENTIFY (vclass, 0x39F, (special_args), (vlen)); \
+ /* for narrowed set of special arguments, compute vx+vfrec7(vx) */ \
+ vy_special = __riscv_vfrec7 ((special_args), (vx), (vlen)); \
+ vy_special \
+ = __riscv_vfadd ((special_args), vy_special, (vx), (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ /* scale up input for positive denormals */ \
+ IDENTIFY (vclass, class_posDenorm, id_mask, (vlen)); \
+ n_adjust = __riscv_vmerge (n_adjust, 64, id_mask, vlen); \
+ VFLOAT vx_normalized = __riscv_vfmul (id_mask, vx, 0x1.0p64, vlen); \
+ vx = __riscv_vmerge (vx, vx_normalized, id_mask, vlen); \
+ } \
+ } \
+ while (0)
+
+#define LOGB_2_HI 0x1.62e42fefa39efp-1
+#define LOGB_2_LO 0x1.abc9e3b39803fp-56
+#define LOGB_e_HI 0x1.0p0
+#define LOGB_e_LO 0.0
+
+// Version 1 uses a 128-entry LUT
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, log) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ VINT n_adjust; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* NaN, Inf, and -ve handling, as well as scaling denormal input by 2^64 \
+ */ \
+ EXCEPTION_HANDLING_LOG (vx, special_args, vy_special, n_adjust, vlen); \
+ \
+    /* in_arg elements at this point are positive, finite and not subnormal \
+ Decompose in_arg into n, B, r: in_arg = 2^n (1/B) (1 + r) \
+ B is equivalently defined by ind, 0 <= ind < 128 */ \
+ VINT n = U_AS_I (__riscv_vadd ( \
+ __riscv_vsrl (F_AS_U (vx), MAN_LEN - 1, vlen), 1, vlen)); \
+ n = __riscv_vsra (n, 1, vlen); \
+ n = __riscv_vsub (n, EXP_BIAS, vlen); \
+ vx = U_AS_F ( \
+ __riscv_vsrl (__riscv_vsll (F_AS_U (vx), BIT_WIDTH - MAN_LEN, vlen), \
+ BIT_WIDTH - MAN_LEN, vlen)); \
+ vx = U_AS_F ( \
+ __riscv_vadd (F_AS_U (vx), (uint64_t)EXP_BIAS << MAN_LEN, vlen)); \
+ n = __riscv_vsub (n, n_adjust, vlen); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT B = __riscv_vfrec7 (vx, vlen); \
+ /* get 7 msb of mantissa, and left shift by 3 to get address */ \
+ VUINT ind = __riscv_vand (__riscv_vsrl (F_AS_U (vx), MAN_LEN - 10, vlen), \
+ 0x3F8, vlen); \
+ /* adjust B to be 1.0 if ind == 0 */ \
+ VBOOL adjust_B = __riscv_vmseq (ind, 0, vlen); \
+ B = __riscv_vfmerge (B, fp_posOne, adjust_B, vlen); \
+ /* finally get r = B * in_arg - 1.0 */ \
+ VFLOAT r = VFMV_VF (fp_posOne, vlen); \
+ r = __riscv_vfmsac (r, vx, B, vlen); \
+ \
+ /* Base-B log is logB(in_arg) = logB(2^n * 1/B) + logB(1 + r) \
+ (n + log2(1/B))*logB(2) + log(1+r)*logB(e) \
+ log2(1/B) is stored in a table \
+ and log(1+r) is approximated by r + poly \
+ poly is a polynomial in r in the form r^2 * (p0 + p1 r + ... ) \
+ To deliver this result accurately, one uses logB(2) and logB(e) \
+ with extra precision and sums the various terms in an appropriate \
+ order */ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.9999998877038p-3, r, \
+ PSTEP (-0x1.555c54f8b7c6cp-3, 0x1.2499765b3c27ap-3, r, vlen), vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ -0x1.000000000001cp-1, r, \
+ PSTEP (0x1.55555555555a9p-2, -0x1.fffffff2018cfp-3, r, vlen), vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, rcube, poly_left, vlen); \
+ poly = __riscv_vfmul (rsq, poly, vlen); \
+ /* log_e(1+r) is r + poly */ \
+ \
+ /* Load table values and get n_flt + T to be A + a */ \
+ VINT T = __riscv_vluxei64 (logD_tbl128_fixedpt, ind, vlen); \
+ VINT T_hi = __riscv_vsll (__riscv_vsra (T, 24, vlen), 24, vlen); \
+ VINT T_lo = __riscv_vsub (T, T_hi, vlen); \
+ VFLOAT T_hi_flt = __riscv_vfcvt_f (T_hi, vlen); \
+ VFLOAT A = __riscv_vfmadd (T_hi_flt, 0x1.0p-63, n_flt, vlen); \
+ VFLOAT a = __riscv_vfcvt_f (T_lo, vlen); \
+ a = __riscv_vfmul (a, 0x1.0p-63, vlen); \
+ \
+ /* Compute (A + a) * (logB_2_hi + logB_2_lo) + (r + P) * (logB_e_hi + \
+ logB_e_lo) where B can be e, 2, or 10 */ \
+ VFLOAT delta_1 = __riscv_vfmul (A, LOGB_2_LO, vlen); \
+ delta_1 = __riscv_vfmadd (a, LOGB_2_HI, delta_1, vlen); \
+ poly = __riscv_vfadd (poly, delta_1, vlen); \
+ \
+ poly = __riscv_vfadd (poly, r, vlen); \
+ \
+ vy = __riscv_vfmadd (A, LOGB_2_HI, poly, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
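
The decomposition used above can be summarized in scalar form: write x = 2^n * (1/B) * (1 + r), where B is a 7-bit reciprocal of the scaled mantissa (vfrec7 in the vector code, with log2(1/B) looked up in the 128-entry table) and r = B*x' - 1 is small, so log(x) = (n + log2(1/B))*log(2) + log1p(r). A sketch with libm standing in for the table and polynomial:

    #include <math.h>

    static double
    log_sketch (double x)  /* assumes x positive, finite, normal */
    {
      int n;
      double m = frexp (x, &n);                 /* x = m * 2^n, m in [0.5,1) */
      m *= 2.0;
      n -= 1;                                   /* m now in [1, 2) */
      double B = ldexp (nearbyint (ldexp (1.0 / m, 7)), -7); /* ~vfrec7 */
      double r = fma (B, m, -1.0);              /* small: |r| ~ 2^-8 */
      return (n + log2 (1.0 / B)) * M_LN2 + log1p (r);
    }
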
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_log10.c b/sysdeps/riscv/rvd/v_d_log10.c
new file mode 100644
index 0000000000..9132e49d34
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_log10.c
@@ -0,0 +1,189 @@
+/* Double-precision vector log10 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_LOGD_VSET_CONFIG
+
+#define COMPILE_FOR_LOG10
+
+#define EXCEPTION_HANDLING_LOG(vx, special_args, vy_special, n_adjust, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ /* special handling except positive normal number */ \
+ IDENTIFY (vclass, 0x3BF, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ n_adjust = VMVI_VX (0, (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL id_mask; \
+ /* substitute negative arguments with sNaN */ \
+ IDENTIFY (vclass, class_negative, id_mask, (vlen)); \
+ vx = __riscv_vfmerge (vx, fp_sNaN, id_mask, vlen); \
+ /* substitute +0 argument with -0 */ \
+ IDENTIFY (vclass, class_posZero, id_mask, vlen); \
+ vx = __riscv_vfmerge (vx, fp_negZero, id_mask, vlen); \
+ /* eliminate positive denorm input from special_args */ \
+ IDENTIFY (vclass, 0x39F, (special_args), (vlen)); \
+ /* for narrowed set of special arguments, compute vx+vfrec7(vx) */ \
+ vy_special = __riscv_vfrec7 ((special_args), (vx), (vlen)); \
+ vy_special \
+ = __riscv_vfadd ((special_args), vy_special, (vx), (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ /* scale up input for positive denormals */ \
+ IDENTIFY (vclass, class_posDenorm, id_mask, (vlen)); \
+ n_adjust = __riscv_vmerge (n_adjust, 64, id_mask, vlen); \
+ VFLOAT vx_normalized = __riscv_vfmul (id_mask, vx, 0x1.0p64, vlen); \
+ vx = __riscv_vmerge (vx, vx_normalized, id_mask, vlen); \
+ } \
+ } \
+ while (0)
+
+#define LOGB_2_HI 0x1.34413509f79ffp-2
+#define LOGB_2_LO -0x1.9dc1da994fd00p-59
+#define LOGB_e_HI 0x1.bcb7b1526e50ep-2
+#define LOGB_e_LO 0x1.95355baaafad3p-57
+
+// Version 1 uses a 128-entry LUT
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, log10) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ VINT n_adjust; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* NaN, Inf, and -ve handling, as well as scaling denormal input by \
+ * 2^64 */ \
+ EXCEPTION_HANDLING_LOG (vx, special_args, vy_special, n_adjust, vlen); \
+ \
+    /* in_arg elements at this point are positive, finite and not subnormal \
+ // Decompose in_arg into n, B, r: in_arg = 2^n (1/B) (1 + r) \
+ // B is equivalently defined by ind, 0 <= ind < 128 */ \
+ VINT n = U_AS_I (__riscv_vadd ( \
+ __riscv_vsrl (F_AS_U (vx), MAN_LEN - 1, vlen), 1, vlen)); \
+ n = __riscv_vsra (n, 1, vlen); \
+ n = __riscv_vsub (n, EXP_BIAS, vlen); \
+ vx = U_AS_F ( \
+ __riscv_vsrl (__riscv_vsll (F_AS_U (vx), BIT_WIDTH - MAN_LEN, vlen), \
+ BIT_WIDTH - MAN_LEN, vlen)); \
+ vx = U_AS_F ( \
+ __riscv_vadd (F_AS_U (vx), (uint64_t)EXP_BIAS << MAN_LEN, vlen)); \
+ n = __riscv_vsub (n, n_adjust, vlen); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT B = __riscv_vfrec7 (vx, vlen); \
+ /* get 7 msb of mantissa, and left shift by 3 to get address */ \
+ VUINT ind = __riscv_vand (__riscv_vsrl (F_AS_U (vx), MAN_LEN - 10, vlen), \
+ 0x3F8, vlen); \
+ /* adjust B to be 1.0 if ind == 0 */ \
+ VBOOL adjust_B = __riscv_vmseq (ind, 0, vlen); \
+ B = __riscv_vfmerge (B, fp_posOne, adjust_B, vlen); \
+ /* finally get r = B * in_arg - 1.0 */ \
+ VFLOAT r = VFMV_VF (fp_posOne, vlen); \
+ r = __riscv_vfmsac (r, vx, B, vlen); \
+ \
+ /* Base-B log is logB(in_arg) = logB(2^n * 1/B) + logB(1 + r) \
+ // (n + log2(1/B))*logB(2) + log(1+r)*logB(e) \
+ // log2(1/B) is stored in a table \
+ // and log(1+r) is approximated by r + poly \
+ // poly is a polynomial in r in the form r^2 * (p0 + p1 r + ... ) \
+ // To deliver this result accurately, one uses logB(2) and logB(e) \
+ // with extra precision and sums the various terms in an appropriate \
+ order */ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.9999998877038p-3, r, \
+ PSTEP (-0x1.555c54f8b7c6cp-3, 0x1.2499765b3c27ap-3, r, vlen), vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ -0x1.000000000001cp-1, r, \
+ PSTEP (0x1.55555555555a9p-2, -0x1.fffffff2018cfp-3, r, vlen), vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, rcube, poly_left, vlen); \
+ poly = __riscv_vfmul (rsq, poly, vlen); \
+ /* log_e(1+r) is r + poly */ \
+ \
+ /* Load table values and get n_flt + T to be A + a */ \
+ VINT T = __riscv_vluxei64 (logD_tbl128_fixedpt, ind, vlen); \
+ VINT T_hi = __riscv_vsll (__riscv_vsra (T, 24, vlen), 24, vlen); \
+ VINT T_lo = __riscv_vsub (T, T_hi, vlen); \
+ VFLOAT T_hi_flt = __riscv_vfcvt_f (T_hi, vlen); \
+ VFLOAT A = __riscv_vfmadd (T_hi_flt, 0x1.0p-63, n_flt, vlen); \
+ VFLOAT a = __riscv_vfcvt_f (T_lo, vlen); \
+ a = __riscv_vfmul (a, 0x1.0p-63, vlen); \
+ \
+ /* Compute (A + a) * (logB_2_hi + logB_2_lo) + (r + P) * (logB_e_hi + \
+ // logB_e_lo) where B can be e, 2, or 10 */ \
+ VFLOAT delta_1 = __riscv_vfmul (A, LOGB_2_LO, vlen); \
+ delta_1 = __riscv_vfmadd (a, LOGB_2_HI, delta_1, vlen); \
+ delta_1 = __riscv_vfmacc (delta_1, LOGB_e_LO, r, vlen); \
+ poly = __riscv_vfmadd (poly, LOGB_e_HI, delta_1, vlen); \
+ \
+ poly = __riscv_vfmacc (poly, LOGB_e_HI, r, vlen); \
+ \
+ vy = __riscv_vfmadd (A, LOGB_2_HI, poly, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_log2.c b/sysdeps/riscv/rvd/v_d_log2.c
new file mode 100644
index 0000000000..15a2f98357
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_log2.c
@@ -0,0 +1,189 @@
+/* Double-precision vector log2 function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_LOGD_VSET_CONFIG
+
+#define COMPILE_FOR_LOG2
+
+#define EXCEPTION_HANDLING_LOG(vx, special_args, vy_special, n_adjust, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ /* special handling except positive normal number */ \
+ IDENTIFY (vclass, 0x3BF, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ n_adjust = VMVI_VX (0, (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL id_mask; \
+ /* substitute negative arguments with sNaN */ \
+ IDENTIFY (vclass, class_negative, id_mask, (vlen)); \
+ vx = __riscv_vfmerge (vx, fp_sNaN, id_mask, vlen); \
+ /* substitute +0 argument with -0 */ \
+ IDENTIFY (vclass, class_posZero, id_mask, vlen); \
+ vx = __riscv_vfmerge (vx, fp_negZero, id_mask, vlen); \
+ /* eliminate positive denorm input from special_args */ \
+ IDENTIFY (vclass, 0x39F, (special_args), (vlen)); \
+ /* for narrowed set of special arguments, compute vx+vfrec7(vx) */ \
+ vy_special = __riscv_vfrec7 ((special_args), (vx), (vlen)); \
+ vy_special \
+ = __riscv_vfadd ((special_args), vy_special, (vx), (vlen)); \
+ vx = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ /* scale up input for positive denormals */ \
+ IDENTIFY (vclass, class_posDenorm, id_mask, (vlen)); \
+ n_adjust = __riscv_vmerge (n_adjust, 64, id_mask, vlen); \
+ VFLOAT vx_normalized = __riscv_vfmul (id_mask, vx, 0x1.0p64, vlen); \
+ vx = __riscv_vmerge (vx, vx_normalized, id_mask, vlen); \
+ } \
+ } \
+ while (0)
+
+#define LOGB_2_HI 0x1.0p0
+#define LOGB_2_LO 0.0
+#define LOGB_e_HI 0x1.71547652b82fep+0
+#define LOGB_e_LO 0x1.777d0ffda0d24p-56
+
+// Version 1 uses a 128-entry LUT
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, log2) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vy_special; \
+ VBOOL special_args; \
+ VINT n_adjust; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* NaN, Inf, and -ve handling, as well as scaling denormal input by \
+ * 2^64 */ \
+ EXCEPTION_HANDLING_LOG (vx, special_args, vy_special, n_adjust, vlen); \
+ \
+    /* in_arg elements at this point are positive, finite and not subnormal \
+ // Decompose in_arg into n, B, r: in_arg = 2^n (1/B) (1 + r) \
+ // B is equivalently defined by ind, 0 <= ind < 128 */ \
+ VINT n = U_AS_I (__riscv_vadd ( \
+ __riscv_vsrl (F_AS_U (vx), MAN_LEN - 1, vlen), 1, vlen)); \
+ n = __riscv_vsra (n, 1, vlen); \
+ n = __riscv_vsub (n, EXP_BIAS, vlen); \
+ vx = U_AS_F ( \
+ __riscv_vsrl (__riscv_vsll (F_AS_U (vx), BIT_WIDTH - MAN_LEN, vlen), \
+ BIT_WIDTH - MAN_LEN, vlen)); \
+ vx = U_AS_F ( \
+ __riscv_vadd (F_AS_U (vx), (uint64_t)EXP_BIAS << MAN_LEN, vlen)); \
+ n = __riscv_vsub (n, n_adjust, vlen); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT B = __riscv_vfrec7 (vx, vlen); \
+ /* get 7 msb of mantissa, and left shift by 3 to get address */ \
+ VUINT ind = __riscv_vand (__riscv_vsrl (F_AS_U (vx), MAN_LEN - 10, vlen), \
+ 0x3F8, vlen); \
+ /* adjust B to be 1.0 if ind == 0 */ \
+ VBOOL adjust_B = __riscv_vmseq (ind, 0, vlen); \
+ B = __riscv_vfmerge (B, fp_posOne, adjust_B, vlen); \
+ /* finally get r = B * in_arg - 1.0 */ \
+ VFLOAT r = VFMV_VF (fp_posOne, vlen); \
+ r = __riscv_vfmsac (r, vx, B, vlen); \
+ \
+ /* Base-B log is logB(in_arg) = logB(2^n * 1/B) + logB(1 + r) \
+ // = (n + log2(1/B))*logB(2) + log(1+r)*logB(e) \
+ // log2(1/B) is stored in a table \
+ // and log(1+r) is approximated by r + poly \
+ // poly is a polynomial in r in the form r^2 * (p0 + p1 r + ... ) \
+ // To deliver this result accurately, one uses logB(2) and logB(e) \
+ // with extra precision and sums the various terms in an \
+ // appropriate order */ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.9999998877038p-3, r, \
+ PSTEP (-0x1.555c54f8b7c6cp-3, 0x1.2499765b3c27ap-3, r, vlen), vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ -0x1.000000000001cp-1, r, \
+ PSTEP (0x1.55555555555a9p-2, -0x1.fffffff2018cfp-3, r, vlen), vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, rcube, poly_left, vlen); \
+ poly = __riscv_vfmul (rsq, poly, vlen); \
+ /* log_e(1+r) is r + poly */ \
+ \
+ /* Load table values and get n_flt + T to be A + a */ \
+ VINT T = __riscv_vluxei64 (logD_tbl128_fixedpt, ind, vlen); \
+ VINT T_hi = __riscv_vsll (__riscv_vsra (T, 24, vlen), 24, vlen); \
+ VINT T_lo = __riscv_vsub (T, T_hi, vlen); \
+ VFLOAT T_hi_flt = __riscv_vfcvt_f (T_hi, vlen); \
+ VFLOAT A = __riscv_vfmadd (T_hi_flt, 0x1.0p-63, n_flt, vlen); \
+ VFLOAT a = __riscv_vfcvt_f (T_lo, vlen); \
+ a = __riscv_vfmul (a, 0x1.0p-63, vlen); \
+ \
+ /* Compute (A + a) * (logB_2_hi + logB_2_lo) + (r + P) * (logB_e_hi + \
+ // logB_e_lo) where B can be e, 2, or 10 */ \
+ VFLOAT delta_1 = __riscv_vfmul (A, LOGB_2_LO, vlen); \
+ delta_1 = __riscv_vfmadd (a, LOGB_2_HI, delta_1, vlen); \
+ delta_1 = __riscv_vfmacc (delta_1, LOGB_e_LO, r, vlen); \
+ poly = __riscv_vfmadd (poly, LOGB_e_HI, delta_1, vlen); \
+ \
+ poly = __riscv_vfmacc (poly, LOGB_e_HI, r, vlen); \
+ \
+ vy = __riscv_vfadd (A, poly, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_pow.c b/sysdeps/riscv/rvd/v_d_pow.c
new file mode 100644
index 0000000000..622499856c
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_pow.c
@@ -0,0 +1,465 @@
+/* Double-precision vector pow function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_21
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_POWD_VSET_CONFIG
+
+#define EXCEPTION_HANDLING_POW(vx, vy, special_args, vz_special, vlen) \
+ VUINT vclass_x = __riscv_vfclass (vx, vlen); \
+ VUINT vclass_y = __riscv_vfclass (vy, vlen); \
+ do \
+ { \
+ /* Exception handling: handle x or y being NaN, Inf, or Zero \
+ * and replace them with 1.0 so that normal computations with them \
+ * do not raise problems. \
+ * Note that we do not call out negative x for special handling. The \
+ * normal computation essentially computes |x|^y, but identifies x < 0 \
+ * later on, replacing the answer appropriately depending on whether \
+ * y is an integer (resulting in +-(|x|^y)) or not (resulting in NaN). \
+ * \
+ * Inside the special-argument handling, we handle 3 cases separately: \
+ * x AND y both special, only x special, and only y special. \
+ */ \
+ \
+ VBOOL y_special, x_special; \
+ /* 0x399 is NaN/Inf/Zero */ \
+ IDENTIFY (vclass_y, 0x399, y_special, vlen); \
+ IDENTIFY (vclass_x, 0x399, x_special, vlen); \
+ \
+ special_args = __riscv_vmor (x_special, y_special, vlen); \
+ UINT nb_special_args = __riscv_vcpop (special_args, vlen); \
+ \
+ if (nb_special_args > 0) \
+ { \
+ /* Expect this to be taken rarely. We handle separately the three \
+ * mutually exclusive cases: both x and y special, only x special, \
+ * and only y special. \
+ */ \
+ VUINT vclass_z; \
+ VBOOL id_mask; \
+ vz_special = VFMV_VF (fp_posOne, vlen); \
+ VBOOL current_cases = __riscv_vmand (x_special, y_special, vlen); \
+ if (__riscv_vcpop (current_cases, vlen) > 0) \
+ { \
+ /* x AND y are special */ \
+ \
+ /* pow(any, 0) is 1.0 */ \
+ IDENTIFY (vclass_y, class_Zero, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, current_cases, vlen); \
+ vy = __riscv_vfmerge (vy, fp_posOne, id_mask, vlen); \
+ vx = __riscv_vfmerge (vx, fp_posOne, id_mask, vlen); \
+ VBOOL restricted_cases \
+ = __riscv_vmandn (current_cases, id_mask, vlen); \
+ \
+ /* pow(+-Inf,+-Inf) = pow(+Inf,+-Inf), so substitute -Inf by \
+ * +Inf for x \
+ */ \
+ IDENTIFY (vclass_x, class_negInf, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, restricted_cases, vlen); \
+ vx = __riscv_vfmerge (vx, fp_posInf, id_mask, vlen); \
+ \
+ /* pow(0, +-Inf) = +Inf or 0. Substitute x by -Inf to mimic \
+ * log(x) */ \
+ IDENTIFY (vclass_x, class_Zero, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, restricted_cases, vlen); \
+ vx = __riscv_vfmerge (vx, fp_negInf, id_mask, vlen); \
+ \
+ /* multiply the substituted vx * vy that mimics y*log(x) to \
+ * some extent. This product will also generate the necessary \
+ * NaN and invalid operation signal \
+ */ \
+ vz_special = __riscv_vfmul_mu (current_cases, vz_special, vx, \
+ vy, vlen); \
+ vclass_z = __riscv_vfclass (vz_special, vlen); \
+ IDENTIFY (vclass_z, class_negInf, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, current_cases, vlen); \
+ vz_special \
+ = __riscv_vfmerge (vz_special, fp_posZero, id_mask, vlen); \
+ /* end of handling for BOTH x and y are special */ \
+ } \
+ \
+ current_cases = __riscv_vmandn (x_special, y_special, vlen); \
+ if (__riscv_vcpop (current_cases, vlen) > 0) \
+ { \
+ /* x only is special */ \
+ \
+ VINT sign_x = __riscv_vand (F_AS_I (vx), F_AS_I (vx), vlen); \
+ /* Here we change x that is +-Inf into +Inf, and x that is +-0 \
+ * to -Inf \
+ */ \
+ IDENTIFY (vclass_x, class_Zero, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, current_cases, vlen); \
+ vx = __riscv_vfmerge (vx, fp_negInf, id_mask, vlen); \
+ \
+ IDENTIFY (vclass_x, class_Inf, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, current_cases, vlen); \
+ vx = __riscv_vfmerge (vx, fp_posInf, id_mask, vlen); \
+ \
+ /* We need to identify whether y is of integer value and if so \
+ * its parity. We first clip y values to +-2^53, because FP \
+ * values of this magnitude and beyond are always even integers \
+ */ \
+ vy = __riscv_vfmin_mu (current_cases, vy, vy, 0x1.0p53, vlen); \
+ vy = __riscv_vfmax_mu (current_cases, vy, vy, -0x1.0p53, vlen); \
+ VINT y_to_int = __riscv_vfcvt_x (current_cases, vy, vlen); \
+ /* TODO: y_to_int_fp and y_is_int need to be used */ \
+ VFLOAT y_to_int_fp \
+ = __riscv_vfcvt_f (current_cases, y_to_int, vlen); \
+ VBOOL y_is_int \
+ = __riscv_vmfeq (current_cases, vy, y_to_int_fp, vlen); \
+ VINT sign_z = __riscv_vsll (y_to_int, 63, vlen); \
+ /* the parity is used later on to manipulate sign, hence sll 63 \
+ * bits \
+ */ \
+ \
+ /* we have set vx to mimic log(|x|), so we now compute y * \
+ * log(|x|) */ \
+ vz_special = __riscv_vfmul_mu (current_cases, vz_special, vy, \
+ vx, vlen); \
+ /* map -Inf to +0 */ \
+ vclass_z = __riscv_vfclass (vz_special, vlen); \
+ IDENTIFY (vclass_z, class_negInf, id_mask, vlen); \
+ id_mask = __riscv_vmand (id_mask, current_cases, vlen); \
+ vz_special \
+ = __riscv_vfmerge (vz_special, fp_posZero, id_mask, vlen); \
+ /* now must set the sign of vz_special for x in {Zero, Inf} and \
+ * y of integer value */ \
+ \
+ IDENTIFY (vclass_x, class_Inf | class_Zero, id_mask, vlen); \
+ id_mask = __riscv_vmand (current_cases, id_mask, vlen); \
+ VFLOAT vz_tmp \
+ = I_AS_F (__riscv_vand (id_mask, sign_x, sign_z, vlen)); \
+ vz_tmp = __riscv_vfsgnj (id_mask, vz_special, vz_tmp, vlen); \
+ vz_special \
+ = __riscv_vmerge (vz_special, vz_tmp, id_mask, vlen); \
+ } \
+ \
+ current_cases = __riscv_vmandn (y_special, x_special, vlen); \
+ if (__riscv_vcpop (current_cases, vlen) > 0) \
+ { \
+ /* y only is special */ \
+ \
+ /* Here x is finite and non-zero. But x == 1.0 is special \
+ * in that 1.0^anything is 1.0, including when y is a NaN. \
+ * Aside from this case, we need to differentiate |x| <, ==, > \
+ * 1 so as to handle y == +-Inf appropriately. \
+ */ \
+ \
+ /* If |x| == 1.0, replace y with 0.0 */ \
+ VFLOAT vz_tmp \
+ = __riscv_vfsgnj (current_cases, vx, fp_posOne, vlen); \
+ vz_tmp \
+ = __riscv_vfsub (current_cases, vz_tmp, fp_posOne, vlen); \
+ id_mask = __riscv_vmfeq (vz_tmp, fp_posZero, vlen); \
+ id_mask = __riscv_vmand (current_cases, id_mask, vlen); \
+ VBOOL id_mask2; \
+ IDENTIFY (vclass_y, class_Inf | class_Zero, id_mask2, vlen); \
+ id_mask2 = __riscv_vmand (id_mask, id_mask2, vlen); \
+ vy = __riscv_vfmerge (vy, fp_posZero, id_mask2, vlen); \
+ \
+ /* compute (|x|-1) * y yielding the correct signed infinities \
+ */ \
+ vz_tmp = __riscv_vfmul (current_cases, vz_tmp, vy, vlen); \
+ /* except we need to set this to +0 if x == 1 (even if y is \
+ * NaN) */ \
+ id_mask = __riscv_vmfeq (vx, fp_posOne, vlen); \
+ id_mask = __riscv_vmand (current_cases, id_mask, vlen); \
+ vz_tmp = __riscv_vfmerge (vz_tmp, fp_posZero, id_mask, vlen); \
+ vz_special \
+ = __riscv_vmerge (vz_special, vz_tmp, current_cases, vlen); \
+ \
+ /* map vz_special values of -Inf to 0 and 0 to 1.0 */ \
+ vclass_z = __riscv_vfclass (vz_special, vlen); \
+ IDENTIFY (vclass_z, class_negInf, id_mask, vlen); \
+ id_mask = __riscv_vmand (current_cases, id_mask, vlen); \
+ vz_special \
+ = __riscv_vfmerge (vz_special, fp_posZero, id_mask, vlen); \
+ IDENTIFY (vclass_z, class_Zero, id_mask, vlen); \
+ id_mask = __riscv_vmand (current_cases, id_mask, vlen); \
+ vz_special \
+ = __riscv_vfmerge (vz_special, fp_posOne, id_mask, vlen); \
+ } \
+ \
+ /* finally, substitute 1.0 for x and y when either x or y is special \
+ */ \
+ vx = __riscv_vfmerge (vx, fp_posOne, special_args, vlen); \
+ vy = __riscv_vfmerge (vy, fp_posOne, special_args, vlen); \
+ } \
+ } \
+ while (0)
+
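+/* Editorial scalar model of the substitution idea above (sketch only;
+   it deliberately ignores the sign bookkeeping for +-0/+-Inf raised to
+   odd integer powers).  After x is remapped so that it encodes
+   log|x| = -Inf or +Inf (or |x|-1 when only y is special), the single
+   product x * y mimics y * log|x|: it propagates NaN, raises invalid
+   where required, and lands on the signed infinity whose exp() is the
+   desired 0 or +Inf; -Inf "underflows" to +0 like exp(-Inf).  */
+static inline double
+scalar_pow_special_model (double x, double y)
+{
+  if (y == 0.0 || x == 1.0)
+    return 1.0;                     /* pow(any, 0) = pow(1, any) = 1 */
+  if (x == 0.0)
+    x = -INFINITY;                  /* 0^y behaves like exp(y * -Inf) */
+  else if (isinf (x))
+    x = INFINITY;                   /* (+-Inf)^y behaves like exp(y * +Inf) */
+  else
+    {
+      if (fabs (x) == 1.0 && isinf (y))
+        return 1.0;                 /* pow(-1, +-Inf) is 1 */
+      x = fabs (x) - 1.0;           /* sign of |x|-1 steers y = +-Inf */
+    }
+  double t = x * y;                 /* mimics y * log|x| */
+  return (isinf (t) && t < 0.0) ? 0.0 : t;
+}
+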
+static const double two_to_neg63 = 0x1.0p-63;
+static const uint64_t bias = 0x3ff0000000000000;
+static const int64_t round_up = 0x0008000000000000;
+static const uint64_t zero_mask_expo = 0x000fffffffffffff;
+static const int64_t mask_T_hi = 0xffffffffff000000;
+static const int64_t mask_T_lo = 0x0000000000ffffff;
+static const double two_to_63 = 0x1.0p63;
+static const double log2_inv = 0x1.71547652b82fep+0;
+static const double log2_hi = 0x1.62e42fefa39efp-1;
+static const double log2_lo = 0x1.abc9e3b39803fp-56;
+static const double log2_inv_hi = 0x1.71547652b82fep+0;
+static const double log2_inv_lo = 0x1.777d0ffda0d24p-56;
+static const double two_to_65 = 0x1.0p65;
+static const double negtwo_to_65 = -0x1.0p65;
+
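+/* Editorial scalar sketch of the integer/parity test that appears
+   twice below (illustration only).  Clipping to +-2^53 first is safe
+   because every FP64 value of that magnitude or beyond is already an
+   even integer, so clipped values report "even integer" correctly.  */
+static inline int
+scalar_y_parity_model (double y, int *y_is_int)
+{
+  double t = fmin (fmax (y, -0x1.0p53), 0x1.0p53);
+  long long n = (long long) t;      /* in range after the clip */
+  *y_is_int = ((double) n == t);    /* integral iff the round trip is exact */
+  return (int) (n & 1);             /* parity later drives the sign of x^y */
+}
+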
+// Version 1 reduces the argument to the standard primary interval.
+// The reduced argument is represented as one FP64 variable.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D2 (lmul, simdlen, pow) (VFLOAT x, VFLOAT y) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vy, vz, vz_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ vy = y; \
+ \
+ /* Set results when one of the inputs is NaN/Inf/Zero */ \
+ EXCEPTION_HANDLING_POW (vx, vy, special_args, vz_special, vlen); \
+ \
+ /* Normal computations. Here, both x and y are finite and non-zero. \
+ We compute 2^( y log_2(x) ) on the high level. But when x < 0, \
+ we must handle the cases when y is of integer value, making x^y well \
+ defined. So in essence, we try to compute 2^(y log_2(|x|)) and then \
+ figure out if one should replace this with NaN, or accept this \
+ numerical result with the possible flipping of its sign (x is negative \
+ and y is an odd integer). */ \
+ \
+ /* Decompose in_arg into n, B, r */ \
+ VINT n_adjust, sign_x; \
+ VBOOL id_mask; \
+ n_adjust = __riscv_vxor (n_adjust, n_adjust, vlen); \
+ sign_x = __riscv_vxor (sign_x, sign_x, vlen); \
+ sign_x = F_AS_I (__riscv_vfsgnj (I_AS_F (sign_x), vx, vlen)); \
+ vx = __riscv_vfsgnjx (vx, vx, vlen); \
+ IDENTIFY (vclass_x, class_Denorm, id_mask, vlen); \
+ vx = __riscv_vfmul_mu (id_mask, vx, vx, 0x1.0p65, vlen); \
+ n_adjust = __riscv_vmerge (n_adjust, 65, id_mask, vlen); \
+ \
+ VINT n = __riscv_vadd (F_AS_I (vx), round_up, vlen); \
+ n = __riscv_vsub (n, bias, vlen); \
+ n = __riscv_vsra (n, 52, vlen); \
+ n = __riscv_vsub (n, n_adjust, vlen); \
+ \
+ VFLOAT A = __riscv_vfcvt_f (n, vlen); \
+ \
+ /* To get frec7(X) it suffices to get frec7 of X with its exponent \
+ // field set to bias. The main reason for this step is that should the \
+ // exponent of X be the largest finite exponent, frec7(X) will be \
+ // subnormal and carry less precision. Moreover, we need to get the 7 \
+ // mantissa bits of X for table lookup later on */ \
+ VUINT ind = __riscv_vand (F_AS_U (vx), zero_mask_expo, vlen); \
+ \
+ /* normalize exponent of vx */ \
+ vx = U_AS_F (__riscv_vor (ind, bias, vlen)); \
+ VFLOAT B = __riscv_vfrec7 (vx, vlen); \
+ ind = __riscv_vsrl (ind, 45, vlen); /* 7 leading mantissa bit */ \
+ ind = __riscv_vsll (ind, 4, vlen); /* left shifted 4 (16-byte table) */ \
+ \
+ /* adjust B to be 1.0 if ind == 0 */ \
+ VBOOL adjust_B = __riscv_vmseq (ind, 0, vlen); \
+ B = __riscv_vfmerge (B, fp_posOne, adjust_B, vlen); \
+ VFLOAT r = VFMV_VF (fp_posOne, vlen); \
+ r = __riscv_vfmsac (r, vx, B, vlen); \
+ \
+ /* With A = n in float format, r, and ind we can carry out the \
+ // floating-point computation (A + T) + log_e(1+r) * (1/log_e(2)), \
+ // computing log_e(1+r) by a polynomial approximation. To obtain an \
+ // accurate pow(x,y) in the end, we must obtain at least 10 extra bits \
+ // of precision over FP64. So log_e(1+r) is approximated by the \
+ // degree-9 polynomial r - r^2/2 + r^3 [ (p6 + r p5 + r^2 p4) \
+ // + r^3 (p3 + r p2 + r^2 p1 + r^3 p0) ], that is, r - r^2/2 + poly. \
+ // Furthermore, r - r^2/2 is computed as P + p, 1/log(2) is stored as \
+ // log2_inv_hi, log2_inv_lo, and T is broken into T_hi, T_lo. So we \
+ // need (A + T_hi) + log2_inv_hi * P + log2_inv_hi * poly + T_lo \
+ // + log2_inv_lo * P. \
+ // Note that log_2(|x|) needs to be represented in 2 FP64 variables \
+ // as we need to have log_2(|x|) in extra precision. \
+ // */ \
+ VFLOAT rcube = __riscv_vfmul (r, r, vlen); \
+ rcube = __riscv_vfmul (rcube, r, vlen); \
+ \
+ VFLOAT poly_right = PSTEP ( \
+ -0x1.555555483d731p-3, r, \
+ PSTEP (0x1.2492453584b8ep-3, r, \
+ PSTEP (-0x1.0005fa6ef2342p-3, 0x1.c7fe32d120e6bp-4, r, vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.5555555555555p-2, r, \
+ PSTEP (-0x1.000000000003cp-2, 0x1.99999999a520ep-3, r, vlen), vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (poly_right, rcube, poly_left, vlen); \
+ /* poly is (p6 + r p5 + r^2 p4 ) + r^3 (p3 + r p2 + r^2 p1 + r^3 p0) */ \
+ \
+ VFLOAT r_prime = __riscv_vfmul (r, -0x1.0p-1, vlen); /* exact product */ \
+ VFLOAT P = __riscv_vfmadd (r_prime, r, r, vlen); \
+ VFLOAT p = __riscv_vfsub (r, P, vlen); \
+ p = __riscv_vfmacc (p, r_prime, r, vlen); \
+ /* P + p is r - r^2/2 to extra precision */ \
+ poly = __riscv_vfmadd (poly, rcube, p, vlen); \
+ /* Now P + poly is log_e(1+r) to extra precision */ \
+ \
+ /* Load table values and get n_flt + T to be A + a */ \
+ VFLOAT T_hi_flt = __riscv_vluxei64 (logtbl_4_powD_128_hi_lo, ind, vlen); \
+ ind = __riscv_vadd (ind, 8, vlen); \
+ VFLOAT T_lo_flt = __riscv_vluxei64 (logtbl_4_powD_128_hi_lo, ind, vlen); \
+ \
+ A = __riscv_vfadd (A, T_hi_flt, vlen); \
+ /* (A + T_hi) + log2_inv_hi * P + log2_inv_hi * poly \
+ // + log2_inv_lo * P + T_lo is log2(|x|) to extra precision */ \
+ VFLOAT log2x_hi = __riscv_vfmadd (P, log2_inv_hi, A, vlen); \
+ VFLOAT log2x_lo = __riscv_vfsub (A, log2x_hi, vlen); \
+ log2x_lo = __riscv_vfmacc (log2x_lo, log2_inv_hi, P, vlen); \
+ \
+ T_lo_flt = __riscv_vfmacc (T_lo_flt, log2_inv_lo, P, vlen); \
+ log2x_lo = __riscv_vfadd (log2x_lo, T_lo_flt, vlen); \
+ log2x_lo = __riscv_vfmacc (log2x_lo, log2_inv_hi, poly, vlen); \
+ VFLOAT log2x = __riscv_vfadd (log2x_hi, log2x_lo, vlen); \
+ T_lo_flt = __riscv_vfsub (log2x_hi, log2x, vlen); \
+ log2x_lo = __riscv_vfadd (T_lo_flt, log2x_lo, vlen); \
+ /* log2x + log2x_lo is log2(|x|) to extra precision */ \
+ \
+ /* The final stage involves computing 2^(y * log2x) */ \
+ VFLOAT vy_tmp = __riscv_vfmin (vy, 0x1.0p53, vlen); \
+ vy_tmp = __riscv_vfmax (vy_tmp, -0x1.0p53, vlen); \
+ VINT y_to_int = __riscv_vfcvt_x (vy_tmp, vlen); \
+ VFLOAT vy_rnd_int = __riscv_vfcvt_f (y_to_int, vlen); \
+ VBOOL y_is_int = __riscv_vmfeq (vy_tmp, vy_rnd_int, vlen); \
+ y_to_int = __riscv_vsll (y_to_int, 63, vlen); \
+ /* if y is of integer value, y_to_int is the parity of y in the sign \
+ // bit position. To compute y * (log2x + log2x_lo) we first clip y to \
+ // +-2^65 */ \
+ vy = __riscv_vfmin (vy, two_to_65, vlen); \
+ vy = __riscv_vfmax (vy, negtwo_to_65, vlen); \
+ vy_tmp = __riscv_vfmul (vy, log2x, vlen); \
+ r = __riscv_vfmsub (vy, log2x, vy_tmp, vlen); \
+ r = __riscv_vfmacc (r, vy, log2x_lo, vlen); \
+ /* vy_tmp + r is the product, clip at +-1100 */ \
+ vy_tmp = __riscv_vfmin (vy_tmp, 0x1.13p10, vlen); \
+ vy_tmp = __riscv_vfmax (vy_tmp, -0x1.13p10, vlen); \
+ r = __riscv_vfmin (r, 0x1.0p-35, vlen); \
+ r = __riscv_vfmax (r, -0x1.0p-35, vlen); \
+ \
+ /* Argument reduction */ \
+ VFLOAT n_flt = __riscv_vfmul (vy_tmp, 0x1.0p6, vlen); \
+ n = __riscv_vfcvt_x (n_flt, vlen); \
+ n_flt = __riscv_vfcvt_f (n, vlen); \
+ \
+ vy_tmp = __riscv_vfnmsac (vy_tmp, 0x1.0p-6, n_flt, vlen); \
+ r = __riscv_vfadd (vy_tmp, r, vlen); \
+ r = __riscv_vfmul (r, log2_hi, vlen); \
+ \
+ /* Polynomial computation: we have a degree-5 polynomial, which we \
+ // break up into 2 pieces. Ideally the compiler will interleave the \
+ // computations of the two segments. */ \
+ poly_right = PSTEP (0x1.5555722e87735p-5, 0x1.1107f5fc29bb7p-7, r, vlen); \
+ poly_left = PSTEP (0x1.fffffffffe1f5p-2, 0x1.55555556582a8p-3, r, vlen); \
+ \
+ VFLOAT r_sq = __riscv_vfmul (r, r, vlen); \
+ poly = __riscv_vfmadd (poly_right, r_sq, poly_left, vlen); \
+ \
+ poly = __riscv_vfmadd (poly, r_sq, r, vlen); \
+ poly = __riscv_vfmul (poly, two_to_63, vlen); \
+ VINT P_fixedpt = __riscv_vfcvt_x (poly, vlen); \
+ \
+ VINT j = __riscv_vand (n, 0x3f, vlen); \
+ j = __riscv_vsll (j, 3, vlen); \
+ VINT T = __riscv_vluxei64 (expD_tbl64_fixedpt, I_AS_U (j), vlen); \
+ \
+ P_fixedpt = __riscv_vsmul (P_fixedpt, T, 1, vlen); \
+ P_fixedpt = __riscv_vsadd (P_fixedpt, T, vlen); \
+ vz = __riscv_vfcvt_f (P_fixedpt, vlen); \
+ /* at this point, vz ~=~ 2^62 * exp(r) */ \
+ \
+ n = __riscv_vsra (n, 6, vlen); \
+ /* Need to compute 2^(n-62) * exp(r). \
+ // Although most of the time it suffices to add n to the exponent \
+ // field of exp(r), this will fail when n is just a bit too positive \
+ // or negative, corresponding to 2^n * exp(r) causing over- or \
+ // underflow. So we have to decompose n into n1 + n2 where n1 = n >> 1. \
+ // 2^n1 * exp(r) can be performed by adding n1 to exp(r)'s exponent \
+ // field, but we need to create the floating-point value scale = 2^n2 \
+ // and perform a multiplication to finish the task. */ \
+ \
+ n = __riscv_vsub (n, 62, vlen); \
+ FAST_LDEXP (vz, n, vlen); \
+ \
+ VBOOL invalid = __riscv_vmsne (sign_x, 0, vlen); \
+ invalid = __riscv_vmandn (invalid, y_is_int, vlen); \
+ vz = __riscv_vfmerge (vz, fp_sNaN, invalid, vlen); \
+ vz = __riscv_vfadd (vz, fp_posZero, vlen); \
+ \
+ sign_x = __riscv_vand (sign_x, y_to_int, vlen); \
+ vz = __riscv_vfsgnj_mu (y_is_int, vz, vz, I_AS_F (sign_x), vlen); \
+ \
+ vz = __riscv_vmerge (vz, vz_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vz; \
+ }
+
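+/* Editorial scalar model of the two-step scaling FAST_LDEXP performs
+   (sketch only; the vector code adds n1 directly into the exponent
+   field instead of multiplying).  Splitting n = n1 + n2 keeps each
+   power of two representable even when 2^n itself would overflow or
+   underflow the FP64 exponent range.  */
+static inline double
+scalar_fast_ldexp_model (double z, int n)
+{
+  int n1 = n >> 1;                  /* first half of the scaling */
+  int n2 = n - n1;                  /* remainder */
+  z *= ldexp (1.0, n1);             /* exact multiply by 2^n1 */
+  return z * ldexp (1.0, n2);       /* second multiply completes 2^n * z */
+}
+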
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_sin.c b/sysdeps/riscv/rvd/v_d_sin.c
new file mode 100644
index 0000000000..925fbf89f1
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_sin.c
@@ -0,0 +1,203 @@
+/* Double-precision vector sin function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_SIND_VSET_CONFIG
+
+#define COMPILE_FOR_SIN
+#include "rvvlm_trigD.h"
+
+// This version reduces the argument to [-pi/4, pi/4] and computes sin(r)
+// or cos(r) by merging the appropriate coefficients into a vector register.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, sin) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_TRIG (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ VBOOL x_large \
+ = __riscv_vmsgeu (expo_x, EXP_BIAS + 24, vlen); /* |x| >= 2^(24) */ \
+ VFLOAT vx_copy = vx; \
+ vx = __riscv_vfmerge (vx, fp_posZero, x_large, vlen); \
+ \
+ VFLOAT n_flt = __riscv_vfmul (vx, PIBY2_INV, vlen); \
+ VINT n = __riscv_vfcvt_x (n_flt, vlen); \
+ n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT r_hi = __riscv_vfnmsac (vx, PIBY2_HI, n_flt, vlen); \
+ VUINT expo_r = __riscv_vsrl (F_AS_U (r_hi), MAN_LEN, vlen); \
+ expo_r = __riscv_vand (expo_r, 0x7FF, vlen); \
+ VBOOL r_small = __riscv_vmsleu (expo_r, EXP_BIAS - 16, \
+ vlen); /* |r_hi| < 2^(-15) */ \
+ UINT nb_r_small = __riscv_vcpop (r_small, vlen); \
+ VFLOAT r = __riscv_vfnmsac (r_hi, PIBY2_MID, n_flt, vlen); \
+ VFLOAT r_delta = __riscv_vfsub (r_hi, r, vlen); \
+ r_delta = __riscv_vfnmsac (r_delta, PIBY2_MID, n_flt, vlen); \
+ /* At this point, r + r_delta is an accurate reduced argument PROVIDED \
+ // |r_hi| >= 2^(-15) */ \
+ if (nb_r_small > 0) \
+ { \
+ VFLOAT A = __riscv_vfmul (n_flt, PIBY2_MID, vlen); \
+ VFLOAT a = __riscv_vfmsub (n_flt, PIBY2_MID, A, vlen); \
+ /* A + a is n * piby2_mid exactly */ \
+ VFLOAT S = __riscv_vfsub (r_hi, A, vlen); \
+ VFLOAT s = __riscv_vfsub (r_hi, S, vlen); \
+ s = __riscv_vfsub (s, A, vlen); \
+ s = __riscv_vfnmsac (s, PIBY2_LO, n_flt, vlen); \
+ r = __riscv_vmerge (r, S, r_small, vlen); \
+ r_delta = __riscv_vmerge (r_delta, s, r_small, vlen); \
+ } \
+ \
+ if (__riscv_vcpop (x_large, vlen) > 0) \
+ { \
+ VFLOAT r_xlarge, r_delta_xlarge; \
+ VINT n_xlarge; \
+ LARGE_ARGUMENT_REDUCTION_Piby2 (vx_copy, vlen, x_large, n_xlarge, \
+ r_xlarge, r_delta_xlarge); \
+ r = __riscv_vmerge (r, r_xlarge, x_large, vlen); \
+ r_delta = __riscv_vmerge (r_delta, r_delta_xlarge, x_large, vlen); \
+ n = __riscv_vmerge (n, n_xlarge, x_large, vlen); \
+ } \
+ \
+ VUINT n_lsb = __riscv_vand (I_AS_U (n), 0x1, vlen); \
+ VBOOL pick_c = __riscv_vmsne (n_lsb, 0, vlen); \
+ \
+ /* Instead of always computing both sin(r) and cos(r) for |r| <= pi/4, \
+ // we merge the sin and cos cases together by picking the correct \
+ // polynomial coefficients. This way we save on the bulk of the poly \
+ // computation except for a couple of terms. \
+ // \
+ // This standard algorithm either computes sin(r+r_delta) or \
+ // cos(r+r_delta), depending on the parity of n. \
+ // Note that sin(t) = t + t^3(s_poly(t^2)) \
+ // and cos(t) = 1 - t^2/2 + t^4(c_poly(t^2)), \
+ // where s_poly and c_poly are of the same degree. Hence \
+ // it suffices to load the coefficient vector with the correct \
+ // coefficients for s_poly or c_poly. We compute the needed s_poly \
+ // or c_poly without wasteful operations. (That is, we avoid \
+ // computing s_poly for all r and c_poly for all r and discarding \
+ // half of these results.) \
+ // */ \
+ \
+ /* sin(r+r_delta) ~=~ sin(r) + r_delta(1 - r^2/2) \
+ // sin(r) is approximated by 7 terms, starting from x, x^3, ..., x^13 \
+ // cos(r+r_delta) ~=~ cos(r) - r * r_delta \
+ // */ \
+ VFLOAT rsq, rcube, r_to_6, s_corr, c_corr, r_prime, One, C; \
+ One = VFMV_VF (fp_posOne, vlen); \
+ rsq = __riscv_vfmul (r, r, vlen); \
+ rcube = __riscv_vfmul (rsq, r, vlen); \
+ r_to_6 = __riscv_vfmul (rcube, rcube, vlen); \
+ \
+ r_prime = __riscv_vfmul (r, -0x1.0p-1, vlen); \
+ C = __riscv_vfmacc (One, r_prime, r, vlen); \
+ s_corr = __riscv_vfmul (r_delta, C, vlen); \
+ \
+ c_corr = __riscv_vfsub (One, C, vlen); \
+ c_corr = __riscv_vfmacc (c_corr, r, r_prime, vlen); \
+ c_corr = __riscv_vfnmsac (c_corr, r, r_delta, vlen); \
+ \
+ VFLOAT poly_right = VFMV_VF (0x1.5d8b5ae12066ap-33, vlen); \
+ poly_right \
+ = __riscv_vfmerge (poly_right, -0x1.8f5dd75850673p-37, pick_c, vlen); \
+ poly_right = PSTEP_ab ( \
+ pick_c, -0x1.27e4f72551e3dp-22, 0x1.71de35553ddb6p-19, rsq, \
+ PSTEP_ab (pick_c, 0x1.1ee950032f74cp-29, -0x1.ae5e4b94836f8p-26, rsq, \
+ poly_right, vlen), \
+ vlen); \
+ \
+ VFLOAT poly_left = VFMV_VF (-0x1.a01a019be932ap-13, vlen); \
+ poly_left \
+ = __riscv_vfmerge (poly_left, 0x1.a01a019b77545p-16, pick_c, vlen); \
+ poly_left \
+ = PSTEP_ab (pick_c, 0x1.5555555555546p-5, -0x1.5555555555548p-3, rsq, \
+ PSTEP_ab (pick_c, -0x1.6c16c16c1450cp-10, \
+ 0x1.111111110f730p-7, rsq, poly_left, vlen), \
+ vlen); \
+ \
+ poly_right = __riscv_vfmadd (poly_right, r_to_6, poly_left, vlen); \
+ \
+ VFLOAT t = __riscv_vfmul (rsq, rsq, vlen); \
+ t = __riscv_vmerge (rcube, t, pick_c, vlen); \
+ /* t is r^3 for sin(r) and r^4 for cos(r) */ \
+ \
+ VFLOAT A = __riscv_vmerge (r, C, pick_c, vlen); \
+ VFLOAT a = __riscv_vmerge (s_corr, c_corr, pick_c, vlen); \
+ vy = __riscv_vfmadd (poly_right, t, a, vlen); \
+ vy = __riscv_vfadd (A, vy, vlen); \
+ \
+ n = __riscv_vsll (n, BIT_WIDTH - 2, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (n), vlen); \
+ \
+ vy = __riscv_vfsgnjx (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
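+/* Editorial scalar sketch of the Cody-Waite style reduction above
+   (illustration only; the hex constants are the standard FP64 splits
+   of pi/2 and 2/pi assumed here, and the vector code adds a third
+   PIBY2_LO term plus a large-argument path).  The returned r plus
+   *delta carries x - n*pi/2 to more than FP64 precision.  */
+static inline double
+scalar_piby2_reduce_model (double x, int *quadrant, double *delta)
+{
+  const double piby2_hi = 0x1.921fb54442d18p+0;   /* leading bits of pi/2 */
+  const double piby2_mid = 0x1.1a62633145c07p-54; /* following bits of pi/2 */
+  double n = nearbyint (x * 0x1.45f306dc9c883p-1); /* n = rint(x * 2/pi) */
+  double r_hi = fma (-n, piby2_hi, x);    /* high part of the remainder */
+  double r = fma (-n, piby2_mid, r_hi);   /* fold in the middle term */
+  *delta = fma (-n, piby2_mid, r_hi - r); /* residual not captured in r */
+  *quadrant = (int) n & 3;                /* sin/cos and sign selector */
+  return r;
+}
+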
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_sinh.c b/sysdeps/riscv/rvd/v_d_sinh.c
new file mode 100644
index 0000000000..743f8e4431
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_sinh.c
@@ -0,0 +1,189 @@
+/* Double-precision vector sinh function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_COSHD_VSET_CONFIG
+
+#define COMPILE_FOR_SINH
+#include "rvvlm_hyperbolicsD.h"
+
+// This version reduces the argument to [-log2/2, log2/2].
+// It exploits the common expressions exp(R) and exp(-R), and uses a purely
+// floating-point method to preserve precision.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, sinh) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ expo_x = __riscv_vand (__riscv_vsrl (F_AS_U (vx_orig), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_HYPER (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ /* Both sinh and cosh have sign symmetry, so it suffices to work on \
+ // |x|: sinh(x) = sign(x) * sinh(|x|) and cosh(x) = cosh(|x|). */ \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ /* Suffices to clip |x| to 714.0, which is bigger than 1030 log(2) */ \
+ vx = __riscv_vfmin (vx, 0x1.65p9, vlen); \
+ VINT n; \
+ VFLOAT r, r_delta; \
+ ARGUMENT_REDUCTION (vx, n, r, r_delta, vlen); \
+ \
+ /* At this point exp(x) = 2^n exp(r'), where r' = r + delta_r. \
+ // sinh(x) or cosh(x) is 2^(n-1) ( exp(r') -/+ 2^(-2n) exp(-r') ). \
+ // Note that n >= 0. Moreover, the factor 2^(-2n) can be replaced by \
+ // s = 2^(-m), m = min(2n, 60): \
+ // sinh(x) / cosh(x) = 2^(n-1)(exp(r') -/+ s exp(-r')) \
+ // \
+ // exp(r') and exp(-r') will be computed purely in floating point, \
+ // using extra-precision simulation when needed. \
+ // Note exp(t) is approximated by \
+ // 1 + t + t^2/2 + t^3(p_even(t^2) + t*p_odd(t^2)) \
+ // and thus exp(-t) is approximated by \
+ // 1 - t + t^2/2 - t^3(p_even(t^2) - t*p_odd(t^2)). \
+ // So we compute the common expressions p_even and p_odd separately. \
+ // Moreover, they can be evaluated on r*r alone, not needing r_delta, \
+ // because they are at least a factor of (log(2)/2)^2/6 smaller than \
+ // the final result of interest. */ \
+ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ \
+ VFLOAT p_even \
+ = PSTEP (0x1.555555555555ap-3, rsq, \
+ PSTEP (0x1.111111110ef6ap-7, rsq, \
+ PSTEP (0x1.a01a01b32b633p-13, rsq, \
+ PSTEP (0x1.71ddef82f4beep-19, \
+ 0x1.af6eacd796f0bp-26, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_odd = PSTEP (0x1.5555555553aefp-5, rsq, \
+ PSTEP (0x1.6c16c17a09506p-10, rsq, \
+ PSTEP (0x1.a019b37a2b3dfp-16, \
+ 0x1.289788d8bdadfp-22, rsq, vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_pos = __riscv_vfmadd (p_odd, r, p_even, vlen); \
+ VFLOAT p_neg = __riscv_vfnmsub (p_odd, r, p_even, vlen); \
+ p_pos = __riscv_vfmul (p_pos, rcube, vlen); \
+ p_neg = __riscv_vfmul (p_neg, rcube, vlen); \
+ \
+ /* exp( r') is approximated by 1 + r' + (r')^2/2 + p_pos */ \
+ /* exp(-r') is approximated by 1 - r' + (r')^2/2 - p_neg */ \
+ \
+ VINT m = __riscv_vmin (__riscv_vadd (n, n, vlen), 60, vlen); \
+ VFLOAT s = U_AS_F (__riscv_vsll ( \
+ I_AS_U (__riscv_vrsub (m, EXP_BIAS, vlen)), MAN_LEN, vlen)); \
+ VFLOAT poly = __riscv_vfmacc (p_pos, s, p_neg, vlen); \
+ /* sinh / cosh = (1 -/+ s) + ([r' + (r')^2/2] +/- s [r' - (r')^2/2]) \
+ // + poly. We need r' +/- (r')^2/2 and their sum/diff to high \
+ // precision, and 1 -/+ s to high precision */ \
+ VFLOAT r_half = __riscv_vfmul (r, 0x1.0p-1, vlen); \
+ VFLOAT B_plus = __riscv_vfmadd (r, r_half, r, vlen); \
+ VFLOAT b_plus \
+ = __riscv_vfmacc (__riscv_vfsub (r, B_plus, vlen), r, r_half, vlen); \
+ VFLOAT delta_b_plus = __riscv_vfmadd (r, r_delta, r_delta, vlen); \
+ b_plus = __riscv_vfadd (b_plus, delta_b_plus, vlen); \
+ VFLOAT B_minus = __riscv_vfnmsub (r, r_half, r, vlen); \
+ VFLOAT b_minus = __riscv_vfnmsac (__riscv_vfsub (r, B_minus, vlen), r, \
+ r_half, vlen); \
+ VFLOAT delta_b_minus = __riscv_vfnmsub (r, r_delta, r_delta, vlen); \
+ b_minus = __riscv_vfadd (b_minus, delta_b_minus, vlen); \
+ VFLOAT B = __riscv_vfmadd (B_minus, s, B_plus, vlen); \
+ VFLOAT b \
+ = __riscv_vfmacc (__riscv_vfsub (B_plus, B, vlen), s, B_minus, vlen); \
+ b = __riscv_vfadd (b, __riscv_vfmadd (b_minus, s, b_plus, vlen), vlen); \
+ VBOOL n_large = __riscv_vmsge (n, 50, vlen); \
+ VFLOAT s_hi = s; \
+ VFLOAT s_lo; \
+ s_lo = U_AS_F (__riscv_vxor (F_AS_U (s_lo), F_AS_U (s_lo), vlen)); \
+ s_hi = __riscv_vfmerge (s_hi, fp_posZero, n_large, vlen); \
+ s_lo = __riscv_vmerge (s_lo, s, n_large, vlen); \
+ VFLOAT A = __riscv_vfrsub (s_hi, fp_posOne, vlen); \
+ s_lo = __riscv_vfsgnjn (s_lo, s_lo, vlen); \
+ b = __riscv_vfadd (b, s_lo, vlen); \
+ VFLOAT Z_hi, Z_lo; \
+ FAST2SUM (B, poly, Z_hi, Z_lo, vlen); \
+ b = __riscv_vfadd (b, Z_lo, vlen); \
+ B = Z_hi; \
+ FAST2SUM (A, B, Z_hi, Z_lo, vlen); \
+ b = __riscv_vfadd (b, Z_lo, vlen); \
+ vy = __riscv_vfadd (Z_hi, b, vlen); \
+ \
+ /* scale vy by 2^(n-1) */ \
+ n = __riscv_vsub (n, 1, vlen); \
+ FAST_LDEXP (vy, n, vlen); \
+ \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
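+/* Editorial scalar model of the reconstruction above (sketch only, for
+   arguments handled by the main path): with |x| = n*log(2) + r we have
+   sinh(x) = sign(x) * 2^(n-1) * (exp(r) - 2^(-2n) * exp(-r)), and the
+   factor 2^(-2n) may be floored at 2^-60 because smaller contributions
+   fall below the final rounding error.  */
+static inline double
+scalar_sinh_model (double x)
+{
+  double ax = fabs (x);
+  double n = nearbyint (ax / M_LN2);    /* ax = n*log(2) + r, n >= 0 */
+  double r = fma (-n, M_LN2, ax);       /* |r| <= ~log(2)/2 */
+  int m = (int) fmin (2.0 * n, 60.0);   /* s = 2^-m stands in for 2^-2n */
+  double y = exp (r) - ldexp (1.0, -m) * exp (-r);
+  return copysign (ldexp (y, (int) n - 1), x);
+}
+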
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_sinpi.c b/sysdeps/riscv/rvd/v_d_sinpi.c
new file mode 100644
index 0000000000..c409f7fa91
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_sinpi.c
@@ -0,0 +1,182 @@
+/* Double-precision vector sinpi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_COSPID_VSET_CONFIG
+
+#define COMPILE_FOR_SINPI
+#include "rvvlm_trigD.h"
+
+// This version reduces the argument to [-pi/4, pi/4] and computes sin(r)
+// or cos(r) by merging the appropriate coefficients into a vector register.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, sinpi) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_TRIG (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ VBOOL x_large \
+ = __riscv_vmsgeu (expo_x, EXP_BIAS + 53, vlen); /* |x| >= 2^(53) */ \
+ vx = __riscv_vfmerge (vx, fp_posZero, x_large, vlen); \
+ \
+ /* Usual argument reduction \
+ // N = rint(2x); rem := 2x - N, |rem| <= 1/2 and x = (N/2) + (rem/2); \
+ // x pi = N (pi/2) + rem * (pi/2) */ \
+ VFLOAT two_x = __riscv_vfadd (vx, vx, vlen); \
+ VINT n = __riscv_vfcvt_x (two_x, vlen); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT rem = __riscv_vfsub (two_x, n_flt, vlen); \
+ VBOOL x_is_n_piby2 = __riscv_vmseq (F_AS_U (rem), 0, vlen); \
+ /* Now rem * pi_by_2 as r + r_delta */ \
+ VFLOAT r = __riscv_vfmul (rem, PIBY2_HI, vlen); \
+ VFLOAT r_delta = __riscv_vfmsac (r, PIBY2_HI, rem, vlen); \
+ r_delta = __riscv_vfmacc (r_delta, PIBY2_MID, rem, vlen); \
+ /* At this point, r + r_delta is an accurate reduced argument */ \
+ \
+ VUINT n_lsb = __riscv_vand (I_AS_U (n), 0x1, vlen); \
+ VBOOL pick_c = __riscv_vmsne (n_lsb, 0, vlen); \
+ \
+ VBOOL exact_zero = __riscv_vmandn (x_is_n_piby2, pick_c, vlen); \
+ \
+ /* Instead of always computing both sin(r) and cos(r) for |r| <= pi/4, \
+ // we merge the sin and cos cases together by picking the correct \
+ // polynomial coefficients. This way we save on the bulk of the poly \
+ // computation except for a couple of terms. \
+ // \
+ // This standard algorithm either computes sin(r+r_delta) or \
+ // cos(r+r_delta), depending on the parity of n. \
+ // Note that sin(t) = t + t^3(s_poly(t^2)) \
+ // and cos(t) = 1 - t^2/2 + t^4(c_poly(t^2)), \
+ // where s_poly and c_poly are of the same degree. Hence \
+ // it suffices to load the coefficient vector with the correct \
+ // coefficients for s_poly or c_poly. We compute the needed s_poly \
+ // or c_poly without wasteful operations. (That is, we avoid \
+ // computing s_poly for all r and c_poly for all r and discarding \
+ // half of these results.) \
+ // \
+ // sin(r+r_delta) ~=~ sin(r) + r_delta(1 - r^2/2) \
+ // sin(r) is approximated by 7 terms, starting from x, x^3, ..., x^13 \
+ // cos(r+r_delta) ~=~ cos(r) - r * r_delta \
+ // */ \
+ VFLOAT rsq, rcube, r_to_6, s_corr, c_corr, r_prime, One, C; \
+ One = VFMV_VF (fp_posOne, vlen); \
+ rsq = __riscv_vfmul (r, r, vlen); \
+ rcube = __riscv_vfmul (rsq, r, vlen); \
+ r_to_6 = __riscv_vfmul (rcube, rcube, vlen); \
+ \
+ r_prime = __riscv_vfmul (r, -0x1.0p-1, vlen); \
+ C = __riscv_vfmacc (One, r_prime, r, vlen); \
+ s_corr = __riscv_vfmul (r_delta, C, vlen); \
+ \
+ c_corr = __riscv_vfsub (One, C, vlen); \
+ c_corr = __riscv_vfmacc (c_corr, r, r_prime, vlen); \
+ c_corr = __riscv_vfnmsac (c_corr, r, r_delta, vlen); \
+ \
+ VFLOAT poly_right = VFMV_VF (0x1.5d8b5ae12066ap-33, vlen); \
+ poly_right \
+ = __riscv_vfmerge (poly_right, -0x1.8f5dd75850673p-37, pick_c, vlen); \
+ poly_right = PSTEP_ab ( \
+ pick_c, -0x1.27e4f72551e3dp-22, 0x1.71de35553ddb6p-19, rsq, \
+ PSTEP_ab (pick_c, 0x1.1ee950032f74cp-29, -0x1.ae5e4b94836f8p-26, rsq, \
+ poly_right, vlen), \
+ vlen); \
+ \
+ VFLOAT poly_left = VFMV_VF (-0x1.a01a019be932ap-13, vlen); \
+ poly_left \
+ = __riscv_vfmerge (poly_left, 0x1.a01a019b77545p-16, pick_c, vlen); \
+ poly_left \
+ = PSTEP_ab (pick_c, 0x1.5555555555546p-5, -0x1.5555555555548p-3, rsq, \
+ PSTEP_ab (pick_c, -0x1.6c16c16c1450cp-10, \
+ 0x1.111111110f730p-7, rsq, poly_left, vlen), \
+ vlen); \
+ \
+ poly_right = __riscv_vfmadd (poly_right, r_to_6, poly_left, vlen); \
+ \
+ VFLOAT t = __riscv_vfmul (rsq, rsq, vlen); \
+ t = __riscv_vmerge (rcube, t, pick_c, vlen); \
+ /* t is r^3 for sin(r) and r^4 for cos(r) */ \
+ \
+ VFLOAT A = __riscv_vmerge (r, C, pick_c, vlen); \
+ VFLOAT a = __riscv_vmerge (s_corr, c_corr, pick_c, vlen); \
+ vy = __riscv_vfmadd (poly_right, t, a, vlen); \
+ vy = __riscv_vfadd (A, vy, vlen); \
+ \
+ n = __riscv_vsll (n, BIT_WIDTH - 2, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (n), vlen); \
+ \
+ vy = __riscv_vmerge (vy, VFMV_VF (fp_posZero, vlen), exact_zero, vlen); \
+ \
+ vy = __riscv_vfsgnjx (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
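+/* Editorial scalar sketch of the sinpi reduction above (illustration
+   only, for |x| well below 2^53).  N = rint(2x) and rem = 2x - N are
+   both exact in FP64, so the only rounding lives in rem * (pi/2),
+   which the vector code carries as r + r_delta.  */
+static inline double
+scalar_sinpi_model (double x)
+{
+  double two_x = x + x;                    /* exact */
+  double n = nearbyint (two_x);            /* x*pi = n*(pi/2) + rem*(pi/2) */
+  double rem = two_x - n;                  /* exact, |rem| <= 1/2 */
+  int q = (int) ((long long) n & 3);       /* quadrant selector */
+  if (rem == 0.0 && (q & 1) == 0)
+    return copysign (0.0, x);              /* sinpi of an integer is +-0 */
+  double r = rem * (M_PI / 2.0);           /* single rounding; code refines */
+  double v = (q & 1) ? cos (r) : sin (r);  /* parity of n picks sin or cos */
+  return (q & 2) ? -v : v;                 /* upper quadrants flip the sign */
+}
+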
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_tan.c b/sysdeps/riscv/rvd/v_d_tan.c
new file mode 100644
index 0000000000..d4d03b4d27
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_tan.c
@@ -0,0 +1,268 @@
+/* Double-precision vector tan function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_TAND_VSET_CONFIG
+
+#define COMPILE_FOR_TAN
+#include "rvvlm_trigD.h"
+
+// This version reduces the argument to [-pi/4, pi/4] and computes sin(r)
+// or cos(r); tan(x) is then either sin(r)/cos(r) or -cos(r)/sin(r).
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, tan) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_TRIG (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ VBOOL x_large \
+ = __riscv_vmsgeu (expo_x, EXP_BIAS + 24, vlen); /* |x| >= 2^(24) */ \
+ VFLOAT vx_copy = vx; \
+ vx = __riscv_vfmerge (vx, fp_posZero, x_large, vlen); \
+ \
+ VFLOAT n_flt = __riscv_vfmul (vx, PIBY2_INV, vlen); \
+ VINT n = __riscv_vfcvt_x (n_flt, vlen); \
+ n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT r_hi = __riscv_vfnmsac (vx, PIBY2_HI, n_flt, vlen); \
+ VUINT expo_r = __riscv_vsrl (F_AS_U (r_hi), MAN_LEN, vlen); \
+ expo_r = __riscv_vand (expo_r, 0x7FF, vlen); \
+ VBOOL r_small = __riscv_vmsleu (expo_r, EXP_BIAS - 16, \
+ vlen); /* |r_hi| < 2^(-15) */ \
+ UINT nb_r_small = __riscv_vcpop (r_small, vlen); \
+ VFLOAT r = __riscv_vfnmsac (r_hi, PIBY2_MID, n_flt, vlen); \
+ VFLOAT r_delta = __riscv_vfsub (r_hi, r, vlen); \
+ r_delta = __riscv_vfnmsac (r_delta, PIBY2_MID, n_flt, vlen); \
+ /* At this point, r + r_delta is an accurate reduced argument PROVIDED \
+ // |r_hi| >= 2^(-15) */ \
+ if (nb_r_small > 0) \
+ { \
+ VFLOAT A = __riscv_vfmul (n_flt, PIBY2_MID, vlen); \
+ VFLOAT a = __riscv_vfmsub (n_flt, PIBY2_MID, A, vlen); \
+ /* A + a is n * piby2_mid exactly */ \
+ VFLOAT S = __riscv_vfsub (r_hi, A, vlen); \
+ VFLOAT s = __riscv_vfsub (r_hi, S, vlen); \
+ s = __riscv_vfsub (s, A, vlen); \
+ s = __riscv_vfnmsac (s, PIBY2_LO, n_flt, vlen); \
+ r = __riscv_vmerge (r, S, r_small, vlen); \
+ r_delta = __riscv_vmerge (r_delta, s, r_small, vlen); \
+ } \
+ \
+ if (__riscv_vcpop (x_large, vlen) > 0) \
+ { \
+ VFLOAT r_xlarge, r_delta_xlarge; \
+ VINT n_xlarge; \
+ LARGE_ARGUMENT_REDUCTION_Piby2 (vx_copy, vlen, x_large, n_xlarge, \
+ r_xlarge, r_delta_xlarge); \
+ r = __riscv_vmerge (r, r_xlarge, x_large, vlen); \
+ r_delta = __riscv_vmerge (r_delta, r_delta_xlarge, x_large, vlen); \
+ n = __riscv_vmerge (n, n_xlarge, x_large, vlen); \
+ } \
+ \
+ VUINT n_lsb = __riscv_vand (I_AS_U (n), 0x1, vlen); \
+ VBOOL numer_pick_c = __riscv_vmsne (n_lsb, 0, vlen); \
+ VBOOL denom_pick_c = __riscv_vmnot (numer_pick_c, vlen); \
+ \
+ /* \
+ // sin(r) is approximated by 8 terms corresponding to x, x^3, ..., x^15 \
+ // cos(r) is approximated by 8 terms corresponding to 1, x^2, ..., x^14 \
+ // This "r" is more precise than FP64; it suffices to use the \
+ // FP64-precise value for the last 6 terms for sin and cos. We only \
+ // need to use the extra-precise values for the first two terms for \
+ // each of the above. Our strategy here is to use extra-precision \
+ // simulation with floating-point computation. \
+ // \
+ // For sin(r), the first 2 terms are r + p r^3 where p is basically \
+ // -1/6. We decompose r into r = r_head + r_tail where r_head is r \
+ // with the lower 36 bits set to 0. This way, r_head^3 can be computed \
+ // exactly, and r + p r^3 = r + r_head^3 * p + (r^3 - r_head^3) * p. \
+ // r + r_head^3 * p can be computed by sin_hi := r + r_head^3 * p \
+ // (FMA) and sin_corr := (r - sin_hi) + r_head^3 * p (subtract and \
+ // FMA); sin_hi + sin_corr is r + r_head^3 * p to doubled FP64 \
+ // precision (way more than needed). Next we need to add \
+ // (r^3 - r_head^3) * p, which is r_tail * (r^2 + r * r_head + \
+ // r_head^2) * p; because r_tail is small, rounding error in computing \
+ // this is immaterial to the final result. Finally, we also need to \
+ // add r_delta * (1 - r^2/2) to sin_corr because sin(r + r_delta) ~=~ \
+ // sin(r) + r_delta * cos(r) ~=~ sin(r) + r_delta * (1 - r^2/2). Note \
+ // that the term 1 - r^2/2 will be computed in the course of our \
+ // computation of cos(r), discussed next. \
+ // \
+ // For cos(r), the first 2 terms are 1 - r^2/2. This can be easily \
+ // computed to high precision: r_prime := r * 1/2; cos_hi := 1 - r * \
+ // r_prime (FMA); cos_corr := (1 - cos_hi) - r * r_prime. cos_hi can \
+ // be used above to compute r_delta * (1 - r^2/2). Because \
+ // cos(r + r_delta) ~=~ cos(r) - r_delta * sin(r) ~=~ cos(r) - \
+ // r_delta * r, we add the term -r_delta * r to cos_corr. \
+ // \
+ // So in a nutshell sin(r) is approximated by sin_hi + sin_lo, where \
+ // sin_lo is the sum of sin_corr and a polynomial starting at r^5; \
+ // and cos(r) is approximated by cos_hi + cos_lo, where cos_lo is the \
+ // sum of cos_corr and a polynomial starting at r^4. \
+ // \
+ // By suitably merging the two, we have numer_hi, numer_lo and \
+ // denom_hi, denom_lo. \
+ // */ \
+ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ \
+ UINT mask_r_head = 1; \
+ mask_r_head = ~((mask_r_head << 36) - 1); \
+ VFLOAT r_head = U_AS_F (__riscv_vand (F_AS_U (r), mask_r_head, vlen)); \
+ VFLOAT r_tail = __riscv_vfsub (r, r_head, vlen); \
+ \
+ UINT exp_m1 = 1; \
+ exp_m1 = (exp_m1 << 52); \
+ VFLOAT r_prime = U_AS_F (__riscv_vsub (F_AS_U (r), exp_m1, vlen)); \
+ /* |r| is never too small, so subtracting 1 from exponent is division by \
+ * 2 */ \
+ \
+ VFLOAT ONE = VFMV_VF (fp_posOne, vlen); \
+ VFLOAT cos_hi = __riscv_vfnmsac (ONE, r, r_prime, vlen); \
+ VFLOAT cos_corr = __riscv_vfsub (ONE, cos_hi, vlen); \
+ cos_corr = __riscv_vfnmsac (cos_corr, r, r_prime, vlen); \
+ cos_corr = __riscv_vfnmsac (cos_corr, r_delta, r, vlen); \
+ \
+ double coeff = -0x1.5555555555555p-3; \
+ VFLOAT r_head_cube = __riscv_vfmul (r_head, r_head, vlen); \
+ r_head_cube = __riscv_vfmul (r_head_cube, r_head, vlen); \
+ VFLOAT sin_hi = __riscv_vfmadd (r_head_cube, coeff, r, vlen); \
+ VFLOAT sin_corr = __riscv_vfsub (r, sin_hi, vlen); \
+ sin_corr = __riscv_vfmacc (sin_corr, coeff, r_head_cube, vlen); \
+ VFLOAT tmp = __riscv_vfmadd (r_head, r_head, rsq, vlen); \
+ VFLOAT tmp2 = __riscv_vfmul (r_tail, coeff, vlen); \
+ tmp = __riscv_vfmacc (tmp, r_head, r, vlen); \
+ sin_corr = __riscv_vfmacc (sin_corr, tmp, tmp2, vlen); \
+ sin_corr = __riscv_vfmacc (sin_corr, r_delta, cos_hi, vlen); \
+ \
+ VFLOAT poly_s = PSTEP ( \
+ 0x1.1111111111069p-7, rsq, \
+ PSTEP (-0x1.a01a019ffe527p-13, rsq, \
+ PSTEP (0x1.71de3a33a62c6p-19, rsq, \
+ PSTEP (-0x1.ae642c52fc493p-26, rsq, \
+ PSTEP (0x1.6109be886e15cp-33, \
+ -0x1.9ffe1dd295e78p-41, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT poly_c = PSTEP ( \
+ 0x1.5555555555546p-5, rsq, \
+ PSTEP (-0x1.6c16c16c1450cp-10, rsq, \
+ PSTEP (0x1.a01a019b77545p-16, rsq, \
+ PSTEP (-0x1.27e4f72551e3dp-22, rsq, \
+ PSTEP (0x1.1ee950032f74cp-29, \
+ -0x1.8f5dd75850673p-37, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT r_to_4 = __riscv_vfmul (rsq, rsq, vlen); \
+ VFLOAT r_to_5 = __riscv_vfmul (r_to_4, r, vlen); \
+ \
+ poly_c = __riscv_vfmadd (poly_c, r_to_4, cos_corr, vlen); \
+ poly_s = __riscv_vfmadd (poly_s, r_to_5, sin_corr, vlen); \
+ \
+ VFLOAT S, s, C, c; \
+ FAST2SUM (sin_hi, poly_s, S, s, vlen); \
+ FAST2SUM (cos_hi, poly_c, C, c, vlen); \
+ \
+ VFLOAT numer_hi, numer_lo, denom_hi, denom_lo; \
+ numer_hi = S; \
+ numer_hi = __riscv_vmerge (numer_hi, C, numer_pick_c, vlen); \
+ numer_lo = s; \
+ numer_lo = __riscv_vmerge (numer_lo, c, numer_pick_c, vlen); \
+ \
+ denom_hi = S; \
+ denom_hi = __riscv_vmerge (denom_hi, C, denom_pick_c, vlen); \
+ denom_lo = s; \
+ denom_lo = __riscv_vmerge (denom_lo, c, denom_pick_c, vlen); \
+ \
+ DIV_N2D2 (numer_hi, numer_lo, denom_hi, denom_lo, vy, vlen); \
+ \
+ /* need to put the correct sign */ \
+ n = __riscv_vsll (n, BIT_WIDTH - 1, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (n), vlen); \
+ vy = __riscv_vfsgnjx (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
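+/* Editorial scalar model of the quotient selection above (sketch
+   only).  After the reduction x = n*(pi/2) + r, even n gives
+   tan(x) = sin(r)/cos(r) while odd n gives tan(x) = -cos(r)/sin(r);
+   the vector code keeps each operand as a hi+lo pair and divides with
+   DIV_N2D2 to preserve accuracy.  */
+static inline double
+scalar_tan_model (double r, long long n)
+{
+  double s = sin (r), c = cos (r);
+  return (n & 1) ? -c / s : s / c;  /* odd n: tan(r + pi/2) = -cot(r) */
+}
+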
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
diff --git a/sysdeps/riscv/rvd/v_d_tanh.c b/sysdeps/riscv/rvd/v_d_tanh.c
new file mode 100644
index 0000000000..385c8520e4
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_tanh.c
@@ -0,0 +1,205 @@
+/* Double-precision vector tanh function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_TANHD_VSET_CONFIG
+
+#define COMPILE_FOR_TANH
+#include "rvvlm_hyperbolicsD.h"
+
+// This version reduces the argument to [-log2/2, log2/2].
+// It exploits the common expressions exp(R) and exp(-R),
+// and uses purely floating-point computation.
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, tanh) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ expo_x = __riscv_vand (__riscv_vsrl (F_AS_U (vx_orig), MAN_LEN, vlen), \
+ 0x7FF, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_HYPER (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ /* tanh(x) = sign(x) * tanh(|x|); suffices to work on |x| for the main \
+ * part */ \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ \
+ /* Suffices to clip |x| to 20, which is bigger than 28 log(2) */ \
+ vx = __riscv_vfmin (vx, 0x1.4p4, vlen); \
+ VINT n; \
+ VFLOAT r, r_delta; \
+ /* tanh(x) = (1 - exp(-2x)) / (1 + exp(-2x)); so we compute exp(-2x) \
+ // by replacing x by -2x */ \
+ vx = __riscv_vfmul (vx, -0x1.0p1, vlen); \
+ ARGUMENT_REDUCTION (vx, n, r, r_delta, vlen); \
+ \
+ /* exp(x) = 2^n exp(r'), r' = r + r_delta, and thus we compute \
+ // 1 +/- exp(x) as 1 +/- 2^(n)(1 + r' + (r')^2/2 + r^3 p(r)) \
+ // = (1 +/- s) +/- s(r' + (r')^2/2) +/- s r^3 p(r), with s = 2^n. \
+ // To maintain good precision, 1 +/- s and r' + (r')^2/2 are computed \
+ // to extra precision in a leading term and a correctional term. This \
+ // leads to representing 1 +/- exp(x) in a leading and correctional \
+ // term. */ \
+ \
+ VFLOAT s = I_AS_F ( \
+ __riscv_vsll (__riscv_vadd (n, EXP_BIAS, vlen), MAN_LEN, vlen)); \
+ VBOOL s_is_small = __riscv_vmsle (n, -(MAN_LEN + 1), vlen); \
+ VBOOL s_not_small = __riscv_vmnot (s_is_small, vlen); \
+ /* 1 +/- s is exact when s is not small */ \
+ VFLOAT s_head = __riscv_vfmerge (s, fp_posZero, s_is_small, vlen); \
+ VFLOAT s_tail = __riscv_vfmerge (s, fp_posZero, s_not_small, vlen); \
+ /* s_head + s_tail = s; and 1 +/- s is (1 +/- s_head) +/- s_tail */ \
+ \
+ /* exp(r') is approximated by 1 + r' + (r')^2/2 + r^3(p_even(r^2) + \
+ // r*p_odd(r^2)); using r without delta_r suffices from the third \
+ // order onwards */ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, vlen); \
+ \
+ VFLOAT p_even \
+ = PSTEP (0x1.555555555555ap-3, rsq, \
+ PSTEP (0x1.111111110ef6ap-7, rsq, \
+ PSTEP (0x1.a01a01b32b633p-13, rsq, \
+ PSTEP (0x1.71ddef82f4beep-19, \
+ 0x1.af6eacd796f0bp-26, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT p_odd = PSTEP (0x1.5555555553aefp-5, rsq, \
+ PSTEP (0x1.6c16c17a09506p-10, rsq, \
+ PSTEP (0x1.a019b37a2b3dfp-16, \
+ 0x1.289788d8bdadfp-22, rsq, vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT poly = __riscv_vfmadd (p_odd, r, p_even, vlen); \
+ /* r^3 * poly will be r^3(...) \
+ // we delay this multiplication by r^3 for now */ \
+ \
+ /* Compute r' + (r')^2/2 extra precisely */ \
+ VFLOAT r_prime = __riscv_vfmul (r, 0x1.0p-1, vlen); \
+ VFLOAT B = __riscv_vfmadd (r, r_prime, r, vlen); \
+ VFLOAT b = __riscv_vfsub (r, B, vlen); \
+ b = __riscv_vfmacc (b, r, r_prime, vlen); \
+ /* B + b is r' + (r')^2/2 to extra precision; \
+ // incorporate r_delta into r' + (r')^2/2 */ \
+ VFLOAT c = __riscv_vfmadd (r, r_delta, r_delta, vlen); \
+ b = __riscv_vfadd (b, c, vlen); \
+ poly = __riscv_vfmadd (poly, rcube, b, vlen); \
+ /* B + poly is r' + (r')^2/2 + r^3(.....) \
+ // and exp(r') is well approximated by s*(1 + B + poly) */ \
+ \
+ /* We compute the denominator 1 + exp(R) first, as \
+ // we will need its reciprocal afterwards; the latency of that \
+ // division can be hidden somewhat by proceeding with the \
+ // numerator in the meantime */ \
+ VFLOAT Z = __riscv_vfadd (s_head, fp_posOne, vlen); \
+ VFLOAT D_tmp = __riscv_vfmadd (B, s, Z, vlen); \
+ VFLOAT d_tmp = __riscv_vfsub (Z, D_tmp, vlen); \
+ d_tmp = __riscv_vfmacc (d_tmp, s, B, vlen); \
+ d_tmp = __riscv_vfadd (d_tmp, s_tail, vlen); \
+ d_tmp = __riscv_vfmacc (d_tmp, s, poly, vlen); \
+ /* D_tmp + d_tmp is 1 + exp(R) to high precision, but we have to \
+ // normalize this representation so that the leading term carries \
+ // the full FP64 precision of the sum */ \
+ VFLOAT D, d; \
+ FAST2SUM (D_tmp, d_tmp, D, d, vlen); \
+ /* VFLOAT D = __riscv_vfadd(D_tmp, d, vlen); */ \
+ /* Z = __riscv_vfsub(D_tmp, D, vlen); */ \
+ /* d = __riscv_vfadd(Z, d, vlen); */ \
+ \
+ /* Now start to compute 1/(D+d) as E + e */ \
+ VFLOAT One = VFMV_VF (fp_posOne, vlen); \
+ VFLOAT E, e; \
+ DIV_N1D2 (One, D, d, E, e, vlen); \
+ /* E + e is 1/(D+d) to extra precision */ \
+ \
+ /* Overlap much of the 1/(D+d) computation with \
+ // computing 1 - s(1 + B + poly) */ \
+ Z = __riscv_vfrsub (s_head, fp_posOne, vlen); \
+ \
+ VFLOAT Numer = __riscv_vfnmsub (B, s, Z, vlen); \
+ VFLOAT numer = __riscv_vfsub (Z, Numer, vlen); \
+ numer = __riscv_vfnmsac (numer, s, B, vlen); \
+ \
+ /* Numer + numer = Z - s * B accurately */ \
+ numer = __riscv_vfsub (numer, s_tail, vlen); \
+ numer = __riscv_vfnmsac (numer, s, poly, vlen); \
+ \
+ /* (Numer + numer) * (E + e) \
+ // Numer * E + ( numer * E + (Numer * e + (e*numer)) ) */ \
+ vy = __riscv_vfmul (e, numer, vlen); \
+ vy = __riscv_vfmacc (vy, Numer, e, vlen); \
+ vy = __riscv_vfmacc (vy, numer, E, vlen); \
+ vy = __riscv_vfmacc (vy, Numer, E, vlen); \
+ \
+ vy = __riscv_vfsgnj (vy, vx_orig, vlen); \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
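+
+/* A scalar model of the algorithm above, for exposition only: it mirrors
+ the reconstruction tanh(x) = sign(x) * (1 - e) / (1 + e), e = exp(-2|x|),
+ with plain doubles and GCC builtins standing in for the split
+ leading/correctional vector arithmetic. Not used by the vector code. */
+static inline double
+tanh_scalar_model (double x)
+{
+ double ax = __builtin_fabs (x);
+ /* Clipping |x| to 20 (> 28 log(2)) does not change the rounded result. */
+ if (ax > 0x1.4p4)
+ ax = 0x1.4p4;
+ double e = __builtin_exp (-2.0 * ax); /* ARGUMENT_REDUCTION + polynomial */
+ double y = (1.0 - e) / (1.0 + e);
+ return __builtin_copysign (y, x);
+}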
diff --git a/sysdeps/riscv/rvd/v_d_tanpi.c b/sysdeps/riscv/rvd/v_d_tanpi.c
new file mode 100644
index 0000000000..bb5b6c5abf
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_tanpi.c
@@ -0,0 +1,264 @@
+/* Double-precision vector tanpi function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_TAND_VSET_CONFIG
+
+#define COMPILE_FOR_TANPI
+#include "rvvlm_trigD.h"
+
+// This version reduces the argument to [-pi/4, pi/4] and computes sin(r)
+// and cos(r); tan(pi x) is then either sin(r)/cos(r) or -cos(r)/sin(r).
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, tanpi) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx_orig, vx, vy, vy_special; \
+ VBOOL special_args; \
+ VUINT expo_x; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx_orig = x; \
+ vx = __riscv_vfsgnj (vx_orig, fp_posOne, vlen); \
+ expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, vlen); \
+ \
+ /* Set results for input of NaN and Inf and also for |x| very small */ \
+ EXCEPTION_HANDLING_TRIG (vx_orig, expo_x, special_args, vy_special, \
+ vlen); \
+ \
+ VBOOL x_large \
+ = __riscv_vmsgeu (expo_x, EXP_BIAS + 53, vlen); /* |x| >= 2^(53) */ \
+ vx = __riscv_vfmerge (vx, fp_posZero, x_large, vlen); \
+ \
+ /* Usual argument reduction \
+ // N = rint(2x); rem := 2x - N, |rem| <= 1/2 and x = (N/2) + (rem/2); \
+ // x pi = N (pi/2) + rem * (pi/2) */ \
+ VFLOAT two_x = __riscv_vfadd (vx, vx, vlen); \
+ VINT n = __riscv_vfcvt_x (two_x, vlen); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, vlen); \
+ VFLOAT rem = __riscv_vfsub (two_x, n_flt, vlen); \
+ VBOOL x_is_n_piby2 = __riscv_vmseq (F_AS_U (rem), 0, vlen); \
+ /* Now rem * pi_by_2 as r + r_delta \
+ // tanpi can be exactly 0 or Inf when x_is_n_piby2 \
+ // Furthermore, the signs of these 0 and Inf are as follows. \
+ // tanpi(-X) = -tanpi(X). Thus only consider X >= 0. \
+ // tanpi(n/2) = (-1)^floor(n/2) {0 if n even; Inf otherwise} */ \
+ if (__riscv_vcpop (x_is_n_piby2, vlen) > 0) \
+ { \
+ VBOOL n_even = __riscv_vmseq (__riscv_vand (n, 0x1, vlen), 0, vlen); \
+ VBOOL set_inf = __riscv_vmandn (x_is_n_piby2, n_even, vlen); \
+ VFLOAT Zero_or_Inf = VFMV_VF (fp_posZero, vlen); \
+ Zero_or_Inf = __riscv_vmerge ( \
+ Zero_or_Inf, __riscv_vfrec7 (set_inf, Zero_or_Inf, vlen), \
+ set_inf, vlen); \
+ Zero_or_Inf = __riscv_vfsgnj ( \
+ Zero_or_Inf, \
+ U_AS_F (__riscv_vsll (I_AS_U (n), BIT_WIDTH - 2, vlen)), vlen); \
+ Zero_or_Inf = __riscv_vfsgnjx (Zero_or_Inf, vx_orig, vlen); \
+ vy_special \
+ = __riscv_vmerge (vy_special, Zero_or_Inf, x_is_n_piby2, vlen); \
+ special_args = __riscv_vmor (special_args, x_is_n_piby2, vlen); \
+ n = __riscv_vmerge (n, 0, x_is_n_piby2, vlen); \
+ } \
+ VFLOAT r = __riscv_vfmul (rem, PIBY2_HI, vlen); \
+ VFLOAT r_delta = __riscv_vfmsac (r, PIBY2_HI, rem, vlen); \
+ r_delta = __riscv_vfmacc (r_delta, PIBY2_MID, rem, vlen); \
+ /* At this point, r + r_delta is an accurate reduced argument, given \
+ // that |x| >= 2^53 was clipped to zero above */ \
+ \
+ VUINT n_lsb = __riscv_vand (I_AS_U (n), 0x1, vlen); \
+ VBOOL numer_pick_c = __riscv_vmsne (n_lsb, 0, vlen); \
+ VBOOL denom_pick_c = __riscv_vmnot (numer_pick_c, vlen); \
+ \
+ /* \
+ // sin(r) is approximated by 8 terms corresponding to x, x^3, ..., x^15 \
+ // cos(r) is approximated by 8 terms corresponding to 1, x^2, ..., x^14 \
+ // This "r" is more precise than FP64; it suffices to use the \
+ // FP64-precise value for the last 6 terms of sin and cos. We only need \
+ // the extra-precise values for the first two terms of each of the \
+ // above. Our strategy here is to use extra precision simulation with \
+ // floating-point computation. \
+ // \
+ // For sin(r), the first 2 terms are r + p r^3 where p is basically \
+ // -1/6. We decompose r into r = r_head + r_tail where r_head is r with \
+ // the lower 36 bits set to 0. This way, r_head^3 can be computed \
+ // exactly. r + p r^3 = r + r_head^3 * p + (r^3 - r_head^3) * p. The \
+ // term r + r_head^3 * p can be computed by sin_hi := r + r_head^3 * p \
+ // (FMA) and sin_corr := (r - sin_hi) + r_head^3 * p (subtract and FMA); \
+ // sin_hi + sin_corr is r + r_head^3 * p to doubled FP64 precision (way \
+ // more than needed). Next we need to add (r^3 - r_head^3) * p, which \
+ // is r_tail * (r^2 + r * r_head + r_head^2) * p; because r_tail is \
+ // small, rounding error in computing this is immaterial to the final \
+ // result. Finally, we also need to add r_delta * (1 - r^2/2) to \
+ // sin_corr because sin(r + r_delta) ~=~ sin(r) + r_delta * cos(r) ~=~ \
+ // sin(r) + r_delta * (1 - r^2/2). Note that the term 1 - r^2/2 will be \
+ // computed in the course of our computation of cos(r), discussed next. \
+ // \
+ // For cos(r), the first 2 terms are 1 - r^2/2. This can be easily \
+ // computed to high precision: r_prime := r * 1/2; cos_hi := 1 - r * \
+ // r_prime (FMA); cos_corr := (1 - cos_hi) - r * r_prime. cos_hi can be \
+ // used above to compute r_delta * (1 - r^2/2). Because cos(r + \
+ // r_delta) ~=~ cos(r) - r_delta * sin(r) ~=~ cos(r) - r_delta * r, we \
+ // add the term -r_delta * r to cos_corr. \
+ // \
+ // So in a nutshell sin(r) is approximated by sin_hi + sin_lo, where \
+ // sin_lo is the sum of sin_corr and a polynomial starting at r^5, and \
+ // cos(r) is approximated by cos_hi + cos_lo, where cos_lo is the sum \
+ // of cos_corr and a polynomial starting at r^4. \
+ // \
+ // By suitably merging the two, we obtain numer_hi, numer_lo and \
+ // denom_hi, denom_lo. \
+ // */ \
+ \
+ VFLOAT rsq = __riscv_vfmul (r, r, vlen); \
+ \
+ UINT mask_r_head = 1; \
+ mask_r_head = ~((mask_r_head << 36) - 1); \
+ VFLOAT r_head = U_AS_F (__riscv_vand (F_AS_U (r), mask_r_head, vlen)); \
+ VFLOAT r_tail = __riscv_vfsub (r, r_head, vlen); \
+ \
+ UINT exp_m1 = 1; \
+ exp_m1 = (exp_m1 << 52); \
+ VFLOAT r_prime = U_AS_F (__riscv_vsub (F_AS_U (r), exp_m1, vlen)); \
+ /* |r| is never too small, so subtracting 1 from exponent is division by \
+ * 2 */ \
+ \
+ VFLOAT ONE = VFMV_VF (fp_posOne, vlen); \
+ VFLOAT cos_hi = __riscv_vfnmsac (ONE, r, r_prime, vlen); \
+ VFLOAT cos_corr = __riscv_vfsub (ONE, cos_hi, vlen); \
+ cos_corr = __riscv_vfnmsac (cos_corr, r, r_prime, vlen); \
+ cos_corr = __riscv_vfnmsac (cos_corr, r_delta, r, vlen); \
+ \
+ double coeff = -0x1.5555555555555p-3; \
+ VFLOAT r_head_cube = __riscv_vfmul (r_head, r_head, vlen); \
+ r_head_cube = __riscv_vfmul (r_head_cube, r_head, vlen); \
+ VFLOAT sin_hi = __riscv_vfmadd (r_head_cube, coeff, r, vlen); \
+ VFLOAT sin_corr = __riscv_vfsub (r, sin_hi, vlen); \
+ sin_corr = __riscv_vfmacc (sin_corr, coeff, r_head_cube, vlen); \
+ VFLOAT tmp = __riscv_vfmadd (r_head, r_head, rsq, vlen); \
+ VFLOAT tmp2 = __riscv_vfmul (r_tail, coeff, vlen); \
+ tmp = __riscv_vfmacc (tmp, r_head, r, vlen); \
+ sin_corr = __riscv_vfmacc (sin_corr, tmp, tmp2, vlen); \
+ sin_corr = __riscv_vfmacc (sin_corr, r_delta, cos_hi, vlen); \
+ \
+ VFLOAT poly_s = PSTEP ( \
+ 0x1.1111111111069p-7, rsq, \
+ PSTEP (-0x1.a01a019ffe527p-13, rsq, \
+ PSTEP (0x1.71de3a33a62c6p-19, rsq, \
+ PSTEP (-0x1.ae642c52fc493p-26, rsq, \
+ PSTEP (0x1.6109be886e15cp-33, \
+ -0x1.9ffe1dd295e78p-41, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT poly_c = PSTEP ( \
+ 0x1.5555555555546p-5, rsq, \
+ PSTEP (-0x1.6c16c16c1450cp-10, rsq, \
+ PSTEP (0x1.a01a019b77545p-16, rsq, \
+ PSTEP (-0x1.27e4f72551e3dp-22, rsq, \
+ PSTEP (0x1.1ee950032f74cp-29, \
+ -0x1.8f5dd75850673p-37, rsq, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VFLOAT r_to_4 = __riscv_vfmul (rsq, rsq, vlen); \
+ VFLOAT r_to_5 = __riscv_vfmul (r_to_4, r, vlen); \
+ \
+ poly_c = __riscv_vfmadd (poly_c, r_to_4, cos_corr, vlen); \
+ poly_s = __riscv_vfmadd (poly_s, r_to_5, sin_corr, vlen); \
+ \
+ VFLOAT S, s, C, c; \
+ FAST2SUM (sin_hi, poly_s, S, s, vlen); \
+ FAST2SUM (cos_hi, poly_c, C, c, vlen); \
+ \
+ VFLOAT numer_hi, numer_lo, denom_hi, denom_lo; \
+ numer_hi = S; \
+ numer_hi = __riscv_vmerge (numer_hi, C, numer_pick_c, vlen); \
+ numer_lo = s; \
+ numer_lo = __riscv_vmerge (numer_lo, c, numer_pick_c, vlen); \
+ \
+ denom_hi = S; \
+ denom_hi = __riscv_vmerge (denom_hi, C, denom_pick_c, vlen); \
+ denom_lo = s; \
+ denom_lo = __riscv_vmerge (denom_lo, c, denom_pick_c, vlen); \
+ \
+ DIV_N2D2 (numer_hi, numer_lo, denom_hi, denom_lo, vy, vlen); \
+ \
+ /* need to put the correct sign */ \
+ n = __riscv_vsll (n, BIT_WIDTH - 1, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (n), vlen); \
+ vy = __riscv_vfsgnjx (vy, vx_orig, vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
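+
+/* A scalar model of the reduction above, for exposition only; builtins
+ stand in for the polynomial sin/cos evaluation. N = rint(2x) and
+ r = (2x - N) * pi/2; for even N, tan(pi x) = sin(r)/cos(r), for odd N,
+ tan(pi x) = -cos(r)/sin(r). The signs of the exact 0/Inf special values
+ follow the table in the vector code; this sketch does not fix them up. */
+static inline double
+tanpi_scalar_model (double x)
+{
+ const double pi = 0x1.921fb54442d18p+1;
+ if (__builtin_fabs (x) >= 0x1.0p53)
+ /* every such x is an even integer, so tanpi(x) is a signed zero */
+ return __builtin_copysign (0.0, x);
+ double n = __builtin_nearbyint (2.0 * x);
+ double r = (2.0 * x - n) * (pi / 2.0);
+ if (((long long) n & 1) == 0)
+ return __builtin_sin (r) / __builtin_cos (r);
+ return -__builtin_cos (r) / __builtin_sin (r);
+}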
diff --git a/sysdeps/riscv/rvd/v_d_tgamma.c b/sysdeps/riscv/rvd/v_d_tgamma.c
new file mode 100644
index 0000000000..4f646c5318
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_d_tgamma.c
@@ -0,0 +1,515 @@
+/* Double-precision vector tgamma function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "rvvlm.h"
+#include "v_math.h"
+#include <riscv_vector.h>
+
+#define API_SIGNATURE API_SIGNATURE_11
+#define STRIDE UNIT_STRIDE
+
+#include RVVLM_TGAMMAD_VSET_CONFIG
+
+#include "rvvlm_gammafuncsD.h"
+
+//---Approximate log(x) by w + w^3 poly(w^2)
+// w = 2(x-1)/(x+1), x roughly in [1/sqrt(2), sqrt(2)]
+#define P_log_0 0x5555555555555090 // Q_66
+#define P_log_1 0x666666666686863a // Q_69
+#define P_log_2 0x49249248fc99ba4b // Q_71
+#define P_log_3 0x71c71ca402e164fa // Q_74
+#define P_log_4 0x5d1733e3ae94dde0 // Q_76
+#define P_log_5 0x4ec8b69784234032 // Q_78
+#define P_log_6 0x43cc44a056dc3c93 // Q_80
+#define P_log_7 0x4432439bb76e7d74 // Q_82
+
+#define LOG2_HI 0x1.62e42fefa4000p-1
+#define LOG2_LO -0x1.8432a1b0e2634p-43
+
+//---Approximate exp(R) by 1 + R + R^2*poly(R)
+#define P_exp_0 0x400000000000004e // Q_63
+#define P_exp_1 0x1555555555555b6e // Q_63
+#define P_exp_2 0x555555555553378 // Q_63
+#define P_exp_3 0x1111111110ec10d // Q_63
+#define P_exp_4 0x2d82d82d87a9b5 // Q_63
+#define P_exp_5 0x6806806ce6d6f // Q_63
+#define P_exp_6 0xd00d00841fcf // Q_63
+#define P_exp_7 0x171ddefda54b // Q_63
+#define P_exp_8 0x24fcc01d627 // Q_63
+#define P_exp_9 0x35ed8bbd24 // Q_63
+#define P_exp_10 0x477745b6c // Q_63
+
+//---Approximate Stirling correction by P(t)/Q(t)
+// Gamma(x) = (x/e)^(x-1/2) * P(t)/Q(t), t = 1/x, x in [2, 180]
+#define P_corr_0 0x599ecf7a9368327 // Q_78
+#define P_corr_1 0x120a4be8e3d8673d // Q_78
+#define P_corr_2 0x2ab73aec63e90213 // Q_78
+#define P_corr_3 0x32f903e18454e088 // Q_78
+#define P_corr_4 0x29f463d533d0a4b5 // Q_78
+#define P_corr_5 0x1212989fdf61f6c1 // Q_78
+#define P_corr_6 0x48706d4f75a0491 // Q_78
+#define P_corr_7 0x5591439d2d51a6 // Q_78
+
+#define Q_corr_0 0x75e5053ce715a76 // Q_79
+#define Q_corr_1 0x171e2068d3ef7453 // Q_79
+#define Q_corr_2 0x363d736690f2373f // Q_79
+#define Q_corr_3 0x3e793a1cc19bbc32 // Q_79
+#define Q_corr_4 0x31dc2fbf92ec978c // Q_79
+#define Q_corr_5 0x138c2244d1c1e0b1 // Q_79
+#define Q_corr_6 0x450a7392d81c20f // Q_79
+#define Q_corr_7 0x1ed9c605221435 // Q_79
+
+//---Approximate sin(pi x)/pi as x + x^3 poly(x^2)
+#define P_sin_0 -0x694699894c1f4ae7 // Q_62
+#define P_sin_1 0x33f396805788034f // Q_62
+#define P_sin_2 -0xc3547239048c220 // Q_62
+#define P_sin_3 0x1ac6805cc1cecf4 // Q_62
+#define P_sin_4 -0x26702d2fd5a3e6 // Q_62
+#define P_sin_5 0x26e8d360232c6 // Q_62
+#define P_sin_6 -0x1d3e4d9787ba // Q_62
+#define P_sin_7 0x107298fc107 // Q_62
+
+//---Compute log(x/e) to 2^(-65) absolute accuracy
+// for Stirling's formula (x/e)^x sqrt(2pi/x) = (x/e)^(x-1/2) sqrt(2pi/e)
+#define TGAMMA_LOG(x_hi, x_lo, y_hi, y_lo, vlen) \
+ do \
+ { \
+ VFLOAT x_in_hi = (x_hi); \
+ VFLOAT x_in_lo = (x_lo); \
+ VINT n = __riscv_vadd ( \
+ __riscv_vsra (F_AS_I (x_in_hi), MAN_LEN - 8, (vlen)), 0x96, vlen); \
+ n = __riscv_vsub (__riscv_vsra (n, 8, vlen), EXP_BIAS, vlen); \
+ VFLOAT scale = I_AS_F (__riscv_vsll ( \
+ __riscv_vrsub (n, EXP_BIAS, (vlen)), MAN_LEN, (vlen))); \
+ x_in_hi = __riscv_vfmul (x_in_hi, scale, (vlen)); \
+ x_in_lo = __riscv_vfmul (x_in_lo, scale, (vlen)); \
+ /* x is scaled, and log(x) is 2 atanh(w/2); w = 2(x-1)/(x+1) */ \
+ \
+ VFLOAT numer, denom, denom_delta; \
+ numer = __riscv_vfsub (x_in_hi, fp_posOne, (vlen)); /* exact */ \
+ denom = __riscv_vfadd (x_in_hi, fp_posOne, (vlen)); \
+ denom_delta = __riscv_vfadd (__riscv_vfrsub (denom, fp_posOne, (vlen)), \
+ x_in_hi, (vlen)); \
+ denom_delta = __riscv_vfadd (denom_delta, x_in_lo, (vlen)); \
+ VFLOAT w_hi, w_lo; \
+ ACC_DIV2_N2D2 (numer, x_in_lo, denom, denom_delta, w_hi, w_lo, vlen); \
+ /* w_hi + w_lo is at this point (x-1)/(x+1) */ \
+ /* Next get 2(x-1)/(x+1) in Q64 fixed point */ \
+ VINT W \
+ = __riscv_vfcvt_x (__riscv_vfmul (w_hi, 0x1.0p65, (vlen)), (vlen)); \
+ W = __riscv_vadd ( \
+ W, \
+ __riscv_vfcvt_x (__riscv_vfmul (w_lo, 0x1.0p65, (vlen)), (vlen)), \
+ (vlen)); \
+ /* W is in Q64 because W is 2(x-1)/(x+1) */ \
+ \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, (vlen)); \
+ VINT W2 = __riscv_vsmul (W, W, 1, (vlen)); /* Q65 */ \
+ \
+ VINT P_right, P_left, W8; \
+ P_right = PSTEP_I_SRA (P_log_6, P_log_7, 4, W2, (vlen)); \
+ P_right = PSTEP_I_SRA (P_log_5, W2, 4, P_right, (vlen)); \
+ P_right = PSTEP_I_SRA (P_log_4, W2, 4, P_right, (vlen)); \
+ /* P_right in Q76 */ \
+ P_left = PSTEP_I_SRA (P_log_2, P_log_3, 5, W2, (vlen)); \
+ P_left = PSTEP_I_SRA (P_log_1, W2, 4, P_left, (vlen)); \
+ P_left = PSTEP_I_SRA (P_log_0, W2, 5, P_left, (vlen)); \
+ /* P_left in Q66 */ \
+ W8 = __riscv_vsmul (W2, W2, 1, (vlen)); /* Q67 */ \
+ W8 = __riscv_vsmul (W8, W8, 1, (vlen)); /* Q71 */ \
+ P_right = __riscv_vsmul (P_right, W8, 1, (vlen)); /* Q84 */ \
+ P_right = __riscv_vsra (P_right, 18, (vlen)); /* Q66 */ \
+ P_left = __riscv_vadd (P_left, P_right, (vlen)); /* Q66 */ \
+ \
+ VINT W3 = __riscv_vsmul (W2, W, 1, (vlen)); /* Q66 */ \
+ P_left = __riscv_vsmul (P_left, W3, 1, (vlen)); /* Q69 */ \
+ VFLOAT poly_hi = __riscv_vfcvt_f (P_left, (vlen)); \
+ P_left \
+ = __riscv_vsub (P_left, __riscv_vfcvt_x (poly_hi, (vlen)), (vlen)); \
+ VFLOAT poly_lo = __riscv_vfcvt_f (P_left, (vlen)); \
+ poly_hi = __riscv_vfmul (poly_hi, 0x1.0p-69, (vlen)); \
+ poly_lo = __riscv_vfmul (poly_lo, 0x1.0p-69, (vlen)); \
+ \
+ /* n*log(2) - 1 + w + poly is the desired result */ \
+ VFLOAT A, B; \
+ A = __riscv_vfmul (n_flt, LOG2_HI, (vlen)); /* exact */ \
+ A = __riscv_vfsub (A, fp_posOne, (vlen)); /* exact due to A's range */ \
+ w_hi = __riscv_vfadd (w_hi, w_hi, (vlen)); \
+ w_lo = __riscv_vfadd (w_lo, w_lo, (vlen)); \
+ FAST2SUM (A, w_hi, B, (y_lo), (vlen)); \
+ w_lo = __riscv_vfadd ((y_lo), w_lo, (vlen)); \
+ w_lo = __riscv_vfmacc (w_lo, LOG2_LO, n_flt, (vlen)); \
+ poly_lo = __riscv_vfadd (poly_lo, w_lo, (vlen)); \
+ FAST2SUM (B, poly_hi, (y_hi), (y_lo), (vlen)); \
+ (y_lo) = __riscv_vfadd ((y_lo), poly_lo, (vlen)); \
+ } \
+ while (0)
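+
+/* A scalar model of TGAMMA_LOG, for exposition only: log(x/e) equals
+ n*log(2) - 1 + 2*atanh(w/2), w = 2(x'-1)/(x'+1), where x = 2^n x' and
+ x' lies in roughly [1/sqrt(2), sqrt(2)]. Builtins stand in for the
+ Q-format fixed-point series; assumes x > 0 as the caller guarantees. */
+static inline double
+tgamma_log_scalar_model (double x)
+{
+ const double ln2 = 0x1.62e42fefa39efp-1; /* log(2) rounded to double */
+ int n;
+ double xp = __builtin_frexp (x, &n); /* x = 2^n * xp, xp in [0.5, 1) */
+ if (xp < 0x1.6a09e667f3bcdp-1) /* xp < 1/sqrt(2): rescale */
+ {
+ xp *= 2.0;
+ n -= 1;
+ }
+ double w = 2.0 * (xp - 1.0) / (xp + 1.0);
+ return n * ln2 - 1.0 + 2.0 * __builtin_atanh (w / 2.0);
+}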
+
+//---Compute exp for Stirling's formula used in tgamma
+// computes exp(x_hi + x_lo) as 2^n * EXP, EXP is fixed-point Q62
+#define TGAMMA_EXP(x_hi, x_lo, n, EXP, vlen) \
+ do \
+ { \
+ VFLOAT n_flt = __riscv_vfmul ((x_hi), 0x1.71547652b82fep+0, (vlen)); \
+ (n) = __riscv_vfcvt_x (n_flt, (vlen)); \
+ n_flt = __riscv_vfcvt_f ((n), (vlen)); \
+ VFLOAT r_hi = __riscv_vfnmsub (n_flt, LOG2_HI, (x_hi), (vlen)); \
+ VFLOAT r_lo = __riscv_vfnmsub (n_flt, LOG2_LO, (x_lo), (vlen)); \
+ r_hi = __riscv_vfmul (r_hi, 0x1.0p63, (vlen)); \
+ r_lo = __riscv_vfmul (r_lo, 0x1.0p63, (vlen)); \
+ VINT R = __riscv_vfcvt_x (r_hi, (vlen)); \
+ R = __riscv_vadd (R, __riscv_vfcvt_x (r_lo, (vlen)), (vlen)); \
+ /* R is reduced argument in Q63 */ \
+ \
+ VINT P_right = PSTEP_I ( \
+ P_exp_5, R, \
+ PSTEP_I (P_exp_6, R, \
+ PSTEP_I (P_exp_7, R, \
+ PSTEP_I (P_exp_8, R, \
+ PSTEP_I (P_exp_9, P_exp_10, R, (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ VINT R_sq = __riscv_vsmul (R, R, 1, (vlen)); \
+ VINT R_to_5 = __riscv_vsmul (R_sq, R_sq, 1, (vlen)); \
+ R_to_5 = __riscv_vsmul (R_to_5, R, 1, (vlen)); \
+ VINT P_left = PSTEP_I ( \
+ P_exp_0, R, \
+ PSTEP_I (P_exp_1, R, \
+ PSTEP_I (P_exp_2, R, \
+ PSTEP_I (P_exp_3, P_exp_4, R, (vlen)), (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ P_right = __riscv_vsmul (P_right, R_to_5, 1, (vlen)); \
+ P_left = __riscv_vadd (P_right, P_left, (vlen)); \
+ P_left = __riscv_vsmul (P_left, R_sq, 1, (vlen)); \
+ P_left = __riscv_vadd (P_left, R, (vlen)); \
+ (EXP) = __riscv_vsra (P_left, 1, (vlen)); \
+ INT ONE = (1LL) << 62; \
+ (EXP) = __riscv_vadd ((EXP), ONE, (vlen)); \
+ } \
+ while (0)
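+
+/* A scalar model of TGAMMA_EXP, for exposition only: exp(x) is split as
+ 2^n * exp(r), n = rint(x/log 2), r = x - n log 2, so |r| <= log(2)/2 and
+ exp(r) lies in [1/sqrt(2), sqrt(2)], which fits a Q62 fixed-point value;
+ a builtin stands in for the Q63 polynomial. */
+static inline double
+tgamma_exp_scalar_model (double x, int *n)
+{
+ const double ln2 = 0x1.62e42fefa39efp-1;
+ double n_flt = __builtin_nearbyint (x * 0x1.71547652b82fep+0); /* x/log 2 */
+ *n = (int) n_flt;
+ double r = x - n_flt * ln2; /* the macro uses LOG2_HI/LOG2_LO here */
+ return __builtin_exp (r); /* so that exp(x) = 2^n * (return value) */
+}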
+
+// Compute the term (x/e)^(x-1/2) for 2 <= x <= 180.
+// Return integer n and Q62 fixed point EXP, 2^n value_of(EXP) is (x/e)^(x-1/2)
+#define STIRLING_POWER(x_hi, x_lo, n, EXP, vlen) \
+ do \
+ { \
+ VFLOAT y_hi, y_lo; \
+ TGAMMA_LOG ((x_hi), (x_lo), y_hi, y_lo, (vlen)); \
+ VFLOAT x_m_half = __riscv_vfsub ((x_hi), 0x1.0p-1, (vlen)); \
+ /* compute (x_m_half, x_lo) * (y_hi, y_lo) */ \
+ VFLOAT z_hi, z_lo; \
+ PROD_X1Y1 (x_m_half, y_hi, z_hi, z_lo, (vlen)); \
+ z_lo = __riscv_vfmacc (z_lo, x_m_half, y_lo, (vlen)); \
+ z_lo = __riscv_vfmacc (z_lo, (x_lo), y_hi, (vlen)); \
+ TGAMMA_EXP (z_hi, z_lo, (n), (EXP), (vlen)); \
+ } \
+ while (0)
+
+// Gamma based on Stirling's formula is Gamma(x) ~ (x/e)^x sqrt(2 pi / x)
+// poly(1/x).  To incorporate the 1/sqrt(x) into the power calculation,
+// this is rewritten as (x/e)^(x-1/2) sqrt(2 pi / e) poly(1/x).
+// This poly(1/x) is in essence a correction term.
+#define STIRLING_CORRECTION(x_hi, x_lo, P_SC, Q_SC, vlen) \
+ do \
+ { \
+ /* 2 <= x < 180. Use Q62 to represent 1/x in fixed point */ \
+ VFLOAT y_hi = __riscv_vfrdiv ((x_hi), fp_posOne, (vlen)); \
+ VFLOAT y_lo = VFMV_VF (fp_posOne, (vlen)); \
+ y_lo = __riscv_vfnmsub ((x_hi), y_hi, y_lo, (vlen)); \
+ y_lo = __riscv_vfnmsac (y_lo, (x_lo), y_hi, (vlen)); \
+ y_lo = __riscv_vfmul (y_hi, y_lo, (vlen)); \
+ y_hi = __riscv_vfmul (y_hi, 0x1.0p62, (vlen)); \
+ y_lo = __riscv_vfmul (y_lo, 0x1.0p62, (vlen)); \
+ VINT R = __riscv_vfcvt_x (y_hi, (vlen)); \
+ R = __riscv_vadd (R, __riscv_vfcvt_x (y_lo, (vlen)), (vlen)); \
+ /* R is 1/(x_hi+x_lo) in Q62 */ \
+ (P_SC) = PSTEP_I_SLL (P_corr_6, P_corr_7, 1, R, (vlen)); \
+ (P_SC) = PSTEP_I_SLL (P_corr_5, R, 1, (P_SC), (vlen)); \
+ (P_SC) = PSTEP_I_SLL (P_corr_4, R, 1, (P_SC), (vlen)); \
+ (P_SC) = PSTEP_I_SLL (P_corr_3, R, 1, (P_SC), (vlen)); \
+ (P_SC) = PSTEP_I_SLL (P_corr_2, R, 1, (P_SC), (vlen)); \
+ (P_SC) = PSTEP_I_SLL (P_corr_1, R, 1, (P_SC), (vlen)); \
+ (P_SC) = PSTEP_I_SLL (P_corr_0, R, 1, (P_SC), (vlen)); \
+ \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_6, Q_corr_7, 1, R, (vlen)); \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_5, R, 1, (Q_SC), (vlen)); \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_4, R, 1, (Q_SC), (vlen)); \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_3, R, 1, (Q_SC), (vlen)); \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_2, R, 1, (Q_SC), (vlen)); \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_1, R, 1, (Q_SC), (vlen)); \
+ (Q_SC) = PSTEP_I_SLL (Q_corr_0, R, 1, (Q_SC), (vlen)); \
+ } \
+ while (0)
+
+// When the input x to gamma(x) is negative, a factor of sin(pi x)/pi
+// is needed.  When x is an exact negative integer, we need to return
+// +-Inf as special values and also raise the divide-by-zero signal.
+// The input to TGAMMA_SIN is actually |x| clipped to [2^(-60), 179.5].
+#define TGAMMA_SIN(x, P_SIN, SIN_scale, n, vy_special, special_args, vlen) \
+ do \
+ { \
+ VFLOAT n_flt; \
+ (n) = __riscv_vfcvt_x ((x), (vlen)); \
+ n_flt = __riscv_vfcvt_f ((n), (vlen)); \
+ VFLOAT r = __riscv_vfsub ((x), n_flt, (vlen)); \
+ VINT m = __riscv_vsra (F_AS_I (r), MAN_LEN, (vlen)); \
+ m = __riscv_vrsub (__riscv_vand (m, 0x7FF, (vlen)), EXP_BIAS, (vlen)); \
+ /* r = 2^(-m) * val, val in [1, 2). Note that 1 <= m <= 60 */ \
+ VFLOAT scale = I_AS_F (__riscv_vsll ( \
+ __riscv_vadd (m, EXP_BIAS + 61, (vlen)), MAN_LEN, (vlen))); \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, scale, (vlen)), (vlen)); \
+ /* R is fixed point in scale 61+m */ \
+ VFLOAT rsq = __riscv_vfmul (r, r, (vlen)); \
+ VFLOAT rsq_lo = __riscv_vfmsub (r, r, rsq, (vlen)); \
+ VINT Rsq \
+ = __riscv_vfcvt_x (__riscv_vfmul (rsq, 0x1.0p63, (vlen)), (vlen)); \
+ Rsq = __riscv_vadd ( \
+ Rsq, \
+ __riscv_vfcvt_x (__riscv_vfmul (rsq_lo, 0x1.0p63, (vlen)), (vlen)), \
+ (vlen)); \
+ VINT P_right = PSTEP_I ( \
+ P_sin_4, Rsq, \
+ PSTEP_I (P_sin_5, Rsq, PSTEP_I (P_sin_6, P_sin_7, Rsq, (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ VINT R8 = __riscv_vsmul (Rsq, Rsq, 1, (vlen)); \
+ R8 = __riscv_vsmul (R8, R8, 1, (vlen)); \
+ VINT P_left = PSTEP_I ( \
+ P_sin_0, Rsq, \
+ PSTEP_I (P_sin_1, Rsq, PSTEP_I (P_sin_2, P_sin_3, Rsq, (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ P_right = __riscv_vsmul (P_right, R8, 1, (vlen)); \
+ P_left = __riscv_vadd (P_left, P_right, (vlen)); \
+ P_left = __riscv_vsmul (P_left, Rsq, 1, (vlen)); \
+ P_left = __riscv_vsmul (P_left, R, 1, (vlen)); \
+ (P_SIN) = __riscv_vadd (R, __riscv_vsll (P_left, 1, (vlen)), (vlen)); \
+ (SIN_scale) = __riscv_vadd (m, 61, (vlen)); \
+ VBOOL pole = __riscv_vmseq (R, 0, (vlen)); \
+ if (__riscv_vcpop (pole, (vlen)) > 0) \
+ { \
+ VFLOAT pm_inf = __riscv_vfrec7 (pole, I_AS_F (R), (vlen)); \
+ pm_inf = __riscv_vfsgnjn ( \
+ pm_inf, I_AS_F (__riscv_vsll ((n), 63, (vlen))), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), pm_inf, pole, (vlen)); \
+ (special_args) = __riscv_vmor ((special_args), pole, (vlen)); \
+ (P_SIN) = __riscv_vmerge ((P_SIN), 0x8000, pole, (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define V_NAME_FUNCTION(lmul, simdlen) \
+ VFLOAT V_NAME_D1 (lmul, simdlen, tgamma) (VFLOAT x) \
+ { \
+ size_t vlen; \
+ VFLOAT vx, vx_orig, vy, vy_special; \
+ VBOOL special_args; \
+ \
+ SET_ROUNDTONEAREST; \
+ /* stripmining over input arguments */ \
+ vlen = VSET (simdlen); \
+ vx = x; \
+ \
+ /* Handle Inf and NaN and |vx| < 2^(-60) */ \
+ EXCEPTION_HANDLING_TGAMMA (vx, special_args, vy_special, vlen); \
+ vx_orig = vx; \
+ \
+ vx = __riscv_vfabs (vx, vlen); \
+ vx = __riscv_vfmin (vx, 0x1.67p+7, vlen); \
+ vx_orig = __riscv_vfsgnj (vx, vx_orig, vlen); \
+ \
+ VFLOAT vx_hi = vx; \
+ VFLOAT vx_lo = VFMV_VF (fp_posZero, vlen); \
+ VBOOL x_lt_0 = __riscv_vmflt (vx_orig, fp_posZero, vlen); \
+ \
+ /* VINT P_SIN, SIN_scale, lsb; \
+ // TGAMMA_SIN(vx_orig, P_SIN, SIN_scale, lsb, vy_special, special_args, \
+ // vlen); */ \
+ \
+ if (__riscv_vcpop (x_lt_0, vlen) > 0) \
+ { \
+ /* add 1 to argument */ \
+ VFLOAT a_tmp_hi = __riscv_vfadd (vx_hi, fp_posOne, vlen); \
+ VFLOAT a_tmp_lo = __riscv_vfrsub (a_tmp_hi, fp_posOne, vlen); \
+ a_tmp_lo = __riscv_vfadd (a_tmp_lo, vx_hi, vlen); \
+ vx_hi = __riscv_vmerge (vx_hi, a_tmp_hi, x_lt_0, vlen); \
+ vx_lo = __riscv_vmerge (vx_lo, a_tmp_lo, x_lt_0, vlen); \
+ } \
+ \
+ VINT n, EXP; \
+ \
+ VINT XF = __riscv_vsll (VMVI_VX (1, vlen), 61, vlen); \
+ VINT XF_scale = VMVI_VX (61, vlen); \
+ \
+ VBOOL x_lt_2 = __riscv_vmflt (vx_hi, 0x1.0p1, vlen); \
+ \
+ if (__riscv_vcpop (x_lt_2, vlen) > 0) \
+ { \
+ /* So x = 2^(-m) val, val is in [1, 2) \
+ // Create fixed point X in scale Q(61+m) */ \
+ VINT m = __riscv_vsra (F_AS_I (vx_hi), MAN_LEN, vlen); \
+ m = __riscv_vrsub (__riscv_vand (m, 0x7FF, vlen), EXP_BIAS, vlen); \
+ /* at this point, m >= 0, x = 2^(-m) val, val in [1, 2) */ \
+ VINT XF_scale_1 = __riscv_vadd (m, 61, vlen); \
+ VINT scale_m = __riscv_vsll ( \
+ __riscv_vadd (XF_scale_1, EXP_BIAS, vlen), MAN_LEN, vlen); \
+ VFLOAT x_tmp = __riscv_vfmul (vx_hi, I_AS_F (scale_m), vlen); \
+ VINT X = __riscv_vfcvt_x (x_tmp, vlen); \
+ /* X is vx_hi in fixed-point, Q(61+m) */ \
+ x_tmp = __riscv_vfmul (vx_lo, I_AS_F (scale_m), vlen); \
+ X = __riscv_vadd (X, __riscv_vfcvt_x (x_tmp, vlen), vlen); \
+ /* X is (vx_hi + vx_lo) in fixed-point, Q(61+m) */ \
+ VINT One_plus_X \
+ = __riscv_vadd (XF, __riscv_vsra (X, I_AS_U (m), vlen), vlen); \
+ /* One_plus_X is 1+x in Q61 */ \
+ VFLOAT b = VFMV_VF (fp_posZero, vlen); \
+ \
+ /* if 1 <= x < 2, gamma(x) = (1/x) gamma(x+1) \
+ // if 0 < x < 1, gamma(x) = (1/(x(x+1))) gamma(x+2) */ \
+ VBOOL x_ge_1 = __riscv_vmfge (vx_hi, fp_posOne, vlen); \
+ VBOOL cond = __riscv_vmand (x_lt_2, x_ge_1, vlen); \
+ /* cond is 1 <= x < 2 */ \
+ XF_scale = __riscv_vmerge (XF_scale, XF_scale_1, cond, vlen); \
+ XF = __riscv_vmerge (XF, X, cond, vlen); \
+ b = __riscv_vfmerge (b, fp_posOne, cond, vlen); \
+ /* at this point, if input x is between [1, 2), XF is x in scale 61+m \
+ // which is 61 (as m is 0). */ \
+ \
+ cond = __riscv_vmandn (x_lt_2, x_ge_1, vlen); \
+ /* cond is 0 < x < 1 */ \
+ X = __riscv_vsmul (X, One_plus_X, 1, vlen); \
+ XF_scale_1 = __riscv_vadd (m, 59, vlen); \
+ XF_scale = __riscv_vmerge (XF_scale, XF_scale_1, cond, vlen); \
+ XF = __riscv_vmerge (XF, X, cond, vlen); \
+ b = __riscv_vfmerge (b, 0x1.0p1, cond, vlen); \
+ /* at this point, XF is either 1, x, or x(x+1) in fixed point \
+ // scale given in XF_scale which is either 62, 61+m, or 59+m */ \
+ \
+ /* now set (vx_hi, vx_lo) to x + b, b = 0, 1, or 2 */ \
+ x_tmp = __riscv_vfadd (b, vx_hi, vlen); \
+ VFLOAT x_tmp2 = __riscv_vfsub (b, x_tmp, vlen); \
+ x_tmp2 = __riscv_vfadd (x_tmp2, vx_hi, vlen); \
+ vx_hi = x_tmp; \
+ vx_lo = __riscv_vfadd (vx_lo, x_tmp2, vlen); \
+ } \
+ \
+ STIRLING_POWER (vx_hi, vx_lo, n, EXP, vlen); \
+ /* Stirling factor is 2^n * e, EXP is e in Q62 */ \
+ \
+ VINT P_SC, Q_SC, Numer_tail, Denom_tail; \
+ STIRLING_CORRECTION (vx_hi, vx_lo, P_SC, Q_SC, vlen); \
+ /* correction term is 2 * P_SC / Q_SC, P_SC is Q78, Q_SC is Q79 */ \
+ \
+ /* 2^(n-61) * EXP * P_SC / Q_SC is gamma(x) for x >= 2 */ \
+ VINT P = __riscv_vsmul (EXP, P_SC, 1, vlen); \
+ /* P is Q77 */ \
+ \
+ /* now incorporate XF into Q_SC */ \
+ VINT Q = __riscv_vsmul (XF, Q_SC, 1, vlen); \
+ /* scale of Q is 79 - 63 + XF_scale = 16 + XF_scale */ \
+ \
+ /* difference is 16 + XF_scale - 77, which is XF_scale - 61 */ \
+ XF_scale = __riscv_vsub (XF_scale, 61, vlen); \
+ n = __riscv_vadd (n, XF_scale, vlen); \
+ /* 2^n P / Q is the answer if input is positive */ \
+ /* For negative input, the answer is the reciprocal times pi/sin(pi x) */ \
+ \
+ VINT Numer = P; \
+ VINT Denom = Q; \
+ VINT vy_sign = VMVI_VX (0, vlen); /* zero the sign accumulator */ \
+ \
+ if (__riscv_vcpop (x_lt_0, vlen) > 0) \
+ { \
+ /* we first take the reciprocal and change n to -n */ \
+ Numer = __riscv_vmerge (Numer, Q, x_lt_0, vlen); \
+ Denom = __riscv_vmerge (Denom, P, x_lt_0, vlen); \
+ \
+ VINT P_SIN, SIN_scale, lsb; \
+ TGAMMA_SIN (vx_orig, P_SIN, SIN_scale, lsb, vy_special, special_args, \
+ vlen); \
+ \
+ vy_sign = __riscv_vmerge (vy_sign, lsb, x_lt_0, vlen); \
+ \
+ P_SIN = __riscv_vsmul (P_SIN, Denom, 1, vlen); \
+ Denom = __riscv_vmerge (Denom, P_SIN, x_lt_0, vlen); \
+ \
+ SIN_scale = __riscv_vsub (SIN_scale, 63, vlen); \
+ VINT n_prime = __riscv_vrsub (n, 0, vlen); \
+ n_prime = __riscv_vadd (n_prime, SIN_scale, vlen); \
+ n = __riscv_vmerge (n, n_prime, x_lt_0, vlen); \
+ } \
+ \
+ VFLOAT numer_hi, numer_lo, denom_hi, denom_lo; \
+ numer_hi = __riscv_vfcvt_f (Numer, vlen); \
+ Numer_tail \
+ = __riscv_vsub (Numer, __riscv_vfcvt_x (numer_hi, vlen), vlen); \
+ numer_lo = __riscv_vfcvt_f (Numer_tail, vlen); \
+ \
+ denom_hi = __riscv_vfcvt_f (Denom, vlen); \
+ Denom_tail \
+ = __riscv_vsub (Denom, __riscv_vfcvt_x (denom_hi, vlen), vlen); \
+ denom_lo = __riscv_vfcvt_f (Denom_tail, vlen); \
+ \
+ DIV_N2D2 (numer_hi, numer_lo, denom_hi, denom_lo, vy, vlen); \
+ FAST_LDEXP (vy, n, vlen); \
+ \
+ vy_sign = __riscv_vsll (vy_sign, 63, vlen); \
+ vy = __riscv_vfsgnjx (vy, I_AS_F (vy_sign), vlen); \
+ \
+ vy = __riscv_vmerge (vy, vy_special, special_args, vlen); \
+ \
+ RESTORE_FRM; \
+ return vy; \
+ }
+
+#undef LMUL
+#define LMUL 1
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+
+#undef LMUL
+#define LMUL 2
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+V_NAME_FUNCTION (LMUL, 2)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+
+#undef LMUL
+#define LMUL 4
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+V_NAME_FUNCTION (LMUL, 4)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+
+#undef LMUL
+#define LMUL 8
+#undef MAKE_VBOOL
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+V_NAME_FUNCTION (LMUL, 8)
+V_NAME_FUNCTION (LMUL, 16)
+V_NAME_FUNCTION (LMUL, 32)
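+
+/* A scalar model of the overall structure above, for exposition only:
+ builtins stand in for the fixed-point Stirling evaluation, only the
+ first term of the P/Q correction is kept, and one uniform recurrence
+ covers 0 < x < 2 (the vector code shifts by 1 or 2 depending on the
+ range). Exact negative-integer poles are detected specially by the
+ vector code; this naive sketch merely produces huge values there. */
+static inline double
+tgamma_scalar_model (double x)
+{
+ const double pi = 0x1.921fb54442d18p+1;
+ const double e = 0x1.5bf0a8b145769p+1;
+ if (x >= 2.0)
+ {
+ /* Stirling region: (x/e)^(x-1/2) * sqrt(2 pi / e) * correction. */
+ double power = __builtin_exp ((x - 0.5) * (__builtin_log (x) - 1.0));
+ double corr = 1.0 + 1.0 / (12.0 * x); /* P/Q truncated to one term */
+ return power * __builtin_sqrt (2.0 * pi / e) * corr;
+ }
+ if (x > 0.0)
+ /* Recurrence Gamma(x) = Gamma(x+2) / (x (x+1)) moves x into [2, 4). */
+ return tgamma_scalar_model (x + 2.0) / (x * (x + 1.0));
+ /* Reflection Gamma(x) Gamma(1-x) = pi / sin(pi x) for x < 0. */
+ return pi / (__builtin_sin (pi * x) * tgamma_scalar_model (1.0 - x));
+}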
diff --git a/sysdeps/riscv/rvd/v_math.h b/sysdeps/riscv/rvd/v_math.h
new file mode 100644
index 0000000000..65ca8c060b
--- /dev/null
+++ b/sysdeps/riscv/rvd/v_math.h
@@ -0,0 +1,27 @@
+/* Utilities for RISC-V vector (RVV) libmvec routines.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _V_MATH_H
+#define _V_MATH_H
+
+#include <riscv_vector.h>
+
+#define V_NAME_D1(lmul, simdlen, fun) _ZGV##lmul##N##simdlen##v_##fun
+#define V_NAME_D2(lmul, simdlen, fun) _ZGV##lmul##N##simdlen##vv_##fun
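+
+/* For example, V_NAME_D1 (1, 2, cos) expands to _ZGV1N2v_cos: one vector
+ argument ("v"), 2 lanes ("N2"), register group ("1") pasted from lmul. */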
+
+#endif
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm.h
new file mode 100644
index 0000000000..6507a89bc7
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm.h
@@ -0,0 +1,538 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#pragma once
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+ union sui32_fp32
+ {
+ int32_t si;
+ uint32_t ui;
+ float f;
+ };
+ union sui64_fp64
+ {
+ int64_t si;
+ uint64_t ui;
+ double f;
+ uint32_t ui_hilo[2];
+ };
+
+#define ui_hilo_HI 1
+#define ui_hilo_LO 0
+ // so that union sui64_fp64 X will have X.ui_hilo[ui_hilo_HI] as the high
+ // bits (containing the exponent) and X.ui_hilo[ui_hilo_LO] as the lower
+ // order bits (containing the lsb, for example)
+
+#define API_SIGNATURE_11 1
+#define API_SIGNATURE_21 2
+#define API_SIGNATURE_12 3
+#define API_SIGNATURE_22 4
+
+#define UNIT_STRIDE 1
+#define GENERAL_STRIDE 2
+
+#ifndef FE_TONEAREST
+#define FE_TONEAREST 0x000
+#endif
+
+#define read_frm() \
+ ({ \
+ unsigned long __value; \
+ __asm__ __volatile__ ("frrm %0" : "=r"(__value)::"memory"); \
+ __value; \
+ })
+
+#define write_frm(value) \
+ ({ \
+ unsigned long __value; \
+ __asm__ __volatile__ ("fsrm %0, %1" \
+ : "=r"(__value) \
+ : "r"(value) \
+ : "memory"); \
+ __value; \
+ })
+
+#define SET_ROUNDTONEAREST \
+ int __original_frm = read_frm (); \
+ if (__original_frm != FE_TONEAREST) \
+ { \
+ write_frm (FE_TONEAREST); \
+ }
+
+#define RESTORE_FRM \
+ do \
+ { \
+ if (__original_frm != FE_TONEAREST) \
+ { \
+ write_frm (__original_frm); \
+ } \
+ } \
+ while (0)
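+
+/* Usage sketch, illustration only (hence not compiled): SET_ROUNDTONEAREST
+ deliberately expands to a declaration plus a statement rather than a
+ do/while, so that __original_frm stays in scope for RESTORE_FRM. */
+#if 0
+VFLOAT
+some_kernel (VFLOAT x)
+{
+ SET_ROUNDTONEAREST; /* save frm, force round-to-nearest */
+ VFLOAT y = x; /* ... all arithmetic rounds to nearest ... */
+ RESTORE_FRM; /* restore the caller's rounding mode */
+ return y;
+}
+#endif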
+
+#define VSRL_I_AS_U(x, nbits, vlen) \
+ U_AS_I (__riscv_vsrl (I_AS_U ((x)), (nbits), (vlen)))
+
+#define PSTEP(coeff_j, x, poly, vlen) \
+ __riscv_vfmadd ((poly), (x), VFMV_VF ((coeff_j), (vlen)), (vlen))
+
+#define PSTEP_ab(pick_a, coeff_a, coeff_b, x, poly, vlen) \
+ __riscv_vfmadd ((poly), (x), \
+ __riscv_vfmerge (VFMV_VF ((coeff_b), (vlen)), (coeff_a), \
+ (pick_a), (vlen)), \
+ (vlen))
+
+#define PSTEP_I(COEFF_j, X, POLY, vlen) \
+ __riscv_vsadd (__riscv_vsmul ((POLY), (X), 1, (vlen)), (COEFF_j), (vlen))
+
+#define PSTEP_I_SLL(COEFF_j, X, K, POLY, vlen) \
+ __riscv_vsadd ( \
+ __riscv_vsll (__riscv_vsmul ((POLY), (X), 1, (vlen)), (K), (vlen)), \
+ (COEFF_j), (vlen))
+
+#define PSTEP_I_SRA(COEFF_j, X, K, POLY, vlen) \
+ __riscv_vsadd ( \
+ __riscv_vsra (__riscv_vsmul ((POLY), (X), 1, (vlen)), (K), (vlen)), \
+ (COEFF_j), (vlen))
+
+#define PSTEP_I_HI_SRA(COEFF_j, X, K, POLY, vlen) \
+ __riscv_vadd ( \
+ __riscv_vsra (__riscv_vmulh ((POLY), (X), (vlen)), (K), (vlen)), \
+ (COEFF_j), (vlen))
+
+#define PSTEP_I_HI(COEFF_j, X, POLY, vlen) \
+ __riscv_vadd (__riscv_vmulh ((POLY), (X), (vlen)), (COEFF_j), (vlen))
+
+#define PSTEPN_I(COEFF_j, X, POLY, vlen) \
+ __riscv_vrsub (__riscv_vsmul ((POLY), (X), 1, (vlen)), (COEFF_j), (vlen))
+
+#define PSTEP_I_ab(pick_a, COEFF_a, COEFF_b, X, POLY, vlen) \
+ __riscv_vsadd (__riscv_vsmul ((POLY), (X), 1, (vlen)), \
+ __riscv_vmerge (VMVI_VX ((COEFF_b), (vlen)), (COEFF_a), \
+ (pick_a), (vlen)), \
+ (vlen))
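+
+/* Scalar models of the Horner-step macros above, for exposition only.
+ PSTEP is one fused step of a floating-point Horner evaluation; PSTEP_I
+ is the same step in signed 64-bit fixed point, where vsmul computes the
+ Q63 product round(a*b / 2^63); rounding and saturation are omitted. */
+static inline double
+pstep_scalar_model (double coeff_j, double x, double poly)
+{
+ return poly * x + coeff_j; /* the macro uses a fused multiply-add */
+}
+
+static inline int64_t
+pstep_i_scalar_model (int64_t coeff_j, int64_t x, int64_t poly)
+{
+ __int128 prod = (__int128) poly * (__int128) x; /* as in vsmul */
+ return (int64_t) (prod >> 63) + coeff_j; /* vsadd would saturate */
+}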
+
+#define FAST2SUM(X, Y, S, s, vlen) \
+ do \
+ { \
+ (S) = __riscv_vfadd ((X), (Y), (vlen)); \
+ (s) = __riscv_vfsub ((X), (S), (vlen)); \
+ (s) = __riscv_vfadd ((s), (Y), (vlen)); \
+ } \
+ while (0)
+
+#define POS2SUM(X, Y, S, s, vlen) \
+ do \
+ { \
+ VFLOAT _first = __riscv_vfmax ((X), (Y), (vlen)); \
+ VFLOAT _second = __riscv_vfmin ((X), (Y), (vlen)); \
+ S = __riscv_vfadd ((X), (Y), (vlen)); \
+ s = __riscv_vfadd (__riscv_vfsub (_first, (S), (vlen)), _second, \
+ (vlen)); \
+ } \
+ while (0)
+
+#define KNUTH2SUM(X, Y, S, s, vlen) \
+ do \
+ { \
+ (S) = __riscv_vfadd ((X), (Y), (vlen)); \
+ VFLOAT X_hat = __riscv_vfsub ((S), (Y), (vlen)); \
+ (s) = __riscv_vfadd ( \
+ __riscv_vfsub ((X), X_hat, (vlen)), \
+ __riscv_vfsub ((Y), __riscv_vfsub ((S), X_hat, (vlen)), (vlen)), \
+ (vlen)); \
+ } \
+ while (0)
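+
+/* Scalar models of the error-free transforms above, for exposition only.
+ FAST2SUM (Dekker) requires |X| >= |Y| or X = 0; KNUTH2SUM (2Sum) needs
+ no ordering at the cost of three more additions. In both, S is the
+ rounded sum and s the exact rounding error, so S + s == X + Y. */
+static inline void
+fast2sum_scalar_model (double x, double y, double *S, double *s)
+{
+ *S = x + y;
+ *s = (x - *S) + y; /* exact error term, given |x| >= |y| */
+}
+
+static inline void
+knuth2sum_scalar_model (double x, double y, double *S, double *s)
+{
+ *S = x + y;
+ double x_hat = *S - y; /* the part of S contributed by x */
+ *s = (x - x_hat) + (y - (*S - x_hat)); /* exact error term */
+}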
+
+#define FIX2FLT(X, scale, y_hi, y_lo, vlen) \
+ do \
+ { \
+ (y_hi) = __riscv_vfcvt_f ((X), (vlen)); \
+ (y_lo) = __riscv_vfcvt_f ( \
+ __riscv_vsub ((X), __riscv_vfcvt_x ((y_hi), (vlen)), (vlen)), \
+ (vlen)); \
+ (y_hi) = __riscv_vfmul ((y_hi), (scale), (vlen)); \
+ (y_lo) = __riscv_vfmul ((y_lo), (scale), (vlen)); \
+ } \
+ while (0)
+
+#define FLT2FIX(x_hi, x_lo, scale, Y, vlen) \
+ do \
+ { \
+ (Y) = __riscv_vfcvt_x (__riscv_vfmul ((x_hi), (scale), (vlen)), \
+ (vlen)); \
+ (Y) = __riscv_vadd ( \
+ (Y), \
+ __riscv_vfcvt_x (__riscv_vfmul ((x_lo), (scale), (vlen)), (vlen)), \
+ (vlen)); \
+ } \
+ while (0)
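+
+/* A scalar model of FLT2FIX, for exposition only: a double-double
+ (x_hi, x_lo) becomes one 64-bit fixed-point value by converting the two
+ scaled halves separately and adding, exactly as the macro does with two
+ vfcvt_x conversions. */
+static inline int64_t
+flt2fix_scalar_model (double x_hi, double x_lo, double scale)
+{
+ return (int64_t) __builtin_llrint (x_hi * scale)
+ + (int64_t) __builtin_llrint (x_lo * scale);
+}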
+
+#define PROD_X1Y1(x, y, prod_hi, prod_lo, vlen) \
+ do \
+ { \
+ (prod_hi) = __riscv_vfmul ((x), (y), (vlen)); \
+ (prod_lo) = __riscv_vfmsub ((x), (y), (prod_hi), (vlen)); \
+ } \
+ while (0)
+
+#define PROD_X1Y2(x, y_hi, y_lo, prod_hi, prod_lo, vlen) \
+ do \
+ { \
+ (prod_hi) = __riscv_vfmul ((x), (y_hi), (vlen)); \
+ (prod_lo) = __riscv_vfmsub ((x), (y_hi), (prod_hi), (vlen)); \
+ (prod_lo) = __riscv_vfmacc ((prod_lo), (x), (y_lo), (vlen)); \
+ } \
+ while (0)
+
+#define PROD_X2Y2(x_hi, x_lo, y_hi, y_lo, prod_hi, prod_lo, vlen) \
+ do \
+ { \
+ (prod_hi) = __riscv_vfmul ((x_hi), (y_hi), (vlen)); \
+ (prod_lo) = __riscv_vfmsub ((x_hi), (y_hi), (prod_hi), (vlen)); \
+ (prod_lo) = __riscv_vfmacc ((prod_lo), (x_hi), (y_lo), (vlen)); \
+ (prod_lo) = __riscv_vfmacc ((prod_lo), (x_lo), (y_hi), (vlen)); \
+ } \
+ while (0)
+
+#define SQR_X2(x_hi, x_lo, prod_hi, prod_lo, vlen) \
+ do \
+ { \
+ (prod_hi) = __riscv_vfmul ((x_hi), (x_hi), (vlen)); \
+ (prod_lo) = __riscv_vfmsub ((x_hi), (x_hi), (prod_hi), (vlen)); \
+ (prod_lo) = __riscv_vfmacc ((prod_lo), (x_hi), (x_lo), (vlen)); \
+ (prod_lo) = __riscv_vfmacc ((prod_lo), (x_lo), (x_hi), (vlen)); \
+ } \
+ while (0)
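+
+/* A scalar model of PROD_X1Y1, for exposition only: with a fused
+ multiply-add the rounding error of a product is exact, so
+ hi + lo == x * y with no error at all. */
+static inline void
+two_prod_scalar_model (double x, double y, double *hi, double *lo)
+{
+ *hi = x * y;
+ *lo = __builtin_fma (x, y, -*hi); /* x*y - fl(x*y), computed exactly */
+}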
+
+#define DIV_N1D2(numer, denom, delta_d, Q, q, vlen) \
+ do \
+ { \
+ Q = __riscv_vfdiv ((numer), (denom), (vlen)); \
+ q = __riscv_vfnmsub ((Q), (denom), (numer), (vlen)); \
+ q = __riscv_vfnmsac ((q), (Q), (delta_d), (vlen)); \
+ q = __riscv_vfmul (q, __riscv_vfrec7 ((denom), (vlen)), (vlen)); \
+ } \
+ while (0)
+
+#define DIV_N2D2(numer, delta_n, denom, delta_d, Q, vlen) \
+ do \
+ { \
+ VFLOAT _q; \
+ (Q) = __riscv_vfdiv ((numer), (denom), (vlen)); \
+ _q = __riscv_vfnmsub ((Q), (denom), (numer), (vlen)); \
+ _q = __riscv_vfnmsac (_q, (Q), (delta_d), (vlen)); \
+ _q = __riscv_vfadd (_q, (delta_n), (vlen)); \
+ _q = __riscv_vfmul (_q, __riscv_vfrec7 ((denom), (vlen)), (vlen)); \
+ (Q) = __riscv_vfadd ((Q), _q, (vlen)); \
+ } \
+ while (0)
+
+#define DIV2_N2D2(numer, delta_n, denom, delta_d, Q, delta_Q, vlen) \
+ do \
+ { \
+ VFLOAT _q; \
+ (Q) = __riscv_vfdiv ((numer), (denom), (vlen)); \
+ _q = __riscv_vfnmsub ((Q), (denom), (numer), (vlen)); \
+ _q = __riscv_vfnmsac (_q, (Q), (delta_d), (vlen)); \
+ _q = __riscv_vfadd (_q, (delta_n), (vlen)); \
+ (delta_Q) \
+ = __riscv_vfmul (_q, __riscv_vfrec7 ((denom), (vlen)), (vlen)); \
+ } \
+ while (0)
+
+#define ACC_DIV2_N1D2(numer, denom, delta_d, Q, delta_Q, vlen) \
+ do \
+ { \
+ VFLOAT _recip, _q; \
+ _recip = __riscv_vfrdiv ((denom), 0x1.0p0, (vlen)); \
+ (Q) = __riscv_vfmul ((numer), _recip, (vlen)); \
+ _q = __riscv_vfnmsub ((Q), (denom), (numer), (vlen)); \
+ _q = __riscv_vfnmsac (_q, (Q), (delta_d), (vlen)); \
+ (delta_Q) = __riscv_vfmul (_q, _recip, (vlen)); \
+ } \
+ while (0)
+
+#define ACC_DIV2_N2D2(numer, delta_n, denom, delta_d, Q, delta_Q, vlen) \
+ do \
+ { \
+ VFLOAT _recip, _q; \
+ _recip = __riscv_vfrdiv ((denom), 0x1.0p0, (vlen)); \
+ (Q) = __riscv_vfmul ((numer), _recip, (vlen)); \
+ _q = __riscv_vfnmsub ((Q), (denom), (numer), (vlen)); \
+ _q = __riscv_vfnmsac (_q, (Q), (delta_d), (vlen)); \
+ _q = __riscv_vfadd (_q, (delta_n), (vlen)); \
+ (delta_Q) = __riscv_vfmul (_q, _recip, (vlen)); \
+ } \
+ while (0)
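+
+/* A scalar model of the DIV_* macros above, for exposition only: the
+ quotient Q = fl(n/d) is refined by the exact FMA residual, yielding
+ Q + q ~ n / (d + dd) to roughly doubled precision. The macros
+ approximate the final 1/d by vfrec7, since only the leading bits of the
+ correction term matter. */
+static inline void
+div_n1d2_scalar_model (double n, double d, double dd, double *Q, double *q)
+{
+ *Q = n / d;
+ double r = __builtin_fma (-*Q, d, n); /* n - Q*d, exact */
+ r = __builtin_fma (-*Q, dd, r); /* account for the tail of d */
+ *q = r / d; /* a crude reciprocal suffices here */
+}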
+
+#define SQRT2_X2(x, delta_x, r, delta_r, vlen) \
+ do \
+ { \
+ VFLOAT xx = __riscv_vfadd ((x), (delta_x), (vlen)); \
+ VBOOL x_eq_0 = __riscv_vmfeq (xx, fp_posZero, (vlen)); \
+ xx = __riscv_vfmerge (xx, fp_posOne, x_eq_0, (vlen)); \
+ (r) = __riscv_vfsqrt (xx, (vlen)); \
+ (delta_r) = __riscv_vfnmsub ((r), (r), (x), (vlen)); \
+ (delta_r) = __riscv_vfadd ((delta_r), (delta_x), (vlen)); \
+ (delta_r) \
+ = __riscv_vfmul ((delta_r), __riscv_vfrec7 (xx, (vlen)), (vlen)); \
+ /* (delta_r) = __riscv_vfdiv((delta_r), xx, (vlen)); */ \
+ (delta_r) = __riscv_vfmul ((delta_r), 0x1.0p-1, (vlen)); \
+ (delta_r) = __riscv_vfmul ((delta_r), (r), (vlen)); \
+ (r) = __riscv_vfmerge ((r), fp_posZero, x_eq_0, (vlen)); \
+ (delta_r) = __riscv_vfmerge ((delta_r), fp_posZero, x_eq_0, (vlen)); \
+ } \
+ while (0)
+
+#define IDENTIFY(vclass, stencil, identity_mask, vlen) \
+ identity_mask = __riscv_vmsgtu (__riscv_vand ((vclass), (stencil), (vlen)), \
+ 0, (vlen))
+
+#define FCLIP(vx, x_min, x_max, vlen) \
+ __riscv_vfmin (__riscv_vfmax ((vx), (x_min), (vlen)), (x_max), (vlen))
+
+#define FAST_LDEXP(num, exp, vlen) \
+ do \
+ { \
+ VINT _n1 = __riscv_vsra ((exp), 1, (vlen)); \
+ VINT _n2 = __riscv_vsub ((exp), _n1, (vlen)); \
+ _n1 = __riscv_vsll (_n1, MAN_LEN, (vlen)); \
+ (num) = I_AS_F (__riscv_vadd (F_AS_I ((num)), _n1, (vlen))); \
+ _n2 = __riscv_vadd (_n2, EXP_BIAS, (vlen)); \
+ _n2 = __riscv_vsll (_n2, MAN_LEN, (vlen)); \
+ (num) = __riscv_vfmul ((num), I_AS_F (_n2), (vlen)); \
+ } \
+ while (0)
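+
+/* A scalar model of FAST_LDEXP, for exposition only: the exponent e is
+ applied in two halves, n1 = e >> 1 and n2 = e - n1, so that neither step
+ over/underflows even when 2^e alone would. The first half is added
+ directly into the exponent field, assuming the intermediate stays a
+ normal number as the callers arrange. */
+static inline double
+fast_ldexp_scalar_model (double num, int64_t e)
+{
+ int64_t n1 = e >> 1; /* arithmetic shift */
+ int64_t n2 = e - n1;
+ union sui64_fp64 u = { .f = num };
+ u.si += n1 << 52; /* multiply by 2^n1 via the exponent field */
+ union sui64_fp64 two_n2 = { .si = (n2 + 1023) << 52 }; /* 2^n2 */
+ return u.f * two_n2.f;
+}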
+
+// Some of the functions have multiple implementations using different
+// algorithms or styles.  The following macros configure the name of each
+// of these variations, thus allowing one to be set to the standard libm
+// name.
+
+// FP64 acos function configuration
+#define RVVLM_ACOSD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ACOSDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ACOSPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ACOSPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 asin function configuration
+#define RVVLM_ASIND_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ASINDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ASINPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ASINPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 atan function configuration
+#define RVVLM_ATAND_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATANDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATANPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATANPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 atan2 function configuration
+#define RVVLM_ATAN2D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATAN2DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATAN2PID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATAN2PIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 acosh function configuration
+#define RVVLM_ACOSHD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ACOSHDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 asinh function configuration
+#define RVVLM_ASINHD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ASINHDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 atanh function configuration
+#define RVVLM_ATANHD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ATANHDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 cbrt function configuration
+#define RVVLM_CBRTD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_CBRTDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 cdfnorm function configuration
+#define RVVLM_CDFNORMD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_CDFNORMDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 cdfnorminv function configuration
+#define RVVLM_CDFNORMINVD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_CDFNORMINVDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 erf function configuration
+#define RVVLM_ERFD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ERFDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 erfc function configuration
+#define RVVLM_ERFCD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ERFCDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 erfinv function configuration
+#define RVVLM_ERFINVD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ERFINVDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 erfcinv function configuration
+#define RVVLM_ERFCINVD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_ERFCINVDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 exp function configuration
+#define RVVLM_EXPD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_EXPDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 exp2 function configuration
+#define RVVLM_EXP2D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_EXP2DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 exp10 function configuration
+#define RVVLM_EXP10D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_EXP10DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 expm1 function configuration
+#define RVVLM_EXPM1D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_EXPM1DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 expint1 function configuration
+#define RVVLM_EXPINT1D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_EXPINT1DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 log function configuration
+#define RVVLM_LOGD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LOGDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LOG2D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LOG2DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LOG10D_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LOG10DI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 log1p function configuration
+#define RVVLM_LOG1PD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LOG1PDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 pow function configuration
+#define RVVLM_POWD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_POWDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 cos function configuration
+#define RVVLM_COSD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_COSDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_COSPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_COSPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 sin function configuration
+#define RVVLM_SIND_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 sincos function configuration
+#define RVVLM_SINCOSD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINCOSDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINCOSPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINCOSPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 tan function configuration
+#define RVVLM_TAND_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_TANDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_TANPID_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_TANPIDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 lgamma function configuration
+#define RVVLM_LGAMMAD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_LGAMMADI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 tgamma function configuration
+#define RVVLM_TGAMMAD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_TGAMMADI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 cosh function configuration
+#define RVVLM_COSHD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_COSHDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 sinh function configuration
+#define RVVLM_SINHD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_SINHDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// FP64 tanh function configuration
+#define RVVLM_TANHD_VSET_CONFIG "rvvlm_fp64m1.h"
+
+#define RVVLM_TANHDI_VSET_CONFIG "rvvlm_fp64m1.h"
+
+// Define the various tables for table-driven implementations
+extern const int64_t expD_tbl64_fixedpt[64];
+extern const int64_t logD_tbl128_fixedpt[128];
+extern const double logtbl_4_powD_128_hi_lo[256];
+extern const double dbl_2ovpi_tbl[28];
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_errorfuncsD.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_errorfuncsD.h
new file mode 100644
index 0000000000..2f1fd83007
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_errorfuncsD.h
@@ -0,0 +1,196 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#if defined(COMPILE_FOR_ERFC)
+#define Y_AT_posINF fp_posZero
+#define Y_AT_negINF 0x1.0p1
+#elif defined(COMPILE_FOR_ERF)
+#define Y_AT_posINF 0x1.0p0
+#define Y_AT_negINF -0x1.0p0
+#elif defined(COMPILE_FOR_CDFNORM)
+#define Y_AT_posINF 0x1.0p0
+#define Y_AT_negINF fp_posZero
+#else
+static_assert (false, "Must define COMPILE_FOR_{ERFC,ERF,CDFNORM}" __FILE__);
+#endif
+
+#define EXCEPTION_HANDLING(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf, (special_args), (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_NaN, id_mask, (vlen)); \
+ (vy_special) = __riscv_vfadd (id_mask, (vx), (vx), (vlen)); \
+ IDENTIFY (vclass, class_posInf, id_mask, (vlen)); \
+ (vy_special) \
+ = __riscv_vfmerge ((vy_special), Y_AT_posINF, id_mask, (vlen)); \
+ IDENTIFY (vclass, class_negInf, id_mask, (vlen)); \
+ (vy_special) \
+ = __riscv_vfmerge ((vy_special), Y_AT_negINF, id_mask, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// ALPHA is 2/sqrt(pi), i.e. erf'(0), the derivative of erf(x) at x = 0
+#define ALPHA_HI 0x1.20dd750429b6dp+0
+#define ALPHA_LO 0x1.1ae3a914fed80p-56
+
+#define EXCEPTION_HANDLING_ERF(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VUINT expo_x = __riscv_vsrl (F_AS_U ((vx)), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf, (special_args), (vlen)); \
+ VBOOL expo_small = __riscv_vmsltu (expo_x, EXP_BIAS - 30, (vlen)); \
+ (special_args) = __riscv_vmor ((special_args), expo_small, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_NaN, id_mask, (vlen)); \
+ (vy_special) = __riscv_vfadd (id_mask, (vx), (vx), (vlen)); \
+ IDENTIFY (vclass, class_posInf, id_mask, (vlen)); \
+ (vy_special) \
+ = __riscv_vfmerge ((vy_special), Y_AT_posINF, id_mask, (vlen)); \
+ IDENTIFY (vclass, class_negInf, id_mask, (vlen)); \
+ (vy_special) \
+ = __riscv_vfmerge ((vy_special), Y_AT_negINF, id_mask, (vlen)); \
+ VFLOAT vy_small \
+ = __riscv_vfmul (expo_small, (vx), ALPHA_LO, (vlen)); \
+ vy_small = __riscv_vfmacc (expo_small, vy_small, ALPHA_HI, (vx), \
+ (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), vy_small, expo_small, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define LOG2_HI 0x1.62e42fefa39efp-1
+#define LOG2_LO 0x1.abc9e3b39803fp-56
+#define NEG_LOG2_INV -0x1.71547652b82fep+0
+
+// compute exp(-A*B) as 2^n * Z, Z is a Q62 fixed-point value
+// A, B are non-negative, and |A*B| <= 1200 log(2)
+#define EXP_negAB(va, vb, n, Z, vlen) \
+ do \
+ { \
+ VFLOAT r = __riscv_vfmul ((va), (vb), (vlen)); \
+ VFLOAT delta_r = __riscv_vfmsub ((va), (vb), r, (vlen)); \
+ VFLOAT n_flt = __riscv_vfmul (r, NEG_LOG2_INV, (vlen)); \
+ (n) = __riscv_vfcvt_x (n_flt, (vlen)); \
+ n_flt = __riscv_vfcvt_f ((n), (vlen)); \
+ r = __riscv_vfnmacc (r, LOG2_HI, n_flt, (vlen)); \
+ delta_r = __riscv_vfnmacc (delta_r, LOG2_LO, n_flt, (vlen)); \
+ VINT R = __riscv_vfcvt_x (__riscv_vfmul (r, 0x1.0p63, (vlen)), (vlen)); \
+ VINT DELTA_R = __riscv_vfcvt_x ( \
+ __riscv_vfmul (delta_r, 0x1.0p63, (vlen)), (vlen)); \
+ R = __riscv_vadd (R, DELTA_R, (vlen)); \
+ VINT P_RIGHT = PSTEP_I ( \
+ 0x16c16c185646e2, R, \
+ PSTEP_I (0x3403401a3f740, R, \
+ PSTEP_I (0x680665cc2958, R, \
+ PSTEP_I (0xb8efdcde680, R, \
+ PSTEP_I (0x128afc94c08, 0x1acc4c50c4, R, \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ \
+ VINT RSQ = __riscv_vsmul (R, R, 1, (vlen)); \
+ VINT R6 = __riscv_vsmul (RSQ, RSQ, 1, (vlen)); \
+ R6 = __riscv_vsmul (R6, RSQ, 1, (vlen)); \
+ \
+ VINT P_LEFT = PSTEP_I ( \
+ 0x4000000000000000, R, \
+ PSTEP_I (0x40000000000000ed, R, \
+ PSTEP_I (0x2000000000001659, R, \
+ PSTEP_I (0xaaaaaaaaaaa201b, R, \
+ PSTEP_I (0x2aaaaaaaaa03367, \
+ 0x888888889fe9c4, R, vlen), \
+ vlen), \
+ vlen), \
+ vlen), \
+ vlen); \
+ P_RIGHT = __riscv_vsmul (P_RIGHT, R6, 1, (vlen)); \
+ Z = __riscv_vadd (P_LEFT, P_RIGHT, (vlen)); \
+ } \
+ while (0)
+
+// Transform x into (x-a)/(x+b), returned as Q63 fixed point.
+// x is non-negative and x < 32; the result is strictly below 1
+// in magnitude and thus we can use Q63 fixed point.
+// On input, we have x, -2^63 a, and b in floating point.
+// Both a and b are scalars between 3 and 5 with just a few significant
+// bits; thus we can use a fast sum with a and b as the dominant term
+// to get 2^63 x + neg_a_scaled, and x + b, to extra precision.
+#define X_TRANSFORM(vx, neg_a_scaled, b, R, vlen) \
+ do \
+ { \
+ VFLOAT numer, d_numer, denom, d_denom; \
+ denom = __riscv_vfadd ((vx), (b), (vlen)); \
+ d_denom = __riscv_vfrsub (denom, (b), (vlen)); \
+ d_denom = __riscv_vfadd (d_denom, (vx), (vlen)); \
+ VFLOAT one = VFMV_VF (fp_posOne, (vlen)); \
+ VFLOAT recip, d_recip; \
+ DIV_N1D2 (one, denom, d_denom, recip, d_recip, (vlen)); \
+ numer = __riscv_vfmul ((vx), 0x1.0p63, (vlen)); \
+ numer = __riscv_vfadd (numer, (neg_a_scaled), (vlen)); \
+ d_numer = __riscv_vfrsub (numer, (neg_a_scaled), (vlen)); \
+ d_numer = __riscv_vfmacc (d_numer, 0x1.0p63, (vx), (vlen)); \
+ /* (numer + d_numer) * (recip + d_recip) */ \
+ VFLOAT r, d_r; \
+ r = __riscv_vfmul (numer, recip, (vlen)); \
+ d_r = __riscv_vfmsub (numer, recip, r, (vlen)); \
+ d_r = __riscv_vfmacc (d_r, numer, d_recip, (vlen)); \
+ d_r = __riscv_vfmacc (d_r, d_numer, recip, (vlen)); \
+ (R) = __riscv_vfcvt_x (r, (vlen)); \
+ (R) = __riscv_vadd ((R), __riscv_vfcvt_x (d_r, (vlen)), (vlen)); \
+ } \
+ while (0)
+
+// Compute 1/(1+2x) as Q_m, m >= 62 fixed point, for x >= 0.
+// If x < 1, m is 62; otherwise m is 62+k+1, where 2^k <= x < 2^(k+1).
+#define RECIP_SCALE(vx, B, m, vlen) \
+ do \
+ { \
+ VFLOAT one = VFMV_VF (fp_posOne, (vlen)); \
+ VFLOAT denom = __riscv_vfmadd ((vx), 0x1.0p1, one, (vlen)); \
+ VFLOAT d_denom = __riscv_vfsub (one, denom, (vlen)); \
+ d_denom = __riscv_vfmacc (d_denom, 0x1.0p1, (vx), (vlen)); \
+ VFLOAT recip, d_recip; \
+ DIV_N1D2 (one, denom, d_denom, recip, d_recip, (vlen)); \
+ (m) = __riscv_vsra (F_AS_I ((vx)), MAN_LEN, (vlen)); \
+ (m) = __riscv_vmax ((m), EXP_BIAS - 1, (vlen)); \
+ (m) = __riscv_vadd ((m), 63, (vlen)); \
+ VFLOAT scale = I_AS_F (__riscv_vsll ((m), MAN_LEN, (vlen))); \
+ (m) = __riscv_vsub ((m), EXP_BIAS, (vlen)); \
+ (B) = __riscv_vfcvt_x (__riscv_vfmul (recip, scale, (vlen)), (vlen)); \
+ d_recip = __riscv_vfmul (d_recip, scale, (vlen)); \
+ (B) = __riscv_vadd ((B), __riscv_vfcvt_x (d_recip, (vlen)), (vlen)); \
+ } \
+ while (0)
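+
+// Worked example: x = 3 has k = 1 (2^1 <= 3 < 2^2), so m = 62 + 1 + 1 = 64
+// and B = round (2^64 / 7), i.e. 1/(1+2x) in Q64 as described above.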
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp.inc.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp.inc.h
new file mode 100644
index 0000000000..3593baad52
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp.inc.h
@@ -0,0 +1,273 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#ifndef __VECLIBM_RVVLM_FP_INC_H
+#define __VECLIBM_RVVLM_FP_INC_H
+#else
+#warning "are you sure you want to include this file multiple times?"
+#endif
+
+#include <assert.h>
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#define _NEED_UNDEF_GNU_SOURCE
+#endif
+
+#ifndef LMUL
+static_assert (false, "Must assign an LMUL before including " __FILE__);
+#endif
+#ifndef BIT_WIDTH
+static_assert (false, "Must assign BIT_WIDTH before including " __FILE__);
+#endif
+#ifndef API_SIGNATURE
+static_assert (false, "Must assign API_SIGNATURE before including " __FILE__);
+#endif
+#ifndef STRIDE
+static_assert (false, "Must assign STRIDE before including " __FILE__);
+#endif
+
+#include <math.h>
+#ifndef NAN
+_Static_assert (0, "NaN not available on this architecture");
+#endif
+
+#define __PASTE2_BASE(A, B) A##B
+#define __PASTE2(A, B) __PASTE2_BASE (A, B)
+#define __PASTE3_BASE(A, B, C) A##B##C
+#define __PASTE3(A, B, C) __PASTE3_BASE (A, B, C)
+#define __PASTE4_BASE(A, B, C, D) A##B##C##D
+#define __PASTE4(A, B, C, D) __PASTE4_BASE (A, B, C, D)
+#define __PASTE5_BASE(A, B, C, D, E) A##B##C##D##E
+#define __PASTE5(A, B, C, D, E) __PASTE5_BASE (A, B, C, D, E)
+#define __PASTE6_BASE(A, B, C, D, E, F) A##B##C##D##E##F
+#define __PASTE6(A, B, C, D, E, F) __PASTE6_BASE (A, B, C, D, E, F)
+
+#define MAKE_VTYPE(A) __PASTE3 (A, BIT_WIDTH, __PASTE3 (m, LMUL, _t))
+#define MAKE_TYPE(A) __PASTE3 (A, BIT_WIDTH, _t)
+#define MAKE_FUNC(A) __PASTE3 (A, BIT_WIDTH, __PASTE2 (m, LMUL))
+#define MAKE_VLOAD(A) \
+ __PASTE3 (__PASTE3 (__riscv_vle, BIT_WIDTH, _v_), A, \
+ __PASTE3 (BIT_WIDTH, m, LMUL))
+#define MAKE_VSLOAD(A) \
+ __PASTE3 (__PASTE3 (__riscv_vlse, BIT_WIDTH, _v_), A, \
+ __PASTE3 (BIT_WIDTH, m, LMUL))
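+
+// Example expansions, assuming BIT_WIDTH = 64 and LMUL = 2:
+//   MAKE_VTYPE (vfloat) -> vfloat64m2_t
+//   MAKE_VLOAD (f)      -> __riscv_vle64_v_f64m2  (unit-stride load)
+//   MAKE_VSLOAD (f)     -> __riscv_vlse64_v_f64m2 (strided load)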
+
+#if (BIT_WIDTH == 64)
+#define NATIVE_TYPE double
+#define TYPE_SIZE 8
+#else
+static_assert (false, "requested BIT_WIDTH unsupported " __FILE__);
+#endif
+
+#define API_11_US \
+ size_t _inarg_n, const NATIVE_TYPE *_inarg1, NATIVE_TYPE *_outarg1
+#define API_11_GS \
+ size_t _inarg_n, const NATIVE_TYPE *_inarg1, size_t _inarg1_stride, \
+ NATIVE_TYPE *_outarg1, size_t _outarg1_stride
+#define API_21_US \
+ size_t _inarg_n, const NATIVE_TYPE *_inarg1, const NATIVE_TYPE *_inarg2, \
+ NATIVE_TYPE *_outarg1
+#define API_21_GS \
+ size_t _inarg_n, const NATIVE_TYPE *_inarg1, size_t _inarg1_stride, \
+ const NATIVE_TYPE *_inarg2, size_t _inarg2_stride, \
+ NATIVE_TYPE *_outarg1, size_t _outarg1_stride
+#define API_12_US \
+ size_t _inarg_n, const NATIVE_TYPE *_inarg1, NATIVE_TYPE *_outarg1, \
+ NATIVE_TYPE *_outarg2
+#define API_12_GS \
+ size_t _inarg_n, const NATIVE_TYPE *_inarg1, size_t _inarg1_stride, \
+ NATIVE_TYPE *_outarg1, size_t _outarg1_stride, NATIVE_TYPE *_outarg2, \
+ size_t _outarg2_stride
+
+#if (API_SIGNATURE == API_SIGNATURE_11)
+#if (STRIDE == UNIT_STRIDE)
+#define API API_11_US
+#else
+#define API API_11_GS
+#endif
+#elif (API_SIGNATURE == API_SIGNATURE_21)
+#if (STRIDE == UNIT_STRIDE)
+#define API API_21_US
+#else
+#define API API_21_GS
+#endif
+#elif (API_SIGNATURE == API_SIGNATURE_12)
+#if (STRIDE == UNIT_STRIDE)
+#define API API_12_US
+#else
+#define API API_12_GS
+#endif
+#else
+static_assert (false, "API_SIGNATURE ill-formed or undefined " __FILE__);
+#endif
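+
+// For example, API_SIGNATURE_11 with UNIT_STRIDE selects API_11_US, so a
+// kernel would be declared as below ("kernel" is a placeholder name;
+// NATIVE_TYPE is double for BIT_WIDTH 64):
+//   void kernel (size_t _inarg_n, const double *_inarg1, double *_outarg1);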
+
+#if (STRIDE == UNIT_STRIDE)
+#define VFLOAD_INARG1(vlen) MAKE_VLOAD (f) (_inarg1, (vlen))
+#define VFLOAD_INARG2(vlen) MAKE_VLOAD (f) (_inarg2, (vlen))
+#define VFSTORE_OUTARG1(vy, vlen) \
+ __PASTE2 (__riscv_vse, BIT_WIDTH) (_outarg1, (vy), (vlen))
+#define VFSTORE_OUTARG2(vy, vlen) \
+ __PASTE2 (__riscv_vse, BIT_WIDTH) (_outarg2, (vy), (vlen))
+#define INCREMENT_INARG1(vlen) \
+ do \
+ { \
+ _inarg1 += (vlen); \
+ } \
+ while (0)
+#define INCREMENT_INARG2(vlen) \
+ do \
+ { \
+ _inarg2 += (vlen); \
+ } \
+ while (0)
+#define INCREMENT_OUTARG1(vlen) \
+ do \
+ { \
+ _outarg1 += (vlen); \
+ } \
+ while (0)
+#define INCREMENT_OUTARG2(vlen) \
+ do \
+ { \
+ _outarg2 += (vlen); \
+ } \
+ while (0)
+#else
+#define VFLOAD_INARG1(vlen) \
+ MAKE_VSLOAD (f) (_inarg1, _inarg1_stride * TYPE_SIZE, (vlen))
+#define VFLOAD_INARG2(vlen) \
+ MAKE_VSLOAD (f) (_inarg2, _inarg2_stride * TYPE_SIZE, (vlen))
+#define VFSTORE_OUTARG1(vy, vlen) \
+ __PASTE2 (__riscv_vsse, BIT_WIDTH) \
+ (_outarg1, _outarg1_stride * TYPE_SIZE, (vy), (vlen))
+#define VFSTORE_OUTARG2(vy, vlen) \
+ __PASTE2 (__riscv_vsse, BIT_WIDTH) \
+ (_outarg2, _outarg2_stride * TYPE_SIZE, (vy), (vlen))
+#define INCREMENT_INARG1(vlen) \
+ do \
+ { \
+ _inarg1 += _inarg1_stride * (vlen); \
+ } \
+ while (0)
+#define INCREMENT_INARG2(vlen) \
+ do \
+ { \
+ _inarg2 += _inarg2_stride * (vlen); \
+ } \
+ while (0)
+#define INCREMENT_OUTARG1(vlen) \
+ do \
+ { \
+ _outarg1 += _outarg1_stride * (vlen); \
+ } \
+ while (0)
+#define INCREMENT_OUTARG2(vlen) \
+ do \
+ { \
+ _outarg2 += _outarg2_stride * (vlen); \
+ } \
+ while (0)
+#endif
+
+// For MAKE_VBOOL, the value is 64/LMUL
+#if (LMUL == 1)
+#define MAKE_VBOOL(A) __PASTE3 (A, 64, _t)
+#elif (LMUL == 2)
+#define MAKE_VBOOL(A) __PASTE3 (A, 32, _t)
+#elif (LMUL == 4)
+#define MAKE_VBOOL(A) __PASTE3 (A, 16, _t)
+#elif (LMUL == 8)
+#define MAKE_VBOOL(A) __PASTE3 (A, 8, _t)
+#endif
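+
+// E.g. LMUL = 2 yields MAKE_VBOOL (vbool) -> vbool32_t, matching the RVV
+// mask ratio SEW/LMUL = 64/2 = 32 for 64-bit elements.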
+#define VSET __PASTE2 (__riscv_vsetvl_e, __PASTE3 (BIT_WIDTH, m, LMUL))
+#define VSE __PASTE2 (__riscv_vse, BIT_WIDTH)
+#define VSSE __PASTE2 (__riscv_vsse, BIT_WIDTH)
+#define MAKE_REINTERPRET(A, B) \
+ __PASTE5 (__riscv_vreinterpret_v_, A, __PASTE4 (BIT_WIDTH, m, LMUL, _), B, \
+ __PASTE3 (BIT_WIDTH, m, LMUL))
+
+#define FLOAT MAKE_TYPE (float)
+#define VFLOAT MAKE_VTYPE (vfloat)
+#define INT MAKE_TYPE (int)
+#define VINT MAKE_VTYPE (vint)
+#define UINT MAKE_TYPE (uint)
+#define VUINT MAKE_VTYPE (vuint)
+#define VBOOL MAKE_VBOOL (vbool)
+
+#define F_AS_I MAKE_REINTERPRET (f, i)
+#define F_AS_U MAKE_REINTERPRET (f, u)
+#define I_AS_F MAKE_REINTERPRET (i, f)
+#define U_AS_F MAKE_REINTERPRET (u, f)
+#define I_AS_U MAKE_REINTERPRET (i, u)
+#define U_AS_I MAKE_REINTERPRET (u, i)
+
+#define VFLOAD MAKE_VLOAD (f)
+#define VILOAD MAKE_VLOAD (i)
+#define VULOAD MAKE_VLOAD (u)
+#define VFSLOAD MAKE_VSLOAD (f)
+#define VMVI_VX MAKE_FUNC (__riscv_vmv_v_x_i)
+#define VMVU_VX MAKE_FUNC (__riscv_vmv_v_x_u)
+#define VFMV_VF MAKE_FUNC (__riscv_vfmv_v_f_f)
+
+static const INT int_Zero = 0;
+static const UINT uint_Zero = 0;
+
+#if (BIT_WIDTH == 64)
+#define EXP_BIAS 1023
+#define MAN_LEN 52
+static const uint64_t class_sNaN = 0x100;
+static const uint64_t class_qNaN = 0x200;
+static const uint64_t class_NaN = 0x300;
+static const uint64_t class_negInf = 0x1;
+static const uint64_t class_posInf = 0x80;
+static const uint64_t class_Inf = 0x81;
+static const uint64_t class_negZero = 0x8;
+static const uint64_t class_posZero = 0x10;
+static const uint64_t class_Zero = 0x18;
+static const uint64_t class_negDenorm = 0x4;
+static const uint64_t class_posDenorm = 0x20;
+static const uint64_t class_Denorm = 0x24;
+static const uint64_t class_negNormal = 0x2;
+static const uint64_t class_posNormal = 0x40;
+static const uint64_t class_Normal = 0x42;
+static const uint64_t class_negative = 0x7;
+static const uint64_t class_positive = 0xe0;
+static const uint64_t class_finite_neg = 0x06;
+static const uint64_t class_finite_pos = 0x60;
+
+static const double fp_sNaN = __builtin_nans ("");
+static const double fp_qNaN = __builtin_nan ("");
+static const double fp_posInf = __builtin_inf ();
+static const double fp_negInf = -__builtin_inf ();
+static const double fp_negZero = -0.;
+
+static const double fp_posZero = 0.0;
+static const double fp_posOne = 0x1.0p0;
+static const double fp_negOne = -0x1.0p0;
+static const double fp_posHalf = 0x1.0p-1;
+static const double fp_negHalf = -0x1.0p-1;
+#endif
+
+#ifdef _NEED_UNDEF_GNU_SOURCE
+#undef _GNU_SOURCE
+#undef _NEED_UNDEF_GNU_SOURCE
+#endif
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m1.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m1.h
new file mode 100644
index 0000000000..a70ca44f39
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m1.h
@@ -0,0 +1,26 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#ifndef __RVVLM_FP64M1_H__
+#define __RVVLM_FP64M1_H__
+#define LMUL 1
+#define BIT_WIDTH 64
+#include "rvvlm_fp.inc.h"
+#endif
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m2.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m2.h
new file mode 100644
index 0000000000..fba345818a
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m2.h
@@ -0,0 +1,26 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#ifndef __RVVLM_FP64M2_H__
+#define __RVVLM_FP64M2_H__
+#define LMUL 2
+#define BIT_WIDTH 64
+#include "rvvlm_fp.inc.h"
+#endif
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m4.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m4.h
new file mode 100644
index 0000000000..25abd57c2f
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_fp64m4.h
@@ -0,0 +1,26 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#ifndef __RVVLM_FP64M4_H__
+#define __RVVLM_FP64M4_H__
+#define LMUL 4
+#define BIT_WIDTH 64
+#include "rvvlm_fp.inc.h"
+#endif
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_gammafuncsD.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_gammafuncsD.h
new file mode 100644
index 0000000000..6ec4873574
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_gammafuncsD.h
@@ -0,0 +1,48 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+// gamma(+inf) = +inf; gamma(-inf or sNaN) is qNaN with invalid
+// gamma(qNaN) is qNaN
+// gamma(+-0) is +-inf with divide-by-zero
+// gamma(tiny) is 1/tiny
+#define EXCEPTION_HANDLING_TGAMMA(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vand ( \
+ __riscv_vsrl (F_AS_U ((vx)), MAN_LEN, (vlen)), 0x7FF, (vlen)); \
+ VBOOL x_small = __riscv_vmsltu (expo_x, EXP_BIAS - 60, (vlen)); \
+ VBOOL x_InfNaN = __riscv_vmseq (expo_x, 0x7FF, (vlen)); \
+ (special_args) = __riscv_vmor (x_small, x_InfNaN, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL x_negInf; \
+ IDENTIFY (vclass, class_negInf, x_negInf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_negInf, (vlen)); \
+ VFLOAT y_tmp = __riscv_vfadd (x_InfNaN, (vx), (vx), (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), y_tmp, x_InfNaN, (vlen)); \
+ y_tmp = __riscv_vfrdiv (x_small, (vx), fp_posOne, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), y_tmp, x_small, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_hyperbolicsD.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_hyperbolicsD.h
new file mode 100644
index 0000000000..2d1c7f92c1
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_hyperbolicsD.h
@@ -0,0 +1,88 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#if defined(COMPILE_FOR_SINH) || defined(COMPILE_FOR_TANH)
+#define GEN_EXCEPTIONS(special_args, vx, vlen) \
+ __riscv_vfmadd ((special_args), (vx), 0x1.0p-60, (vx), (vlen))
+#else
+#define GEN_EXCEPTIONS(special_args, vx, vlen) \
+ __riscv_vfmadd ((special_args), (vx), (vx), VFMV_VF (0x1.0p0, (vlen)), \
+ (vlen))
+#endif
+
+#if defined(COMPILE_FOR_SINH) || defined(COMPILE_FOR_COSH)
+#define EXCEPTION_HANDLING_HYPER(vx, expo_x, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL NaN_Inf; \
+ IDENTIFY (vclass, class_NaN | class_Inf, NaN_Inf, (vlen)); \
+ VBOOL small_x \
+ = __riscv_vmsleu ((expo_x), EXP_BIAS - MAN_LEN - 5, (vlen)); \
+ (special_args) = __riscv_vmor (NaN_Inf, small_x, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ (vy_special) = GEN_EXCEPTIONS ((special_args), (vx), (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+#else
+#define EXCEPTION_HANDLING_HYPER(vx, expo_x, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL NaN_Inf; \
+ IDENTIFY (vclass, class_NaN | class_Inf, NaN_Inf, (vlen)); \
+ VBOOL small_x \
+ = __riscv_vmsleu ((expo_x), EXP_BIAS - MAN_LEN - 5, (vlen)); \
+ (special_args) = __riscv_vmor (NaN_Inf, small_x, (vlen)); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ (vy_special) = GEN_EXCEPTIONS ((special_args), (vx), (vlen)); \
+ VBOOL Inf; \
+ IDENTIFY (vclass, class_Inf, Inf, (vlen)); \
+ VFLOAT one = VFMV_VF (fp_posOne, (vlen)); \
+ one = __riscv_vfsgnj (one, (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), one, Inf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+#endif
+
+#define LOG2_INV 0x1.71547652b82fep+0
+#define LOG2_HI 0x1.62e42fefa39efp-1
+#define LOG2_LO 0x1.abc9e3b39803fp-56
+
+#define ARGUMENT_REDUCTION(vx, n, r, r_delta, vlen) \
+ do \
+ { \
+ VFLOAT n_flt = __riscv_vfmul ((vx), LOG2_INV, (vlen)); \
+ (n) = __riscv_vfcvt_x (n_flt, (vlen)); \
+ n_flt = __riscv_vfcvt_f ((n), (vlen)); \
+ (r_delta) = __riscv_vfnmsac ((vx), LOG2_HI, n_flt, (vlen)); \
+ (r) = __riscv_vfnmsac ((r_delta), LOG2_LO, n_flt, (vlen)); \
+ (r_delta) = __riscv_vfsub ((r_delta), (r), (vlen)); \
+ (r_delta) = __riscv_vfnmsac ((r_delta), LOG2_LO, n_flt, (vlen)); \
+ } \
+ while (0)
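+
+// On exit, x = n*log(2) + (r + r_delta) to extra precision, with
+// |r| <= log(2)/2, so that exp(x) = 2^n * exp(r + r_delta).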
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_inverrorfuncsD.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_inverrorfuncsD.h
new file mode 100644
index 0000000000..02ade3e8da
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_inverrorfuncsD.h
@@ -0,0 +1,451 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#define RTPI_BY2_HI 0x1.c5bf891b4ef6ap-1
+#define RTPI_BY2_LO 0x1.4f38760a41abbp-54
+#define NEG_LOG2_HI -0x1.62e42fefa4000p-1
+#define NEG_LOG2_LO 0x1.8432a1b0e2634p-43
+
+// P coefficients in ascending order, in varying scales. p0_delta is in
+// floating point
+#define P_tiny_0 -0x8593442e139d // scale 66
+#define P_tiny_1 -0x1fcf7055ac5f03 // scale 64
+#define P_tiny_2 -0x106dde33d8dc179 // scale 61
+#define P_tiny_3 -0xc31d6e09935118a // scale 60
+#define P_tiny_4 -0x3560de73cb5bbcc0 // scale 59
+#define P_tiny_5 -0x1eb4c7e14b254de8 // scale 57
+#define P_tiny_6 0x1fdf5a9d23430bd7 // scale 56
+#define P_tiny_7 0x62c4020631de121b // scale 56
+#define P_tiny_8 0x4be1ed5d031773f1 // scale 58
+#define P_tiny_9 0x55a9f8b9538981a1 // scale 60
+#define DELTA_P0_tiny 0x1.ba4d0b79d16e6p-2 // scale 66
+
+// Q coefficients in ascending order, in varying scales. q0_delta is in
+// floating point
+#define Q_tiny_0 -0x85933cda2d6d // scale 66
+#define Q_tiny_1 -0x1fcf792da7d51d // scale 64
+#define Q_tiny_2 -0x106ec6e0ed13ae1 // scale 61
+#define Q_tiny_3 -0x61b925a39a461aa // scale 59
+#define Q_tiny_4 -0x35ebf9dc72fab062 // scale 59
+#define Q_tiny_5 -0x2131cf7760e82873 // scale 57
+#define Q_tiny_6 0x1860ae67db2a6609 // scale 56
+#define Q_tiny_7 0x5e9a123701d89289 // scale 56
+#define Q_tiny_8 0x417b35aab14ac49d // scale 56
+#define Q_tiny_9 0x5e4a26a7c1415755 // scale 57
+#define DELTA_Q0_tiny 0x1.8a7adad44d65ap-4 // scale 66
+
+#if defined(COMPILE_FOR_ERFCINV)
+// Using [P,Q]_tiny_[HI,LO]_k, HI in Q50, LO in Q84
+#define P_tiny_HI_0 -0x8593442eL
+#define P_tiny_LO_0 -0x4e7245b3L
+#define P_tiny_HI_1 -0x7f3dc156b1L
+#define P_tiny_LO_1 -0x1f0300096L
+#define P_tiny_HI_2 -0x20dbbc67b1b8L
+#define P_tiny_LO_2 -0xbc59b742L
+#define P_tiny_HI_3 -0x30c75b8264d44L
+#define P_tiny_LO_3 -0x18a421ab9L
+#define P_tiny_HI_4 -0x1ab06f39e5addeL
+#define P_tiny_LO_4 -0x180f2a477L
+#define P_tiny_HI_5 -0x3d698fc2964a9cL
+#define P_tiny_LO_5 0xc3d4ab0bL
+#define P_tiny_HI_6 0x7f7d6a748d0c2fL
+#define P_tiny_LO_6 0x1729754e9L
+#define P_tiny_HI_7 0x18b100818c77848L
+#define P_tiny_LO_7 0x1aca73439L
+#define P_tiny_HI_8 0x4be1ed5d031774L
+#define P_tiny_LO_8 -0x3b6c5afbL
+#define P_tiny_HI_9 0x156a7e2e54e260L
+#define P_tiny_LO_9 0x1a0c336beL
+
+#define Q_tiny_HI_0 -0x85933cdaL
+#define Q_tiny_LO_0 -0xb5b39d61L
+#define Q_tiny_HI_1 -0x7f3de4b69fL
+#define Q_tiny_LO_1 -0x151d1cd35L
+#define Q_tiny_HI_2 -0x20dd8dc1da27L
+#define Q_tiny_LO_2 -0x1706945d7L
+#define Q_tiny_HI_3 -0x30dc92d1cd231L
+#define Q_tiny_LO_3 0xabde03f9L
+#define Q_tiny_HI_4 -0x1af5fcee397d58L
+#define Q_tiny_LO_4 -0xc3530d28L
+#define Q_tiny_HI_5 -0x42639eeec1d051L
+#define Q_tiny_LO_5 0x662b41ecL
+#define Q_tiny_HI_6 0x6182b99f6ca998L
+#define Q_tiny_LO_6 0x938a5e35L
+#define Q_tiny_HI_7 0x17a6848dc07624aL
+#define Q_tiny_LO_7 0x8a0484b7L
+#define Q_tiny_HI_8 0x105ecd6aac52b12L
+#define Q_tiny_LO_8 0x1d1e38258L
+#define Q_tiny_HI_9 0xbc944d4f8282afL
+#define Q_tiny_LO_9 -0x155b50b48L
+#endif
+
+#if defined(COMPILE_FOR_CDFNORMINV)
+// Using [P,Q]_tiny_[HI,LO]_k, HI in Q50, LO in Q84
+#define P_tiny_HI_0 -0xbce768cfL
+#define P_tiny_LO_0 -0x6824d442L
+#define P_tiny_HI_1 -0xb3f23f158aL
+#define P_tiny_LO_1 0x120e225b6L
+#define P_tiny_HI_2 -0x2e77fdb703eaL
+#define P_tiny_LO_2 -0x1e1d72461L
+#define P_tiny_HI_3 -0x44fbca4f8507eL
+#define P_tiny_LO_3 -0xd2fb9bf1L
+#define P_tiny_HI_4 -0x25be85812224dcL
+#define P_tiny_LO_4 -0x14663c6d2L
+#define P_tiny_HI_5 -0x56d9a544fd76f0L
+#define P_tiny_LO_5 -0x1e3fd12d9L
+#define P_tiny_HI_6 0xb44c46b00008ccL
+#define P_tiny_LO_6 0x123f14b79L
+#define P_tiny_HI_7 0x22eb3f29425cc2dL
+#define P_tiny_LO_7 -0x1f47840b1L
+#define P_tiny_HI_8 0x6b5068e2aa0bc1L
+#define P_tiny_LO_8 -0xd830044aL
+#define P_tiny_HI_9 0x1e496a7253435eL
+#define P_tiny_LO_9 -0xf06a1c9L
+
+#define Q_tiny_HI_0 -0x85933cdaL
+#define Q_tiny_LO_0 -0xb5b39d61L
+#define Q_tiny_HI_1 -0x7f3de4b69fL
+#define Q_tiny_LO_1 -0x151d1cd35L
+#define Q_tiny_HI_2 -0x20dd8dc1da27L
+#define Q_tiny_LO_2 -0x1706945d7L
+#define Q_tiny_HI_3 -0x30dc92d1cd231L
+#define Q_tiny_LO_3 0xabde03f9L
+#define Q_tiny_HI_4 -0x1af5fcee397d58L
+#define Q_tiny_LO_4 -0xc3530d28L
+#define Q_tiny_HI_5 -0x42639eeec1d051L
+#define Q_tiny_LO_5 0x662b41ecL
+#define Q_tiny_HI_6 0x6182b99f6ca998L
+#define Q_tiny_LO_6 0x938a5e35L
+#define Q_tiny_HI_7 0x17a6848dc07624aL
+#define Q_tiny_LO_7 0x8a0484b7L
+#define Q_tiny_HI_8 0x105ecd6aac52b12L
+#define Q_tiny_LO_8 0x1d1e38258L
+#define Q_tiny_HI_9 0xbc944d4f8282afL
+#define Q_tiny_LO_9 -0x155b50b48L
+#endif
+
+// erfinv(+-1) = +-Inf with divide-by-zero
+// erfinv(x) for |x| > 1 is NaN with invalid
+// erfinv(NaN) is NaN, invalid if the input is a signaling NaN
+// erfinv(x) is (2/sqrt(pi)) x for |x| < 2^-30
+#define EXCEPTION_HANDLING_ERFINV(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vand ( \
+ __riscv_vsrl (F_AS_U ((vx)), MAN_LEN, (vlen)), 0x7FF, (vlen)); \
+ VBOOL x_large = __riscv_vmsgeu (expo_x, EXP_BIAS, (vlen)); \
+ VBOOL x_small = __riscv_vmsltu (expo_x, EXP_BIAS - 30, (vlen)); \
+ (special_args) = __riscv_vmor (x_large, x_small, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VFLOAT abs_x = __riscv_vfsgnj ((vx), fp_posOne, (vlen)); \
+ VBOOL x_gt_1 = __riscv_vmfgt (abs_x, fp_posOne, (vlen)); \
+ VBOOL x_eq_1 = __riscv_vmfeq (abs_x, fp_posOne, (vlen)); \
+ /* substitute |x| > 1 with sNaN */ \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_gt_1, (vlen)); \
+ /* substitute |x| = 1 with +/-Inf and generate div-by-zero signal \
+ */ \
+ VFLOAT tmp = VFMV_VF (fp_posZero, (vlen)); \
+ tmp = __riscv_vfsgnj (tmp, (vx), (vlen)); \
+ tmp = __riscv_vfrec7 (x_eq_1, tmp, (vlen)); \
+ (vy_special) = __riscv_vfadd ((special_args), (vx), (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, x_eq_1, (vlen)); \
+ tmp = __riscv_vfmul (x_small, (vx), RTPI_BY2_LO, (vlen)); \
+ tmp = __riscv_vfmacc (x_small, tmp, RTPI_BY2_HI, (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, x_small, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// erfcinv(0) = Inf, erfcinv(2) = -Inf with divide-by-zero
+// erfcinv(x) for x outside [0, 2] is NaN with invalid
+// erfcinv(NaN) is NaN, invalid if the input is a signaling NaN
+#define EXCEPTION_HANDLING_ERFCINV(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, 0x39F, (special_args), (vlen)); \
+ VBOOL x_ge_2 = __riscv_vmfge ((vx), 0x1.0p1, (vlen)); \
+ (special_args) = __riscv_vmor ((special_args), x_ge_2, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VBOOL x_gt_2 = __riscv_vmfgt ((vx), 0x1.0p1, (vlen)); \
+ VBOOL x_lt_0 = __riscv_vmflt ((vx), fp_posZero, (vlen)); \
+ /* substitute x > 2 or x < 0 with sNaN */ \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_gt_2, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_lt_0, (vlen)); \
+ /* substitute x = 0 or 2 with +/-Inf and generate div-by-zero \
+ * signal */ \
+ VFLOAT tmp = VFMV_VF (fp_posZero, (vlen)); \
+ VFLOAT x_tmp = __riscv_vfrsub ((vx), fp_posOne, (vlen)); \
+ tmp = __riscv_vfsgnj (tmp, x_tmp, (vlen)); \
+ VBOOL x_eq_2 = __riscv_vmfeq ((vx), 0x1.0p1, (vlen)); \
+ VBOOL x_eq_0 = __riscv_vmfeq ((vx), fp_posZero, (vlen)); \
+ VBOOL pm_Inf = __riscv_vmor (x_eq_2, x_eq_0, (vlen)); \
+ tmp = __riscv_vfrec7 (pm_Inf, tmp, (vlen)); \
+ (vy_special) = __riscv_vfsub ((special_args), (vx), (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, pm_Inf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// cdfnorminv(0) = -Inf, cdfnorminv(1) = Inf with divide-by-zero
+// cdfnorminv(x) for x outside [0, 1] is NaN with invalid
+// cdfnorminv(NaN) is NaN, invalid if the input is a signaling NaN
+#define EXCEPTION_HANDLING_CDFNORMINV(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ IDENTIFY (vclass, 0x39F, (special_args), (vlen)); \
+ VBOOL x_ge_1 = __riscv_vmfge ((vx), fp_posOne, (vlen)); \
+ (special_args) = __riscv_vmor ((special_args), x_ge_1, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VBOOL x_gt_1 = __riscv_vmfgt ((vx), fp_posOne, (vlen)); \
+ VBOOL x_lt_0 = __riscv_vmflt ((vx), fp_posZero, (vlen)); \
+ /* substitute x > 1 or x < 0 with sNaN */ \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_gt_1, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_lt_0, (vlen)); \
+ /* substitute x = 0 or 1 with +/-Inf and generate div-by-zero \
+ * signal */ \
+ VFLOAT tmp = VFMV_VF (fp_posZero, (vlen)); \
+ VFLOAT x_tmp = __riscv_vfsub ((vx), 0x1.0p-1, (vlen)); \
+ tmp = __riscv_vfsgnj (tmp, x_tmp, (vlen)); \
+ VBOOL x_eq_1 = __riscv_vmfeq ((vx), fp_posOne, (vlen)); \
+ VBOOL x_eq_0 = __riscv_vmfeq ((vx), fp_posZero, (vlen)); \
+ VBOOL pm_Inf = __riscv_vmor (x_eq_1, x_eq_0, (vlen)); \
+ tmp = __riscv_vfrec7 (pm_Inf, tmp, (vlen)); \
+ (vy_special) = __riscv_vfsub ((special_args), (vx), (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, pm_Inf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), 0x1.0p-1, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// Compute -log(2^(-n_adjust) * x), where x < 1
+#define NEG_LOGX_4_TRANSFORM(vx, n_adjust, y_hi, y_lo, vlen) \
+ do \
+ { \
+ /* work on entire vector register */ \
+ VFLOAT vx_in = (vx); \
+ VINT n = __riscv_vadd ( \
+ __riscv_vsra (F_AS_I (vx_in), MAN_LEN - 8, (vlen)), 0x96, vlen); \
+ n = __riscv_vsub (__riscv_vsra (n, 8, vlen), EXP_BIAS, vlen); \
+ VFLOAT scale = I_AS_F (__riscv_vsll ( \
+ __riscv_vrsub (n, EXP_BIAS, (vlen)), MAN_LEN, (vlen))); \
+ vx_in = __riscv_vfmul (vx_in, scale, (vlen)); \
+ /* x is scaled, and -log(x) is 2 atanh(w/2); w = 2(1-x)/(1+x) */ \
+ n = __riscv_vsub (n, (n_adjust), (vlen)); \
+ VFLOAT n_flt = __riscv_vfcvt_f (n, (vlen)); \
+ VFLOAT numer = __riscv_vfrsub (vx_in, fp_posOne, (vlen)); \
+ /* note that 1-x is exact as 1/2 < x < 2 */ \
+ numer = __riscv_vfadd (numer, numer, (vlen)); \
+ VFLOAT denom = __riscv_vfadd (vx_in, fp_posOne, (vlen)); \
+ VFLOAT delta_denom = __riscv_vfadd ( \
+ __riscv_vfrsub (denom, fp_posOne, (vlen)), vx_in, (vlen)); \
+ /* note that 1 - denom is exact even if denom > 2 */ \
+      /* because 1 has many trailing zeros */                                \
+ VFLOAT r_hi, r_lo, r; \
+ DIV_N1D2 (numer, denom, delta_denom, r_hi, r_lo, (vlen)); \
+ r = __riscv_vfadd (r_hi, r_lo, (vlen)); \
+ /* for the original unscaled x, we have */ \
+ /* -log(x) = -n * log(2) + 2 atanh(-w/2) */ \
+ /* where w = 2(1-x)/(1+x); -w = 2(x-1)/(x+1) */ \
+ VFLOAT A, B; \
+ A = __riscv_vfmadd (n_flt, NEG_LOG2_HI, r_hi, (vlen)); \
+ B = __riscv_vfmsub (n_flt, NEG_LOG2_HI, A, (vlen)); \
+ B = __riscv_vfadd (B, r_hi, (vlen)); \
+ B = __riscv_vfadd (r_lo, B, (vlen)); \
+ VFLOAT rsq = __riscv_vfmul (r, r, (vlen)); \
+ VFLOAT rcube = __riscv_vfmul (rsq, r, (vlen)); \
+ VFLOAT r6 = __riscv_vfmul (rcube, rcube, (vlen)); \
+ VFLOAT poly_right = PSTEP ( \
+ 0x1.74681ff881228p-14, rsq, \
+ PSTEP (0x1.39751be23e4a3p-16, 0x1.30a893993e73dp-18, rsq, vlen), \
+ vlen); \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.999999996ce82p-7, rsq, \
+ PSTEP (0x1.249249501b1adp-9, 0x1.c71c47e7189f6p-12, rsq, vlen), \
+ vlen); \
+ poly_left = __riscv_vfmacc (poly_left, r6, poly_right, (vlen)); \
+ poly_left = PSTEP (0x1.55555555555dbp-4, rsq, poly_left, (vlen)); \
+ B = __riscv_vfmacc (B, NEG_LOG2_LO, n_flt, (vlen)); \
+ B = __riscv_vfmacc (B, rcube, poly_left, (vlen)); \
+ FAST2SUM (A, B, (y_hi), (y_lo), (vlen)); \
+      /* A + B is -log(x) with extra precision, |B| <= ulp(A)/2 */           \
+ } \
+ while (0)
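+
+// The polynomials above approximate (2*atanh(r/2) - r) / r^3; the Taylor
+// coefficients of 2*atanh(r/2) are 1/((2k+1)*4^k) for r^(2k+1), i.e.
+// 1, 1/12, 1/80, 1/448, ..., and the constants used are lightly
+// minimax-adjusted versions of these.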
+
+// This macro computes w_hi + w_lo = sqrt(y_hi + y_lo) in floating point
+// and T = t_sc/(w_hi + w_lo) as a Q63-style fixed-point value.
+// y_hi, y_lo are normalized on input; that is, y_hi has the
+// full working precision of the sum y_hi + y_lo,
+// and 2 log(2) < y_hi < 1100 log(2).
+#define SQRTX_4_TRANSFORM(y_hi, y_lo, w_hi, w_lo, T, t_sc, t_sc_inv, vlen) \
+ do \
+ { \
+ (w_hi) = __riscv_vfsqrt ((y_hi), (vlen)); \
+ (w_lo) = __riscv_vfnmsub ((w_hi), (w_hi), (y_hi), (vlen)); \
+ (w_lo) = __riscv_vfadd ((w_lo), (y_lo), (vlen)); \
+ VFLOAT recip; \
+ recip = __riscv_vfadd ((y_hi), (y_hi), (vlen)); \
+ recip = __riscv_vfrec7 (recip, (vlen)); \
+ recip = __riscv_vfmul (recip, (w_hi), (vlen)); \
+ (w_lo) = __riscv_vfmul ((w_lo), recip, (vlen)); \
+ /* w_hi + w_lo is sqrt(y_hi + y_lo) to extra precision */ \
+ /* now compute T = t_sc/(w_hi + w_lo) as fixed point */ \
+ VFLOAT t_lo = VFMV_VF ((t_sc), (vlen)); \
+ VFLOAT t_hi = __riscv_vfdiv (t_lo, (w_hi), (vlen)); \
+ (T) = __riscv_vfcvt_x (t_hi, (vlen)); \
+ t_lo = __riscv_vfnmsac (t_lo, (w_hi), t_hi, (vlen)); \
+ t_lo = __riscv_vfnmsac (t_lo, (w_lo), t_hi, (vlen)); \
+ t_lo = __riscv_vfmul (t_lo, t_hi, (vlen)); \
+ t_lo = __riscv_vfmul (t_lo, (t_sc_inv), vlen); \
+ (T) = __riscv_vadd ((T), __riscv_vfcvt_x (t_lo, (vlen)), (vlen)); \
+ } \
+ while (0)
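+
+// w_lo above is the standard one-step square-root correction
+// (y_hi - w_hi^2 + y_lo) / (2*sqrt(y)), with 1/(2*sqrt(y)) formed as
+// vfrec7 (2*y_hi) * w_hi; the 7-bit reciprocal estimate suffices since
+// w_lo only refines the trailing bits of w_hi.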
+
+#define ERFCINV_PQ_TINY(T, p_hi_tiny, p_lo_tiny, q_hi_tiny, q_lo_tiny, vlen) \
+ do \
+ { \
+ /* T is in scale of 65 */ \
+ VINT P, Q; \
+ P = PSTEP_I_SRA (P_tiny_7, T, 4, \
+ PSTEP_I_SRA (P_tiny_8, P_tiny_9, 4, T, (vlen)), \
+ (vlen)); \
+ /* P in Q_56 */ \
+ P = PSTEP_I_SRA (P_tiny_5, T, 1, \
+ PSTEP_I_SRA (P_tiny_6, P, 2, T, (vlen)), (vlen)); \
+ /* P in Q_57 */ \
+ P = PSTEP_I_SRA (P_tiny_3, T, 1, PSTEP_I (P_tiny_4, P, T, (vlen)), \
+ (vlen)); \
+ /* P in Q_60 */ \
+ P = PSTEP_I_SLL (P_tiny_1, T, 1, \
+ PSTEP_I_SRA (P_tiny_2, P, 1, T, (vlen)), (vlen)); \
+ /* P in Q_64 */ \
+ P = PSTEP_I (P_tiny_0, T, P, (vlen)); \
+ /* P in Q_66 */ \
+ \
+ Q = PSTEP_I_SRA (Q_tiny_7, T, 2, \
+ PSTEP_I_SRA (Q_tiny_8, Q_tiny_9, 3, T, (vlen)), \
+ (vlen)); \
+ /* Q in Q_56 */ \
+ Q = PSTEP_I_SRA (Q_tiny_5, T, 1, \
+ PSTEP_I_SRA (Q_tiny_6, Q, 2, T, (vlen)), (vlen)); \
+ /* Q in Q_57 */ \
+ Q = PSTEP_I_SRA (Q_tiny_3, T, 2, PSTEP_I (Q_tiny_4, Q, T, (vlen)), \
+ (vlen)); \
+      /* Q in Q_59 */                                                        \
+ Q = PSTEP_I_SLL (Q_tiny_1, T, 1, PSTEP_I (Q_tiny_2, Q, T, (vlen)), \
+ (vlen)); \
+ /* Q in Q_64 */ \
+ Q = PSTEP_I (Q_tiny_0, T, Q, (vlen)); \
+ /* Q in Q_66 */ \
+ \
+ p_hi_tiny = __riscv_vfcvt_f (P, (vlen)); \
+ p_lo_tiny = __riscv_vfcvt_f ( \
+ __riscv_vsub (P, __riscv_vfcvt_x (p_hi_tiny, (vlen)), (vlen)), \
+ (vlen)); \
+ p_lo_tiny = __riscv_vfadd (p_lo_tiny, DELTA_P0_tiny, (vlen)); \
+ q_hi_tiny = __riscv_vfcvt_f (Q, vlen); \
+ q_lo_tiny = __riscv_vfcvt_f ( \
+ __riscv_vsub (Q, __riscv_vfcvt_x (q_hi_tiny, (vlen)), (vlen)), \
+ (vlen)); \
+      q_lo_tiny = __riscv_vfadd (q_lo_tiny, DELTA_Q0_tiny, (vlen));          \
+ } \
+ while (0)
+
+#define UPDATE_P_LO(COEFF, T, P_HI, P_LO, P_tmp, K, vlen) \
+ do \
+ { \
+ (P_LO) = PSTEP_I_HI ((COEFF), (T), (P_LO), (vlen)); \
+ (P_tmp) = __riscv_vmul ((T), (P_HI), (vlen)); \
+ (P_tmp) = VSRL_I_AS_U ((P_tmp), (K), (vlen)); \
+ (P_LO) = __riscv_vadd ((P_LO), (P_tmp), (vlen)); \
+ } \
+ while (0)
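+
+// UPDATE_P_LO is one Horner step on a hi(Q50)/lo(Q84) split accumulator:
+// PSTEP_I_HI keeps the high half of T*P_HI (Q50 for T in Q64), while the
+// discarded low half of the product (Q114) is shifted right by
+// K = 114 - 84 = 30 and folded into the Q84 low limb.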
+
+#define ERFCINV_PQ_HILO_TINY(T, p_hi_tiny, p_lo_tiny, q_hi_tiny, q_lo_tiny, \
+ vlen) \
+ do \
+ { \
+ /* T is in scale of 64 */ \
+ VINT P_HI, P_LO, Q_HI, Q_LO, P_tmp, Q_tmp; \
+ \
+ P_HI = VMVI_VX (P_tiny_HI_9, (vlen)); \
+ P_LO = VMVI_VX (P_tiny_LO_9, (vlen)); \
+ \
+ UPDATE_P_LO (P_tiny_LO_8, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_8, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_7, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_7, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_6, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_6, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_5, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_5, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_4, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_4, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_3, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_3, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_2, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_2, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_1, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_1, (T), P_HI, (vlen)); \
+ UPDATE_P_LO (P_tiny_LO_0, (T), P_HI, P_LO, P_tmp, 30, (vlen)); \
+ P_HI = PSTEP_I_HI (P_tiny_HI_0, (T), P_HI, (vlen)); \
+ \
+ Q_HI = VMVI_VX (Q_tiny_HI_9, (vlen)); \
+ Q_LO = VMVI_VX (Q_tiny_LO_9, (vlen)); \
+ \
+ UPDATE_P_LO (Q_tiny_LO_8, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_8, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_7, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_7, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_6, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_6, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_5, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_5, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_4, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_4, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_3, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_3, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_2, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_2, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_1, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_1, (T), Q_HI, (vlen)); \
+ UPDATE_P_LO (Q_tiny_LO_0, (T), Q_HI, Q_LO, Q_tmp, 30, (vlen)); \
+ Q_HI = PSTEP_I_HI (Q_tiny_HI_0, (T), Q_HI, (vlen)); \
+ \
+ VFLOAT A = __riscv_vfcvt_f (P_HI, (vlen)); \
+ p_lo_tiny = __riscv_vfcvt_f (P_LO, (vlen)); \
+ p_hi_tiny = __riscv_vfmadd (p_lo_tiny, 0x1.0p-34, A, (vlen)); \
+ p_lo_tiny \
+ = __riscv_vfmadd (p_lo_tiny, 0x1.0p-34, \
+ __riscv_vfsub (A, p_hi_tiny, (vlen)), (vlen)); \
+ VFLOAT B = __riscv_vfcvt_f (Q_HI, (vlen)); \
+ q_lo_tiny = __riscv_vfcvt_f (Q_LO, (vlen)); \
+ q_hi_tiny = __riscv_vfmadd (q_lo_tiny, 0x1.0p-34, B, (vlen)); \
+ q_lo_tiny \
+ = __riscv_vfmadd (q_lo_tiny, 0x1.0p-34, \
+ __riscv_vfsub (B, q_hi_tiny, (vlen)), (vlen)); \
+ } \
+ while (0)
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_invhyperD.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_invhyperD.h
new file mode 100644
index 0000000000..0e6d9080a2
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_invhyperD.h
@@ -0,0 +1,194 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#define LOG2_HI 0x1.62e42fefa4000p-1
+#define LOG2_LO -0x1.8432a1b0e2634p-43
+#define LOG2_BY2_HI 0x1.62e42fefa4000p-2
+#define LOG2_BY2_LO -0x1.8432a1b0e2634p-44
+#define ONE_Q60 0x1000000000000000
+
+#if defined(COMPILE_FOR_ACOSH)
+#define PLUS_MINUS_ONE -0x1.0p0
+#else
+#define PLUS_MINUS_ONE 0x1.0p0
+#endif
+
+#define EXCEPTION_HANDLING_ACOSH(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VFLOAT vxm1 = __riscv_vfsub ((vx), fp_posOne, (vlen)); \
+ VUINT vclass = __riscv_vfclass (vxm1, (vlen)); \
+ IDENTIFY (vclass, class_negative, (special_args), (vlen)); \
+ vxm1 = __riscv_vfmerge (vxm1, fp_sNaN, (special_args), (vlen)); \
+ IDENTIFY (vclass, class_NaN | class_Inf | class_negative | class_Zero, \
+ (special_args), (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ (vy_special) = __riscv_vfmul ((special_args), vxm1, vxm1, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posOne, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define EXCEPTION_HANDLING_ASINH(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL Inf_or_NaN_or_pm0; \
+ IDENTIFY (vclass, class_NaN | class_Inf | class_Zero, \
+ Inf_or_NaN_or_pm0, (vlen)); \
+ VUINT expo_x = __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)); \
+ expo_x = __riscv_vand (expo_x, 0x7FF, (vlen)); \
+ VBOOL x_small = __riscv_vmsltu (expo_x, EXP_BIAS - 30, (vlen)); \
+ (special_args) = __riscv_vmor (Inf_or_NaN_or_pm0, x_small, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VFLOAT tmp \
+ = __riscv_vfmadd (x_small, (vx), -0x1.0p-60, (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, x_small, (vlen)); \
+ tmp = __riscv_vfadd (Inf_or_NaN_or_pm0, (vx), (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, \
+ Inf_or_NaN_or_pm0, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define EXCEPTION_HANDLING_ATANH(vx, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vand ( \
+ __riscv_vsrl (F_AS_U (vx), MAN_LEN, (vlen)), 0x7FF, (vlen)); \
+ VBOOL x_large = __riscv_vmsgeu (expo_x, EXP_BIAS, (vlen)); \
+ VBOOL x_small = __riscv_vmsltu (expo_x, EXP_BIAS - 30, (vlen)); \
+ (special_args) = __riscv_vmor (x_large, x_small, (vlen)); \
+ if (__riscv_vcpop ((special_args), (vlen)) > 0) \
+ { \
+ VFLOAT abs_x = __riscv_vfsgnj ((vx), fp_posOne, (vlen)); \
+ VBOOL x_gt_1 = __riscv_vmfgt (abs_x, fp_posOne, (vlen)); \
+ VBOOL x_eq_1 = __riscv_vmfeq (abs_x, fp_posOne, (vlen)); \
+ /* substitute |x| > 1 with sNaN */ \
+ (vx) = __riscv_vfmerge ((vx), fp_sNaN, x_gt_1, (vlen)); \
+ /* substitute |x| = 1 with +/-Inf and generate div-by-zero signal \
+ */ \
+ VFLOAT tmp = VFMV_VF (fp_posZero, (vlen)); \
+ tmp = __riscv_vfsgnj (tmp, (vx), (vlen)); \
+ tmp = __riscv_vfrec7 (x_eq_1, tmp, (vlen)); \
+ (vy_special) = __riscv_vfadd ((special_args), (vx), (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, x_eq_1, (vlen)); \
+ tmp = __riscv_vfmadd (x_small, (vx), 0x1.0p-60, (vx), (vlen)); \
+ (vy_special) = __riscv_vmerge ((vy_special), tmp, x_small, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// scale x down by 2^(-550) and set u to 0 if x >= 2^500
+#define SCALE_X(vx, n, u, vlen) \
+ do \
+ { \
+ VUINT expo_x = __riscv_vsrl (F_AS_U ((vx)), MAN_LEN, (vlen)); \
+ VBOOL x_large = __riscv_vmsgeu (expo_x, EXP_BIAS + 500, (vlen)); \
+ (n) = __riscv_vxor ((n), (n), (vlen)); \
+ (n) = __riscv_vmerge ((n), 550, x_large, (vlen)); \
+ (u) = VFMV_VF (PLUS_MINUS_ONE, (vlen)); \
+ (u) = __riscv_vfmerge ((u), fp_posZero, x_large, (vlen)); \
+ (vx) = I_AS_F (__riscv_vsub ( \
+ F_AS_I (vx), __riscv_vsll ((n), MAN_LEN, (vlen)), (vlen))); \
+ } \
+ while (0)
+
+// 2^(-50) <= X <= 2^500, u is -1 or 0
+// If u is -1, 1 <= X < 2^500
+#define XSQ_PLUS_U_ACOSH(vx, u, A, a, vlen) \
+ do \
+ { \
+ VFLOAT P, p; \
+ PROD_X1Y1 ((vx), (vx), P, p, (vlen)); \
+ VFLOAT tmp1, tmp2; \
+ FAST2SUM (P, (u), tmp1, tmp2, (vlen)); \
+ tmp2 = __riscv_vfadd (tmp2, p, (vlen)); \
+ FAST2SUM (tmp1, tmp2, (A), (a), (vlen)); \
+ } \
+ while (0)
+
+#define XSQ_PLUS_U_ASINH(vx, u, A, a, vlen) \
+ do \
+ { \
+ VFLOAT P, p; \
+ PROD_X1Y1 ((vx), (vx), P, p, (vlen)); \
+ VFLOAT tmp1, tmp2; \
+ POS2SUM (P, (u), tmp1, tmp2, (vlen)); \
+ tmp2 = __riscv_vfadd (tmp2, p, (vlen)); \
+ POS2SUM (tmp1, tmp2, (A), (a), (vlen)); \
+ } \
+ while (0)
+
+// Normalize S to [1, 2), scale s accordingly, and fold the
+// exponent adjustment into n
+#define SCALE_4_LOG(S, s, n, vlen) \
+ do \
+ { \
+ VINT expo_x = __riscv_vsra (F_AS_I ((S)), MAN_LEN - 8, (vlen)); \
+ expo_x = __riscv_vadd (expo_x, 0x96, (vlen)); \
+ expo_x = __riscv_vsra (expo_x, 8, (vlen)); \
+ VINT n_adjust = __riscv_vsub (expo_x, EXP_BIAS, (vlen)); \
+ (n) = __riscv_vadd ((n), n_adjust, (vlen)); \
+ expo_x = __riscv_vsll (__riscv_vrsub (expo_x, 2 * EXP_BIAS, (vlen)), \
+ MAN_LEN, (vlen)); \
+ (S) = I_AS_F (__riscv_vsub ( \
+ F_AS_I ((S)), __riscv_vsll (n_adjust, MAN_LEN, (vlen)), (vlen))); \
+ (s) = __riscv_vfmul ((s), I_AS_F (expo_x), (vlen)); \
+ } \
+ while (0)
+
+#define TRANSFORM_2_ATANH(S, s, numer, delta_numer, denom, delta_denom, vlen) \
+ do \
+ { \
+ VFLOAT S_tmp = __riscv_vfsub ((S), fp_posOne, (vlen)); \
+ FAST2SUM (S_tmp, (s), (numer), (delta_numer), (vlen)); \
+ (numer) = __riscv_vfadd ((numer), (numer), (vlen)); \
+ (delta_numer) = __riscv_vfadd ((delta_numer), (delta_numer), (vlen)); \
+ S_tmp = VFMV_VF (fp_posOne, (vlen)); \
+ FAST2SUM (S_tmp, (S), (denom), (delta_denom), (vlen)); \
+ (delta_denom) = __riscv_vfadd ((delta_denom), (s), (vlen)); \
+ } \
+ while (0)
+
+#define LOG_POLY(r, r_lo, poly, vlen) \
+ do \
+ { \
+ VFLOAT rsq = __riscv_vfmul ((r), (r), (vlen)); \
+ VFLOAT rcube = __riscv_vfmul (rsq, (r), (vlen)); \
+ VFLOAT r6 = __riscv_vfmul (rcube, rcube, (vlen)); \
+ VFLOAT poly_right \
+ = PSTEP (0x1.c71c543983a27p-12, rsq, \
+ PSTEP (0x1.7465c27ee47d0p-14, rsq, \
+ PSTEP (0x1.39af2e90a6554p-16, \
+ 0x1.2e74f2255e096p-18, rsq, (vlen)), \
+ (vlen)), \
+ (vlen)); \
+ VFLOAT poly_left = PSTEP ( \
+ 0x1.555555555558cp-4, rsq, \
+ PSTEP (0x1.9999999982550p-7, 0x1.2492493f7cc71p-9, rsq, (vlen)), \
+ (vlen)); \
+ (poly) = __riscv_vfmadd (poly_right, r6, poly_left, (vlen)); \
+ (poly) = __riscv_vfmadd (poly, rcube, (r_lo), (vlen)); \
+ } \
+ while (0)
diff --git a/sysdeps/riscv/rvd/veclibm/include/rvvlm_trigD.h b/sysdeps/riscv/rvd/veclibm/include/rvvlm_trigD.h
new file mode 100644
index 0000000000..96685e5eac
--- /dev/null
+++ b/sysdeps/riscv/rvd/veclibm/include/rvvlm_trigD.h
@@ -0,0 +1,297 @@
+/*
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+//
+
+#define PIBY2_INV 0x1.45f306dc9c883p-1
+#define PIBY2_HI 0x1.921fb54442d18p+0
+#define PIBY2_MID 0x1.1a62633145c07p-54
+#define PIBY2_LO -0x1.f1976b7ed8fbcp-110
+#define PI_HI 0x1.921fb54442d18p+1
+#define PI_MID 0x1.1a62633145c07p-53
+
+#if defined(COMPILE_FOR_SIN) || defined(COMPILE_FOR_TAN)
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), 0x1.0p-60, (vx), (vlen))
+#elif defined(COMPILE_FOR_SINPI) || defined(COMPILE_FOR_TANPI)
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), PI_HI, \
+ __riscv_vfmul ((small_x), (vx), PI_MID, (vlen)), (vlen))
+#elif defined(COMPILE_FOR_SINCOS)
+#define SIN_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), 0x1.0p-60, (vx), (vlen))
+#define COS_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfadd ((small_x), (vx), 0x1.0p0, (vlen))
+#elif defined(COMPILE_FOR_SINCOSPI)
+#define SIN_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfmadd ((small_x), (vx), PI_HI, \
+ __riscv_vfmul ((small_x), (vx), PI_MID, (vlen)), (vlen))
+#define COS_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfadd ((small_x), (vx), 0x1.0p0, (vlen))
+#else
+#define FUNC_NEAR_ZERO(small_x, vx, vlen) \
+ __riscv_vfadd ((small_x), (vx), 0x1.0p0, (vlen))
+#endif
+
+#define EXCEPTION_HANDLING_TRIG(vx, expo_x, special_args, vy_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL NaN_Inf; \
+ IDENTIFY (vclass, class_NaN | class_Inf, NaN_Inf, (vlen)); \
+ VBOOL small_x \
+ = __riscv_vmsleu ((expo_x), EXP_BIAS - MAN_LEN - 5, vlen); \
+ (special_args) = __riscv_vmor (NaN_Inf, small_x, vlen); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ /* Substitute Inf with sNaN */ \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_Inf, id_mask, (vlen)); \
+ (vy_special) = FUNC_NEAR_ZERO (small_x, vx, vlen); \
+ VFLOAT vy_NaN_Inf = __riscv_vfmerge (vx, fp_sNaN, id_mask, (vlen)); \
+ vy_NaN_Inf \
+ = __riscv_vfadd (NaN_Inf, vy_NaN_Inf, vy_NaN_Inf, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), vy_NaN_Inf, NaN_Inf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+#define EXCEPTION_HANDLING_SINCOS(vx, expo_x, special_args, vy_special, \
+ vz_special, vlen) \
+ do \
+ { \
+ VUINT vclass = __riscv_vfclass ((vx), (vlen)); \
+ VBOOL NaN_Inf; \
+ IDENTIFY (vclass, class_NaN | class_Inf, NaN_Inf, (vlen)); \
+ VBOOL small_x \
+ = __riscv_vmsleu ((expo_x), EXP_BIAS - MAN_LEN - 5, vlen); \
+ (special_args) = __riscv_vmor (NaN_Inf, small_x, vlen); \
+ UINT nb_special_args = __riscv_vcpop ((special_args), (vlen)); \
+ if (nb_special_args > 0) \
+ { \
+ /* Substitute Inf with sNaN */ \
+ VBOOL id_mask; \
+ IDENTIFY (vclass, class_Inf, id_mask, (vlen)); \
+ (vy_special) = SIN_NEAR_ZERO (small_x, vx, vlen); \
+ (vz_special) = COS_NEAR_ZERO (small_x, vx, vlen); \
+ VFLOAT vy_NaN_Inf = __riscv_vfmerge (vx, fp_sNaN, id_mask, (vlen)); \
+ vy_NaN_Inf \
+ = __riscv_vfadd (NaN_Inf, vy_NaN_Inf, vy_NaN_Inf, (vlen)); \
+ (vy_special) \
+ = __riscv_vmerge ((vy_special), vy_NaN_Inf, NaN_Inf, (vlen)); \
+ (vz_special) \
+ = __riscv_vmerge ((vz_special), vy_NaN_Inf, NaN_Inf, (vlen)); \
+ (vx) = __riscv_vfmerge ((vx), fp_posZero, (special_args), (vlen)); \
+ } \
+ } \
+ while (0)
+
+// This is a macro for trigonometric argument reduction for |x| >= 2^24.
+#define LARGE_ARGUMENT_REDUCTION_Piby2(vx, vlen, x_large, n_xlarge, r_xlarge, \
+ r_delta_xlarge) \
+ do \
+ { \
+      /* All variables are local except those given as arguments above. */   \
+      /* First, set the non-large arguments to 2^30 so that we can go        \
+         through the same code without worrying about unexpected behavior. */ \
+ VBOOL x_not_large = __riscv_vmnot ((x_large), vlen); \
+ VFLOAT VX = __riscv_vfmerge (vx, 0x1.0p30, x_not_large, vlen); \
+ \
+ /* Get exponent of VX, normalize it to 1 <= |VX| < 2 */ \
+ VINT lsb_x = U_AS_I (__riscv_vsrl (F_AS_U (VX), 52, vlen)); \
+ lsb_x = __riscv_vand (lsb_x, 0x7ff, \
+ vlen); /* this is the biased exponent */ \
+      /* the lsb of X has weight 2^(unbiased exponent - 52), i.e.            \
+         2^(biased_exponent - (1023+52))                                     \
+       */                                                                    \
+ lsb_x = __riscv_vsub (lsb_x, 1075, vlen); \
+ \
+ VUINT expo_mask = VMVU_VX (0x7ff, vlen); \
+ expo_mask \
+ = __riscv_vsll (expo_mask, 52, vlen); /* 0x7FF0000000000000 */ \
+ VFLOAT scale = U_AS_F (__riscv_vand (F_AS_U (VX), expo_mask, vlen)); \
+ scale = __riscv_vfmul (scale, 0x1.0p-500, vlen); \
+ \
+ expo_mask = __riscv_vnot (expo_mask, vlen); /* 0x800FFFFFFFFFFFFF */ \
+ VUINT expo_1 = VMVU_VX (0x3ff, vlen); \
+ expo_1 = __riscv_vsll (expo_1, 52, vlen); \
+ \
+ VX = U_AS_F (__riscv_vand (F_AS_U (VX), expo_mask, vlen)); \
+ VX = U_AS_F (__riscv_vor (F_AS_U (VX), expo_1, vlen)); \
+      /* At this point, |VX| is in [1, 2), and the lsb of the original x     \
+         is recorded in lsb_x.                                               \
+                                                                             \
+         We figure out which portions of 2/pi are needed. Recall that the    \
+         goal is to get N mod 4 and R, where x * (2/pi) = N + R with         \
+         |R| <= 1/2. So we do not need the portions of 2/pi whose product    \
+         with x is an integer >= 4. Also, from the first relevant portion    \
+         of 2/pi onwards, we only need 5 portions.                           \
+                                                                             \
+         We figure out the first index of 2/pi that is needed using lsb_x.   \
+         This first index is FLOOR( (max(lsb_x,2) - 2) / 52 ), which can be  \
+         computed as FLOOR( (20165 * (max(lsb_x,2) - 2)) / 2^20 ).           \
+       */                                                                    \
+ VUINT j_start = I_AS_U (__riscv_vmax (lsb_x, 2, vlen)); \
+ j_start = __riscv_vsub (j_start, 2, vlen); \
+ j_start = __riscv_vmul (j_start, 20165, vlen); \
+ j_start = __riscv_vsrl (j_start, 20, vlen); \
+ VUINT ind = __riscv_vsll (j_start, 3, \
+ vlen); /* 8 bytes for indexing into table */ \
+ \
+ /* \
+         We need to compute VX * 2ovpi_tbl[j] in 2 pieces;                   \
+         lsb (VX * 2ovpi_tbl[j]) is -52 + 500 - 52*(j+1). We choose          \
+         Peg = sign (2ovpi_tbl[j]) x 2^(52+53) * lsb, that is,               \
+         sgn * 2^(501 - 52*j)                                                \
+ */ \
+ VFLOAT two_by_pi; \
+ two_by_pi = __riscv_vluxei64 (dbl_2ovpi_tbl, ind, vlen); \
+ VUINT peg_expo = VMVU_VX (1524, vlen); /* bias + 501 */ \
+ \
+ peg_expo = __riscv_vnmsac (peg_expo, 52, j_start, \
+ vlen); /* biased expo of peg */ \
+ VFLOAT peg = U_AS_F (__riscv_vsll (peg_expo, 52, vlen)); \
+ peg = __riscv_vfsgnj (peg, two_by_pi, vlen); \
+ peg = __riscv_vfsgnjx (peg, VX, vlen); \
+ VFLOAT S = __riscv_vfmadd (VX, two_by_pi, peg, vlen); \
+ S = __riscv_vfsub (S, peg, vlen); \
+ VFLOAT s = __riscv_vfmsub (VX, two_by_pi, S, vlen); \
+ \
+ VFLOAT prod_0 = S; \
+ VFLOAT prod_1 = s; \
+ prod_0 = __riscv_vfmul (prod_0, scale, vlen); \
+ \
+ ind = __riscv_vadd (ind, 8, vlen); \
+ two_by_pi = __riscv_vluxei64 (dbl_2ovpi_tbl, ind, vlen); \
+ peg_expo = __riscv_vsub (peg_expo, 52, vlen); \
+ peg = U_AS_F (__riscv_vsll (peg_expo, 52, vlen)); \
+ peg = __riscv_vfsgnj (peg, two_by_pi, vlen); \
+ peg = __riscv_vfsgnjx (peg, VX, vlen); \
+ S = __riscv_vfmadd (VX, two_by_pi, peg, vlen); \
+ S = __riscv_vfsub (S, peg, vlen); \
+ s = __riscv_vfmsub (VX, two_by_pi, S, vlen); \
+ prod_1 = __riscv_vfadd (prod_1, S, vlen); \
+ VFLOAT prod_2 = I_AS_F (__riscv_vor (F_AS_I (s), F_AS_I (s), vlen)); \
+ prod_1 = __riscv_vfmul (prod_1, scale, vlen); \
+ \
+ ind = __riscv_vadd (ind, 8, vlen); \
+ two_by_pi = __riscv_vluxei64 (dbl_2ovpi_tbl, ind, vlen); \
+ peg_expo = __riscv_vsub (peg_expo, 52, vlen); \
+ peg = U_AS_F (__riscv_vsll (peg_expo, 52, vlen)); \
+ peg = __riscv_vfsgnj (peg, two_by_pi, vlen); \
+ peg = __riscv_vfsgnjx (peg, VX, vlen); \
+ S = __riscv_vfmadd (VX, two_by_pi, peg, vlen); \
+ S = __riscv_vfsub (S, peg, vlen); \
+ s = __riscv_vfmsub (VX, two_by_pi, S, vlen); \
+ prod_2 = __riscv_vfadd (prod_2, S, vlen); \
+ VFLOAT prod_3 = I_AS_F (__riscv_vor (F_AS_I (s), F_AS_I (s), vlen)); \
+ prod_2 = __riscv_vfmul (prod_2, scale, vlen); \
+ \
+ /* \
+         At this point, we can get N from prod_0, prod_1, prod_2             \
+         and start the summation for the reduced fraction.                   \
+         If the original |x| >= 2^54, that is scale >= 2^(54-500),           \
+         prod_0 can be set to 0.                                             \
+ */ \
+ VBOOL ignore_prod_0 = __riscv_vmfge (scale, 0x1.0p-446, vlen); \
+ prod_0 = __riscv_vfmerge (prod_0, 0.0, ignore_prod_0, vlen); \
+ \
+ /* \
+         Extract the integer part of SUM prod_j, taking the precaution       \
+         that the value may be too big for the rounded integer value         \
+         to be exact in FP format.                                           \
+ */ \
+ VFLOAT flt_n = __riscv_vfmul (prod_0, 0x1.0p-12, vlen); \
+ (n_xlarge) = __riscv_vfcvt_x (flt_n, vlen); \
+ flt_n = __riscv_vfcvt_f ((n_xlarge), vlen); \
+ prod_0 = __riscv_vfnmsac (prod_0, 0x1.0p12, flt_n, vlen); \
+ \
+ flt_n = __riscv_vfmul (prod_1, 0x1.0p-12, vlen); \
+ (n_xlarge) = __riscv_vfcvt_x (flt_n, vlen); \
+ flt_n = __riscv_vfcvt_f ((n_xlarge), vlen); \
+ prod_1 = __riscv_vfnmsac (prod_1, 0x1.0p12, flt_n, vlen); \
+ \
+ /* we are now safe to get N from prod_0 + prod_1 + prod_2 */ \
+ flt_n = __riscv_vfadd (prod_1, prod_2, vlen); \
+ flt_n = __riscv_vfadd (flt_n, prod_0, vlen); \
+ (n_xlarge) = __riscv_vfcvt_x (flt_n, vlen); \
+ flt_n = __riscv_vfcvt_f ((n_xlarge), vlen); \
+ prod_0 = __riscv_vfsub (prod_0, flt_n, vlen); \
+ \
+ VFLOAT r_hi = __riscv_vfadd (prod_0, prod_1, vlen); \
+ VFLOAT r_lo = __riscv_vfsub (prod_0, r_hi, vlen); \
+ r_lo = __riscv_vfadd (r_lo, prod_1, vlen); \
+ \
+ VFLOAT tmp_1, tmp_2; \
+ tmp_1 = __riscv_vfadd (r_hi, prod_2, vlen); \
+ tmp_2 = __riscv_vfsub (r_hi, tmp_1, vlen); \
+ tmp_2 = __riscv_vfadd (tmp_2, prod_2, vlen); \
+ r_hi = tmp_1; \
+ r_lo = __riscv_vfadd (r_lo, tmp_2, vlen); \
+ \
+ ind = __riscv_vadd (ind, 8, vlen); \
+ two_by_pi = __riscv_vluxei64 (dbl_2ovpi_tbl, ind, vlen); \
+ peg_expo = __riscv_vsub (peg_expo, 52, vlen); \
+ peg = U_AS_F (__riscv_vsll (peg_expo, 52, vlen)); \
+ peg = __riscv_vfsgnj (peg, two_by_pi, vlen); \
+ peg = __riscv_vfsgnjx (peg, VX, vlen); \
+ S = __riscv_vfmadd (VX, two_by_pi, peg, vlen); \
+ S = __riscv_vfsub (S, peg, vlen); \
+ s = __riscv_vfmsub (VX, two_by_pi, S, vlen); \
+ prod_3 = __riscv_vfadd (prod_3, S, vlen); \
+ VFLOAT prod_4 = I_AS_F (__riscv_vor (F_AS_I (s), F_AS_I (s), vlen)); \
+ prod_3 = __riscv_vfmul (prod_3, scale, vlen); \
+ \
+ tmp_1 = __riscv_vfadd (r_hi, prod_3, vlen); \
+ tmp_2 = __riscv_vfsub (r_hi, tmp_1, vlen); \
+ tmp_2 = __riscv_vfadd (tmp_2, prod_3, vlen); \
+ r_hi = tmp_1; \
+ r_lo = __riscv_vfadd (r_lo, tmp_2, vlen); \
+ \
+ ind = __riscv_vadd (ind, 8, vlen); \
+ two_by_pi = __riscv_vluxei64 (dbl_2ovpi_tbl, ind, vlen); \
+ peg_expo = __riscv_vsub (peg_expo, 52, vlen); \
+ peg = U_AS_F (__riscv_vsll (peg_expo, 52, vlen)); \
+ peg = __riscv_vfsgnj (peg, two_by_pi, vlen); \
+ peg = __riscv_vfsgnjx (peg, VX, vlen); \
+ S = __riscv_vfmadd (VX, two_by_pi, peg, vlen); \
+ S = __riscv_vfsub (S, peg, vlen); \
+ prod_4 = __riscv_vfadd (prod_4, S, vlen); \
+ prod_4 = __riscv_vfmul (prod_4, scale, vlen); \
+ \
+ tmp_1 = __riscv_vfadd (r_hi, prod_4, vlen); \
+ tmp_2 = __riscv_vfsub (r_hi, tmp_1, vlen); \
+ tmp_2 = __riscv_vfadd (tmp_2, prod_4, vlen); \
+ r_hi = tmp_1; \
+ r_lo = __riscv_vfadd (r_lo, tmp_2, vlen); \
+ \
+ /* \
+ Finally, (r_hi + r_lo) * pi/2 is the reduced argument \
+ we want: that is x - N * pi/2 \
+ */ \
+ (r_xlarge) = __riscv_vfmul (r_hi, PIBY2_HI, vlen); \
+ (r_delta_xlarge) = __riscv_vfmsub (r_hi, PIBY2_HI, (r_xlarge), vlen); \
+ (r_delta_xlarge) \
+ = __riscv_vfmacc ((r_delta_xlarge), PIBY2_MID, r_hi, vlen); \
+ (r_delta_xlarge) \
+ = __riscv_vfmacc ((r_delta_xlarge), PIBY2_HI, r_lo, vlen); \
+ } \
+ while (0)
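
A minimal scalar sketch of two building blocks used in the macro above
may help when reading it; the names fast_two_sum and split_int are
invented here, and the real code operates on vectors and keeps every
step exact via the table of 2/pi chunks:

    #include <math.h>

    /* Fast2Sum: hi + lo == a + b exactly, assuming |a| >= |b|.  This is
       the tmp_1/tmp_2 pattern used above to fold each prod_j into the
       r_hi/r_lo pair.  */
    static void
    fast_two_sum (double a, double b, double *hi, double *lo)
    {
      *hi = a + b;
      *lo = (a - *hi) + b;
    }

    /* Mirror of the final vfcvt_x/vfcvt_f/vfsub sequence: round sum to
       the nearest integer N and return the remainder sum - N, which is
       exact as long as |sum| stays well below 2^52.  */
    static double
    split_int (double sum, long long *n)
    {
      double flt_n = nearbyint (sum);
      *n = (long long) flt_n;
      return sum - flt_n;
    }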
diff --git a/sysdeps/unix/sysv/linux/riscv/libmvec.abilist b/sysdeps/unix/sysv/linux/riscv/libmvec.abilist
new file mode 100644
index 0000000000..9d7b426027
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/libmvec.abilist
@@ -0,0 +1,455 @@
+GLIBC_2.41 _ZGV1N2v_exp F
+GLIBC_2.41 _ZGV1N4v_exp F
+GLIBC_2.41 _ZGV2N2v_exp F
+GLIBC_2.41 _ZGV2N4v_exp F
+GLIBC_2.41 _ZGV2N8v_exp F
+GLIBC_2.41 _ZGV4N4v_exp F
+GLIBC_2.41 _ZGV4N8v_exp F
+GLIBC_2.41 _ZGV4N16v_exp F
+GLIBC_2.41 _ZGV8N8v_exp F
+GLIBC_2.41 _ZGV8N16v_exp F
+GLIBC_2.41 _ZGV8N32v_exp F
+
+GLIBC_2.41 _ZGV1N2v_asin F
+GLIBC_2.41 _ZGV1N4v_asin F
+GLIBC_2.41 _ZGV2N2v_asin F
+GLIBC_2.41 _ZGV2N4v_asin F
+GLIBC_2.41 _ZGV2N8v_asin F
+GLIBC_2.41 _ZGV4N4v_asin F
+GLIBC_2.41 _ZGV4N8v_asin F
+GLIBC_2.41 _ZGV4N16v_asin F
+GLIBC_2.41 _ZGV8N8v_asin F
+GLIBC_2.41 _ZGV8N16v_asin F
+GLIBC_2.41 _ZGV8N32v_asin F
+
+GLIBC_2.41 _ZGV1N2v_atan F
+GLIBC_2.41 _ZGV1N4v_atan F
+GLIBC_2.41 _ZGV2N2v_atan F
+GLIBC_2.41 _ZGV2N4v_atan F
+GLIBC_2.41 _ZGV2N8v_atan F
+GLIBC_2.41 _ZGV4N4v_atan F
+GLIBC_2.41 _ZGV4N8v_atan F
+GLIBC_2.41 _ZGV4N16v_atan F
+GLIBC_2.41 _ZGV8N8v_atan F
+GLIBC_2.41 _ZGV8N16v_atan F
+GLIBC_2.41 _ZGV8N32v_atan F
+
+GLIBC_2.41 _ZGV1N2v_acos F
+GLIBC_2.41 _ZGV1N4v_acos F
+GLIBC_2.41 _ZGV2N2v_acos F
+GLIBC_2.41 _ZGV2N4v_acos F
+GLIBC_2.41 _ZGV2N8v_acos F
+GLIBC_2.41 _ZGV4N4v_acos F
+GLIBC_2.41 _ZGV4N8v_acos F
+GLIBC_2.41 _ZGV4N16v_acos F
+GLIBC_2.41 _ZGV8N8v_acos F
+GLIBC_2.41 _ZGV8N16v_acos F
+GLIBC_2.41 _ZGV8N32v_acos F
+
+GLIBC_2.41 _ZGV1N2v_atanh F
+GLIBC_2.41 _ZGV1N4v_atanh F
+GLIBC_2.41 _ZGV2N2v_atanh F
+GLIBC_2.41 _ZGV2N4v_atanh F
+GLIBC_2.41 _ZGV2N8v_atanh F
+GLIBC_2.41 _ZGV4N4v_atanh F
+GLIBC_2.41 _ZGV4N8v_atanh F
+GLIBC_2.41 _ZGV4N16v_atanh F
+GLIBC_2.41 _ZGV8N8v_atanh F
+GLIBC_2.41 _ZGV8N16v_atanh F
+GLIBC_2.41 _ZGV8N32v_atanh F
+
+GLIBC_2.41 _ZGV1N2v_exp10 F
+GLIBC_2.41 _ZGV1N4v_exp10 F
+GLIBC_2.41 _ZGV2N2v_exp10 F
+GLIBC_2.41 _ZGV2N4v_exp10 F
+GLIBC_2.41 _ZGV2N8v_exp10 F
+GLIBC_2.41 _ZGV4N4v_exp10 F
+GLIBC_2.41 _ZGV4N8v_exp10 F
+GLIBC_2.41 _ZGV4N16v_exp10 F
+GLIBC_2.41 _ZGV8N8v_exp10 F
+GLIBC_2.41 _ZGV8N16v_exp10 F
+GLIBC_2.41 _ZGV8N32v_exp10 F
+
+GLIBC_2.41 _ZGV1N2v_exp2 F
+GLIBC_2.41 _ZGV1N4v_exp2 F
+GLIBC_2.41 _ZGV2N2v_exp2 F
+GLIBC_2.41 _ZGV2N4v_exp2 F
+GLIBC_2.41 _ZGV2N8v_exp2 F
+GLIBC_2.41 _ZGV4N4v_exp2 F
+GLIBC_2.41 _ZGV4N8v_exp2 F
+GLIBC_2.41 _ZGV4N16v_exp2 F
+GLIBC_2.41 _ZGV8N8v_exp2 F
+GLIBC_2.41 _ZGV8N16v_exp2 F
+GLIBC_2.41 _ZGV8N32v_exp2 F
+
+GLIBC_2.41 _ZGV1N2v_tan F
+GLIBC_2.41 _ZGV1N4v_tan F
+GLIBC_2.41 _ZGV2N2v_tan F
+GLIBC_2.41 _ZGV2N4v_tan F
+GLIBC_2.41 _ZGV2N8v_tan F
+GLIBC_2.41 _ZGV4N4v_tan F
+GLIBC_2.41 _ZGV4N8v_tan F
+GLIBC_2.41 _ZGV4N16v_tan F
+GLIBC_2.41 _ZGV8N8v_tan F
+GLIBC_2.41 _ZGV8N16v_tan F
+GLIBC_2.41 _ZGV8N32v_tan F
+
+GLIBC_2.41 _ZGV1N2v_tanh F
+GLIBC_2.41 _ZGV1N4v_tanh F
+GLIBC_2.41 _ZGV2N2v_tanh F
+GLIBC_2.41 _ZGV2N4v_tanh F
+GLIBC_2.41 _ZGV2N8v_tanh F
+GLIBC_2.41 _ZGV4N4v_tanh F
+GLIBC_2.41 _ZGV4N8v_tanh F
+GLIBC_2.41 _ZGV4N16v_tanh F
+GLIBC_2.41 _ZGV8N8v_tanh F
+GLIBC_2.41 _ZGV8N16v_tanh F
+GLIBC_2.41 _ZGV8N32v_tanh F
+
+GLIBC_2.41 _ZGV1N2vv_pow F
+GLIBC_2.41 _ZGV1N4vv_pow F
+GLIBC_2.41 _ZGV2N2vv_pow F
+GLIBC_2.41 _ZGV2N4vv_pow F
+GLIBC_2.41 _ZGV2N8vv_pow F
+GLIBC_2.41 _ZGV4N4vv_pow F
+GLIBC_2.41 _ZGV4N8vv_pow F
+GLIBC_2.41 _ZGV4N16vv_pow F
+GLIBC_2.41 _ZGV8N8vv_pow F
+GLIBC_2.41 _ZGV8N16vv_pow F
+GLIBC_2.41 _ZGV8N32vv_pow F
+
+GLIBC_2.41 _ZGV1N2v_sin F
+GLIBC_2.41 _ZGV1N4v_sin F
+GLIBC_2.41 _ZGV2N2v_sin F
+GLIBC_2.41 _ZGV2N4v_sin F
+GLIBC_2.41 _ZGV2N8v_sin F
+GLIBC_2.41 _ZGV4N4v_sin F
+GLIBC_2.41 _ZGV4N8v_sin F
+GLIBC_2.41 _ZGV4N16v_sin F
+GLIBC_2.41 _ZGV8N8v_sin F
+GLIBC_2.41 _ZGV8N16v_sin F
+GLIBC_2.41 _ZGV8N32v_sin F
+
+GLIBC_2.41 _ZGV1N2v_log F
+GLIBC_2.41 _ZGV1N4v_log F
+GLIBC_2.41 _ZGV2N2v_log F
+GLIBC_2.41 _ZGV2N4v_log F
+GLIBC_2.41 _ZGV2N8v_log F
+GLIBC_2.41 _ZGV4N4v_log F
+GLIBC_2.41 _ZGV4N8v_log F
+GLIBC_2.41 _ZGV4N16v_log F
+GLIBC_2.41 _ZGV8N8v_log F
+GLIBC_2.41 _ZGV8N16v_log F
+GLIBC_2.41 _ZGV8N32v_log F
+
+GLIBC_2.41 _ZGV1N2v_cos F
+GLIBC_2.41 _ZGV1N4v_cos F
+GLIBC_2.41 _ZGV2N2v_cos F
+GLIBC_2.41 _ZGV2N4v_cos F
+GLIBC_2.41 _ZGV2N8v_cos F
+GLIBC_2.41 _ZGV4N4v_cos F
+GLIBC_2.41 _ZGV4N8v_cos F
+GLIBC_2.41 _ZGV4N16v_cos F
+GLIBC_2.41 _ZGV8N8v_cos F
+GLIBC_2.41 _ZGV8N16v_cos F
+GLIBC_2.41 _ZGV8N32v_cos F
+
+GLIBC_2.41 _ZGV1N2v_acosh F
+GLIBC_2.41 _ZGV1N4v_acosh F
+GLIBC_2.41 _ZGV2N2v_acosh F
+GLIBC_2.41 _ZGV2N4v_acosh F
+GLIBC_2.41 _ZGV2N8v_acosh F
+GLIBC_2.41 _ZGV4N4v_acosh F
+GLIBC_2.41 _ZGV4N8v_acosh F
+GLIBC_2.41 _ZGV4N16v_acosh F
+GLIBC_2.41 _ZGV8N8v_acosh F
+GLIBC_2.41 _ZGV8N16v_acosh F
+GLIBC_2.41 _ZGV8N32v_acosh F
+
+GLIBC_2.41 _ZGV1N2v_acospi F
+GLIBC_2.41 _ZGV1N4v_acospi F
+GLIBC_2.41 _ZGV2N2v_acospi F
+GLIBC_2.41 _ZGV2N4v_acospi F
+GLIBC_2.41 _ZGV2N8v_acospi F
+GLIBC_2.41 _ZGV4N4v_acospi F
+GLIBC_2.41 _ZGV4N8v_acospi F
+GLIBC_2.41 _ZGV4N16v_acospi F
+GLIBC_2.41 _ZGV8N8v_acospi F
+GLIBC_2.41 _ZGV8N16v_acospi F
+GLIBC_2.41 _ZGV8N32v_acospi F
+
+GLIBC_2.41 _ZGV1N2v_asinh F
+GLIBC_2.41 _ZGV1N4v_asinh F
+GLIBC_2.41 _ZGV2N2v_asinh F
+GLIBC_2.41 _ZGV2N4v_asinh F
+GLIBC_2.41 _ZGV2N8v_asinh F
+GLIBC_2.41 _ZGV4N4v_asinh F
+GLIBC_2.41 _ZGV4N8v_asinh F
+GLIBC_2.41 _ZGV4N16v_asinh F
+GLIBC_2.41 _ZGV8N8v_asinh F
+GLIBC_2.41 _ZGV8N16v_asinh F
+GLIBC_2.41 _ZGV8N32v_asinh F
+
+GLIBC_2.41 _ZGV1N2v_asinpi F
+GLIBC_2.41 _ZGV1N4v_asinpi F
+GLIBC_2.41 _ZGV2N2v_asinpi F
+GLIBC_2.41 _ZGV2N4v_asinpi F
+GLIBC_2.41 _ZGV2N8v_asinpi F
+GLIBC_2.41 _ZGV4N4v_asinpi F
+GLIBC_2.41 _ZGV4N8v_asinpi F
+GLIBC_2.41 _ZGV4N16v_asinpi F
+GLIBC_2.41 _ZGV8N8v_asinpi F
+GLIBC_2.41 _ZGV8N16v_asinpi F
+GLIBC_2.41 _ZGV8N32v_asinpi F
+
+GLIBC_2.41 _ZGV1N2vv_atan2 F
+GLIBC_2.41 _ZGV1N4vv_atan2 F
+GLIBC_2.41 _ZGV2N2vv_atan2 F
+GLIBC_2.41 _ZGV2N4vv_atan2 F
+GLIBC_2.41 _ZGV2N8vv_atan2 F
+GLIBC_2.41 _ZGV4N4vv_atan2 F
+GLIBC_2.41 _ZGV4N8vv_atan2 F
+GLIBC_2.41 _ZGV4N16vv_atan2 F
+GLIBC_2.41 _ZGV8N8vv_atan2 F
+GLIBC_2.41 _ZGV8N16vv_atan2 F
+GLIBC_2.41 _ZGV8N32vv_atan2 F
+
+GLIBC_2.41 _ZGV1N2vv_atan2pi F
+GLIBC_2.41 _ZGV1N4vv_atan2pi F
+GLIBC_2.41 _ZGV2N2vv_atan2pi F
+GLIBC_2.41 _ZGV2N4vv_atan2pi F
+GLIBC_2.41 _ZGV2N8vv_atan2pi F
+GLIBC_2.41 _ZGV4N4vv_atan2pi F
+GLIBC_2.41 _ZGV4N8vv_atan2pi F
+GLIBC_2.41 _ZGV4N16vv_atan2pi F
+GLIBC_2.41 _ZGV8N8vv_atan2pi F
+GLIBC_2.41 _ZGV8N16vv_atan2pi F
+GLIBC_2.41 _ZGV8N32vv_atan2pi F
+
+GLIBC_2.41 _ZGV1N2v_atanpi F
+GLIBC_2.41 _ZGV1N4v_atanpi F
+GLIBC_2.41 _ZGV2N2v_atanpi F
+GLIBC_2.41 _ZGV2N4v_atanpi F
+GLIBC_2.41 _ZGV2N8v_atanpi F
+GLIBC_2.41 _ZGV4N4v_atanpi F
+GLIBC_2.41 _ZGV4N8v_atanpi F
+GLIBC_2.41 _ZGV4N16v_atanpi F
+GLIBC_2.41 _ZGV8N8v_atanpi F
+GLIBC_2.41 _ZGV8N16v_atanpi F
+GLIBC_2.41 _ZGV8N32v_atanpi F
+
+GLIBC_2.41 _ZGV1N2v_expint1 F
+GLIBC_2.41 _ZGV1N4v_expint1 F
+GLIBC_2.41 _ZGV2N2v_expint1 F
+GLIBC_2.41 _ZGV2N4v_expint1 F
+GLIBC_2.41 _ZGV2N8v_expint1 F
+GLIBC_2.41 _ZGV4N4v_expint1 F
+GLIBC_2.41 _ZGV4N8v_expint1 F
+GLIBC_2.41 _ZGV4N16v_expint1 F
+GLIBC_2.41 _ZGV8N8v_expint1 F
+GLIBC_2.41 _ZGV8N16v_expint1 F
+GLIBC_2.41 _ZGV8N32v_expint1 F
+
+GLIBC_2.41 _ZGV1N2v_expm1 F
+GLIBC_2.41 _ZGV1N4v_expm1 F
+GLIBC_2.41 _ZGV2N2v_expm1 F
+GLIBC_2.41 _ZGV2N4v_expm1 F
+GLIBC_2.41 _ZGV2N8v_expm1 F
+GLIBC_2.41 _ZGV4N4v_expm1 F
+GLIBC_2.41 _ZGV4N8v_expm1 F
+GLIBC_2.41 _ZGV4N16v_expm1 F
+GLIBC_2.41 _ZGV8N8v_expm1 F
+GLIBC_2.41 _ZGV8N16v_expm1 F
+GLIBC_2.41 _ZGV8N32v_expm1 F
+
+GLIBC_2.41 _ZGV1N2v_cosh F
+GLIBC_2.41 _ZGV1N4v_cosh F
+GLIBC_2.41 _ZGV2N2v_cosh F
+GLIBC_2.41 _ZGV2N4v_cosh F
+GLIBC_2.41 _ZGV2N8v_cosh F
+GLIBC_2.41 _ZGV4N4v_cosh F
+GLIBC_2.41 _ZGV4N8v_cosh F
+GLIBC_2.41 _ZGV4N16v_cosh F
+GLIBC_2.41 _ZGV8N8v_cosh F
+GLIBC_2.41 _ZGV8N16v_cosh F
+GLIBC_2.41 _ZGV8N32v_cosh F
+
+GLIBC_2.41 _ZGV1N2v_sinh F
+GLIBC_2.41 _ZGV1N4v_sinh F
+GLIBC_2.41 _ZGV2N2v_sinh F
+GLIBC_2.41 _ZGV2N4v_sinh F
+GLIBC_2.41 _ZGV2N8v_sinh F
+GLIBC_2.41 _ZGV4N4v_sinh F
+GLIBC_2.41 _ZGV4N8v_sinh F
+GLIBC_2.41 _ZGV4N16v_sinh F
+GLIBC_2.41 _ZGV8N8v_sinh F
+GLIBC_2.41 _ZGV8N16v_sinh F
+GLIBC_2.41 _ZGV8N32v_sinh F
+
+GLIBC_2.41 _ZGV1N2v_sinpi F
+GLIBC_2.41 _ZGV1N4v_sinpi F
+GLIBC_2.41 _ZGV2N2v_sinpi F
+GLIBC_2.41 _ZGV2N4v_sinpi F
+GLIBC_2.41 _ZGV2N8v_sinpi F
+GLIBC_2.41 _ZGV4N4v_sinpi F
+GLIBC_2.41 _ZGV4N8v_sinpi F
+GLIBC_2.41 _ZGV4N16v_sinpi F
+GLIBC_2.41 _ZGV8N8v_sinpi F
+GLIBC_2.41 _ZGV8N16v_sinpi F
+GLIBC_2.41 _ZGV8N32v_sinpi F
+
+GLIBC_2.41 _ZGV1N2v_cospi F
+GLIBC_2.41 _ZGV1N4v_cospi F
+GLIBC_2.41 _ZGV2N2v_cospi F
+GLIBC_2.41 _ZGV2N4v_cospi F
+GLIBC_2.41 _ZGV2N8v_cospi F
+GLIBC_2.41 _ZGV4N4v_cospi F
+GLIBC_2.41 _ZGV4N8v_cospi F
+GLIBC_2.41 _ZGV4N16v_cospi F
+GLIBC_2.41 _ZGV8N8v_cospi F
+GLIBC_2.41 _ZGV8N16v_cospi F
+GLIBC_2.41 _ZGV8N32v_cospi F
+
+GLIBC_2.41 _ZGV1N2v_tanpi F
+GLIBC_2.41 _ZGV1N4v_tanpi F
+GLIBC_2.41 _ZGV2N2v_tanpi F
+GLIBC_2.41 _ZGV2N4v_tanpi F
+GLIBC_2.41 _ZGV2N8v_tanpi F
+GLIBC_2.41 _ZGV4N4v_tanpi F
+GLIBC_2.41 _ZGV4N8v_tanpi F
+GLIBC_2.41 _ZGV4N16v_tanpi F
+GLIBC_2.41 _ZGV8N8v_tanpi F
+GLIBC_2.41 _ZGV8N16v_tanpi F
+GLIBC_2.41 _ZGV8N32v_tanpi F
+
+GLIBC_2.41 _ZGV1N2v_tgamma F
+GLIBC_2.41 _ZGV1N4v_tgamma F
+GLIBC_2.41 _ZGV2N2v_tgamma F
+GLIBC_2.41 _ZGV2N4v_tgamma F
+GLIBC_2.41 _ZGV2N8v_tgamma F
+GLIBC_2.41 _ZGV4N4v_tgamma F
+GLIBC_2.41 _ZGV4N8v_tgamma F
+GLIBC_2.41 _ZGV4N16v_tgamma F
+GLIBC_2.41 _ZGV8N8v_tgamma F
+GLIBC_2.41 _ZGV8N16v_tgamma F
+GLIBC_2.41 _ZGV8N32v_tgamma F
+
+GLIBC_2.41 _ZGV1N2v_lgamma F
+GLIBC_2.41 _ZGV1N4v_lgamma F
+GLIBC_2.41 _ZGV2N2v_lgamma F
+GLIBC_2.41 _ZGV2N4v_lgamma F
+GLIBC_2.41 _ZGV2N8v_lgamma F
+GLIBC_2.41 _ZGV4N4v_lgamma F
+GLIBC_2.41 _ZGV4N8v_lgamma F
+GLIBC_2.41 _ZGV4N16v_lgamma F
+GLIBC_2.41 _ZGV8N8v_lgamma F
+GLIBC_2.41 _ZGV8N16v_lgamma F
+GLIBC_2.41 _ZGV8N32v_lgamma F
+
+GLIBC_2.41 _ZGV1N2v_log2 F
+GLIBC_2.41 _ZGV1N4v_log2 F
+GLIBC_2.41 _ZGV2N2v_log2 F
+GLIBC_2.41 _ZGV2N4v_log2 F
+GLIBC_2.41 _ZGV2N8v_log2 F
+GLIBC_2.41 _ZGV4N4v_log2 F
+GLIBC_2.41 _ZGV4N8v_log2 F
+GLIBC_2.41 _ZGV4N16v_log2 F
+GLIBC_2.41 _ZGV8N8v_log2 F
+GLIBC_2.41 _ZGV8N16v_log2 F
+GLIBC_2.41 _ZGV8N32v_log2 F
+
+GLIBC_2.41 _ZGV1N2v_log10 F
+GLIBC_2.41 _ZGV1N4v_log10 F
+GLIBC_2.41 _ZGV2N2v_log10 F
+GLIBC_2.41 _ZGV2N4v_log10 F
+GLIBC_2.41 _ZGV2N8v_log10 F
+GLIBC_2.41 _ZGV4N4v_log10 F
+GLIBC_2.41 _ZGV4N8v_log10 F
+GLIBC_2.41 _ZGV4N16v_log10 F
+GLIBC_2.41 _ZGV8N8v_log10 F
+GLIBC_2.41 _ZGV8N16v_log10 F
+GLIBC_2.41 _ZGV8N32v_log10 F
+
+GLIBC_2.41 _ZGV1N2v_cbrt F
+GLIBC_2.41 _ZGV1N4v_cbrt F
+GLIBC_2.41 _ZGV2N2v_cbrt F
+GLIBC_2.41 _ZGV2N4v_cbrt F
+GLIBC_2.41 _ZGV2N8v_cbrt F
+GLIBC_2.41 _ZGV4N4v_cbrt F
+GLIBC_2.41 _ZGV4N8v_cbrt F
+GLIBC_2.41 _ZGV4N16v_cbrt F
+GLIBC_2.41 _ZGV8N8v_cbrt F
+GLIBC_2.41 _ZGV8N16v_cbrt F
+GLIBC_2.41 _ZGV8N32v_cbrt F
+
+GLIBC_2.41 _ZGV1N2v_cdfnorm F
+GLIBC_2.41 _ZGV1N4v_cdfnorm F
+GLIBC_2.41 _ZGV2N2v_cdfnorm F
+GLIBC_2.41 _ZGV2N4v_cdfnorm F
+GLIBC_2.41 _ZGV2N8v_cdfnorm F
+GLIBC_2.41 _ZGV4N4v_cdfnorm F
+GLIBC_2.41 _ZGV4N8v_cdfnorm F
+GLIBC_2.41 _ZGV4N16v_cdfnorm F
+GLIBC_2.41 _ZGV8N8v_cdfnorm F
+GLIBC_2.41 _ZGV8N16v_cdfnorm F
+GLIBC_2.41 _ZGV8N32v_cdfnorm F
+
+GLIBC_2.41 _ZGV1N2v_erfc F
+GLIBC_2.41 _ZGV1N4v_erfc F
+GLIBC_2.41 _ZGV2N2v_erfc F
+GLIBC_2.41 _ZGV2N4v_erfc F
+GLIBC_2.41 _ZGV2N8v_erfc F
+GLIBC_2.41 _ZGV4N4v_erfc F
+GLIBC_2.41 _ZGV4N8v_erfc F
+GLIBC_2.41 _ZGV4N16v_erfc F
+GLIBC_2.41 _ZGV8N8v_erfc F
+GLIBC_2.41 _ZGV8N16v_erfc F
+GLIBC_2.41 _ZGV8N32v_erfc F
+
+GLIBC_2.41 _ZGV1N2v_cdfnorminv F
+GLIBC_2.41 _ZGV1N4v_cdfnorminv F
+GLIBC_2.41 _ZGV2N2v_cdfnorminv F
+GLIBC_2.41 _ZGV2N4v_cdfnorminv F
+GLIBC_2.41 _ZGV2N8v_cdfnorminv F
+GLIBC_2.41 _ZGV4N4v_cdfnorminv F
+GLIBC_2.41 _ZGV4N8v_cdfnorminv F
+GLIBC_2.41 _ZGV4N16v_cdfnorminv F
+GLIBC_2.41 _ZGV8N8v_cdfnorminv F
+GLIBC_2.41 _ZGV8N16v_cdfnorminv F
+GLIBC_2.41 _ZGV8N32v_cdfnorminv F
+
+GLIBC_2.41 _ZGV1N2v_erf F
+GLIBC_2.41 _ZGV1N4v_erf F
+GLIBC_2.41 _ZGV2N2v_erf F
+GLIBC_2.41 _ZGV2N4v_erf F
+GLIBC_2.41 _ZGV2N8v_erf F
+GLIBC_2.41 _ZGV4N4v_erf F
+GLIBC_2.41 _ZGV4N8v_erf F
+GLIBC_2.41 _ZGV4N16v_erf F
+GLIBC_2.41 _ZGV8N8v_erf F
+GLIBC_2.41 _ZGV8N16v_erf F
+GLIBC_2.41 _ZGV8N32v_erf F
+
+GLIBC_2.41 _ZGV1N2v_erfcinv F
+GLIBC_2.41 _ZGV1N4v_erfcinv F
+GLIBC_2.41 _ZGV2N2v_erfcinv F
+GLIBC_2.41 _ZGV2N4v_erfcinv F
+GLIBC_2.41 _ZGV2N8v_erfcinv F
+GLIBC_2.41 _ZGV4N4v_erfcinv F
+GLIBC_2.41 _ZGV4N8v_erfcinv F
+GLIBC_2.41 _ZGV4N16v_erfcinv F
+GLIBC_2.41 _ZGV8N8v_erfcinv F
+GLIBC_2.41 _ZGV8N16v_erfcinv F
+GLIBC_2.41 _ZGV8N32v_erfcinv F
+
+GLIBC_2.41 _ZGV1N2v_erfinv F
+GLIBC_2.41 _ZGV1N4v_erfinv F
+GLIBC_2.41 _ZGV2N2v_erfinv F
+GLIBC_2.41 _ZGV2N4v_erfinv F
+GLIBC_2.41 _ZGV2N8v_erfinv F
+GLIBC_2.41 _ZGV4N4v_erfinv F
+GLIBC_2.41 _ZGV4N8v_erfinv F
+GLIBC_2.41 _ZGV4N16v_erfinv F
+GLIBC_2.41 _ZGV8N8v_erfinv F
+GLIBC_2.41 _ZGV8N16v_erfinv F
+GLIBC_2.41 _ZGV8N32v_erfinv F
--
2.25.1
* Re: [RFC V4] Enable libmvec support for RISC-V
2024-11-04 4:41 ` Zhijin Zeng
@ 2024-11-05 3:06 ` yulong
0 siblings, 0 replies; 7+ messages in thread
From: yulong @ 2024-11-05 3:06 UTC (permalink / raw)
To: libc-alpha
Cc: Darius Rad, Andrew Waterman, maskray, kito.cheng, wuwei2016,
jiawei, shihua, chenyixuan, Jeff Law, Palmer Dabbelt,
Zhijin Zeng
Hi, Zhijin Zeng:
Thank you for your contribution.
I am still working on this, but have not yet pushed it upstream,
because an urgent project has come up recently. Once that is done, I
will send the patch upstream as soon as possible.
Thanks!
yulong
On 2024/11/4 12:41, Zhijin Zeng wrote:
> Hi yulong, do you have any further progress? I have finished a new
> version of libmvec support for RISC-V, which is also based on the
> implementations by Palmer's team over at Rivos.
>
> https://github.com/rivosinc/veclibm/
>
> I can't find a vector function name mangling scheme for RISC-V, so I
> define it as follows; it may be incorrect, but I think it's worth
> discussing.
>
> _ZGV<x>N<y>v<v...>_<func_name>
>
> 'x' is the LMUL: if the LMUL is 1/2/4/8, then 'x' is 1/2/4/8.
>
> 'y' is the count of elements, also the 'simdlen' in GCC.
>
> 'v...' depends on the number of parameters: there are as many 'v'
> characters as there are parameters.
>
> 'func_name' is the scalar function name.
>
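
For example, under this proposed scheme the LMUL=1 variant of cos
operating on two doubles is _ZGV1N2v_cos, and pow, taking two vector
arguments, gets two 'v' characters. A hypothetical pair of declarations
(the signatures are assumed here, not taken from the patch; the
vfloat64m1_t type comes from <riscv_vector.h> and needs the V extension
enabled):

    #include <riscv_vector.h>

    /* LMUL=1, 2 elements, one vector parameter.  */
    vfloat64m1_t _ZGV1N2v_cos (vfloat64m1_t x);

    /* LMUL=1, 2 elements, two vector parameters.  */
    vfloat64m1_t _ZGV1N2vv_pow (vfloat64m1_t x, vfloat64m1_t y);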
> This patch supports vectorized versions of the following math
> functions on RISC-V (although it currently only supports VLENB <= 256,
> it is very easy to extend to larger VLENB). Besides, I have also
> finished the GCC patch to support libmvec on RISC-V.
>
> exp/asin/atan/acos/atanh/exp10/exp2/tan/tanh/pow/sin/log/cos/acosh/asinh/atan2/expm1/tgamma/lgamma/log2/log10/cbrt/erfc/erf/cosh/sinh
>
> Hi Palmer, I have temporarily changed the copyright information in
> some files which come from veclibm; it is not meant as a violation of
> your copyright, but I don't know how to resolve the conflict between
> the LGPL and Apache 2.0. If you know how, please tell me so I can fix
> it, thank you.
>
> Zhijin Zeng
>
>
>> On 2024/5/10 21:06, yulong wrote:
>>> On 2024/5/1 0:26, Palmer Dabbelt wrote:
>>> On Wed, 24 Apr 2024 22:07:31 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>>
>>>> On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
>>>>> From: yulong <shiyulong@iscas.ac.cn>
>>>>>
>>>>> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
>>>>> This patch tries to enable libmvec on RISC-V. I have also demonstrated
>>>>> how this all fits together by adding implementations for vector cos.
>>>>> This patch is a first attempt, and we hope to receive valuable comments.
>>>> Just an FYI -- Palmer's team over at Rivos have implementations for a
>>>> number of routines that would fit into libmvec. You might reach out to
>>>> Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
>>>> implementation.
>>>>
>>>>> https://github.com/rivosinc/veclibm/
>>>>
>>>> Their implementations may provide good guidance on performant
>>>> implementations of various routines that libmvec typically provides.
>>> Ya, that's the idea of veclibm. The actual functions are written in
>>> a way that's more suitable for some other libraries, but the core
>>> computational implementations should be the same. A few of us had
>>> briefly talked internally about getting these into glibc, IIUC all
>>> the code was written at Rivos and thus could be copyright assigned to
>>> the FSF and used in glibc. We don't have time to do that right now,
>>> but if you're interested in helping that'd be awesome. We'll need to
>>> be careful with the copyright/licensing, though.
>> Thanks for your reply. I also received an email from Peter Tang. I
>> am very interested in contributing to glibc.
>>> That said, I've never really quite managed to figure out how all the
>>> libmvec stuff is supposed to fit together. I'm more worried about
>>> the ABI side of things than the implementation, so I think starting
>>> with just one function to get the ABI template figured out is a
>>> reasonable way to go and we can get the rest of the implementations
>>> ported over next. The first thing that jumps out on the ABI side of
>>> things is cos() taking EMUL=2 types, I'm not sure if there's a reason
>>> for that but it seems we'd want EMUL=1 to fit more data in the
>>> argument registers?
>> Setting EMUL=2 is just a personal experiment. I think you are right
>> and I will improve it in the next version.
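
Concretely, the two ABI choices being discussed differ roughly as in
the following sketch (the signatures are assumed, not taken from the
patch, and both would use the same _ZGVnN2v_cos symbol, so only one
can actually exist):

    #include <riscv_vector.h>

    /* EMUL=2, as in the v4 patch: each argument spans a pair of
       vector registers.  */
    vfloat64m2_t _ZGVnN2v_cos (vfloat64m2_t x);

    /* EMUL=1 alternative, one register per argument, which leaves
       more vector argument registers free (shown as a comment only,
       since the two declarations are mutually exclusive):
       vfloat64m1_t _ZGVnN2v_cos (vfloat64m1_t x);  */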
>>> Also, I think some of this can be split out: the
>>> roundtoint/converttoint isn't really a libmvec thing (see
>>> https://inbox.sourceware.org/libc-alpha/20220803174258.4235-1-palmer@rivosinc.com/,
>>> which fails some test), and ptr_barrier() can probably be pulled out
>>> to something generic as it's the same as arm64's version.
>>>
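
The arm64 ptr_barrier referred to here is essentially an empty asm
that stops the compiler from constant-folding or caching loads through
a table pointer; a generic sketch modelled on the AArch64 version (not
code from this patch):

    static inline const double *
    ptr_barrier (const double *ptr)
    {
      /* The empty asm marks ptr as both read and written, so the
         compiler cannot assume anything about the pointed-to data
         across this point.  */
      __asm__ ("" : "+r" (ptr));
      return ptr;
    }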
>>> I'm also only seeing draft versions of the vector intrinsics. I know
>>> we merged them into GCC and usually that means things are stable, but
>>> we merged these pre-freeze (based on some assertions things wouldn't
>>> change) and things have drifted around a bit in the spec. I think
>>> we're probably safe just depending on the types, if there's no frozen
>>> version we should at least write down exactly which version we're
>>> following though.
>> We are currently developing based on the latest branches. Can we
>> declare that we are following RVV 1.0?
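
One way to make such a declaration checkable is the feature-test macro
from the RISC-V C API: the intrinsics headers define
__riscv_v_intrinsic, so a source file can assert the revision it was
written against. A sketch, assuming the major * 1000000 + minor * 1000
encoding, which should be verified against the spec revision actually
targeted:

    #if !defined (__riscv_v_intrinsic) || __riscv_v_intrinsic < 1000000
    # error "RVV intrinsics 1.0 or later required"
    #endif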
>>> Also: are there GCC patches for these? It'd be great to be able to
>>> test things through the whole codegen stack so we can make sure it
>>> works.
>> Unfortunately, there are no patches for GCC right now. This may be the
>> direction of future work.
>>>> jeff
Thread overview (7+ messages):
2024-04-15 7:21 [RFC V4] Enable libmvec support for RISC-V shiyulong
2024-04-25 5:07 ` Jeff Law
2024-04-29 1:12 ` yulong
2024-04-30 16:26 ` Palmer Dabbelt
2024-05-10 13:06 ` yulong
2024-11-04 4:41 ` Zhijin Zeng
2024-11-05 3:06 ` yulong